Using a unique data collection, we are able to study the detection of dense geometric objects in image data
where object density, clarity, and size vary. The data is a large set of black and white images of
scatterplots, taken from journals reporting thermophysical property data of metal systems, whose plot
points are represented primarily by circles, triangles, and squares. We built a highly accurate single-class
U-Net convolutional neural network model that identifies 97% of image objects in a defined set of test images,
locating the centers of the objects to within a few pixels of the correct locations. We found an optimal way
in which to mark our training data masks to achieve this level of accuracy. The optimal markings for object
classification, however, required more information in the masks to identify particular types of geometries.
We show a range of different patterns used to mark the training data masks, and how they help or hurt our
dual goals of location and classification. Altering the annotations in the segmentation masks can increase
both the accuracy of object classification and localization on the plots, more than other factors such as
adding loss terms to the network calculations. However, localization of the plot points and classification of
the geometric objects require different optimal training data.
A new block cipher for image encryption based on multi chaotic systems (TELKOMNIKA JOURNAL)
In this paper, a new algorithm for image encryption is proposed based on three chaotic systems: the Chen system, the logistic map, and the two-dimensional (2D) Arnold cat map. First, a permutation scheme is applied to the image, and the shuffled image is then partitioned into blocks of pixels. For each block, the Chen system is employed for confusion, and the logistic map is used to generate a substitution box (S-box) that substitutes the image blocks. The S-box is dynamic: it is reshuffled for each image block using a permutation operation. The 2D Arnold cat map then provides diffusion, after which the result is XORed with a Chen-system keystream to obtain the encrypted image. The security of the proposed algorithm is evaluated using histograms, unified average changing intensity (UACI), number of pixels change rate (NPCR), entropy, correlation, and key-space analyses.
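The 2D Arnold cat map stage above can be illustrated with a minimal sketch; this shows only the permutation idea, not the paper's full cipher (the Chen-system, S-box, and XOR stages are omitted):

```python
# Minimal sketch of the 2D Arnold cat map used as a permutation stage.
# The map (x, y) -> ((x + y) mod N, (x + 2y) mod N) has determinant 1,
# so it is a bijection on an N x N grid: pixels are shuffled, none are
# lost, and iterating the map eventually restores the original image.
def arnold_cat_map(image):
    n = len(image)  # assumes a square N x N image given as a list of rows
    out = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[(x + y) % n][(x + 2 * y) % n] = image[x][y]
    return out
```

For a 2 x 2 image the map has period 3: applying it three times restores the original, which is why the iteration count can serve as part of an encryption key.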
The document discusses the applicability of fuzzy theory in remote sensing image classification. It presents three experiments comparing different classification methods: 1) Unsupervised fuzzy c-means classification, 2) Supervised classification using fuzzy signatures, 3) Supervised classification using fuzzy signatures and membership functions. The supervised fuzzy methods achieved higher accuracy than the unsupervised method, with the third method performing best with an overall accuracy of 83.9%. Fuzzy convolution can further optimize results by combining classification bands.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a... (IJERD Editor)
This paper presents a blind steganalysis technique that effectively attacks the JPEG steganographic
schemes Jsteg, F5, Outguess, and a DWT-based scheme. The proposed method exploits the correlations between
block-DCT coefficients, both intra-block and inter-block, and selects the statistical moments of characteristic
functions of the test image as features. The features are extracted from the BDCT JPEG 2-array.
A support vector machine with cross-validation is used for the classification. The proposed scheme gives
improved results in attacking these schemes.
Image Segmentation from RGBD Images by 3D Point Cloud Attributes and High-Lev... (CSCJournals)
The document describes an image segmentation algorithm that uses both color and depth features extracted from RGBD images captured by a Kinect sensor. The algorithm clusters pixels into segments based on their color, texture, 3D spatial coordinates, surface normals, and the output of a graph-based segmentation algorithm. Depth features help resolve illumination issues and occlusion that cannot be handled by color-only methods. The algorithm was tested on commercial building images and showed potential for real-time applications.
This document summarizes a research paper that proposes a technique for clustering objects in movies using graph mining. It involves segmenting a movie into frames, extracting features of objects in each frame, constructing a graph representing relationships between objects, and applying a graph mining algorithm to cluster objects and determine their behaviors across frames. The algorithm represents each frame as a graph and mines patterns to discover how object clusters change over time. The approach aims to efficiently analyze movie content by modeling the spatial and relational properties between objects in each frame.
Object Recognition Based on Undecimated Wavelet Transform (IJCOAiir)
Object recognition (OR) is the task, in computer vision, of finding a specified object in an image or video
sequence. An efficient method for recognizing objects in an image based on the Undecimated Wavelet
Transform (UWT) is proposed. In this system, the undecimated wavelet coefficients are used as features to
recognize the objects: the original image is decomposed using the UWT, and all coefficients are taken as
features for the classification process. This method is applied to all the training images, and the extracted
features of an unknown object are used as input to a K-Nearest Neighbor (K-NN) classifier to recognize the
object. The system is evaluated on the Columbia Object Image Library (COIL-100) database.
An effective RGB color selection for complex 3D object structure in scene gra... (IJECEIAES)
The goal of our project is to develop a complete, fully detailed 3D interactive model of the human body and its systems, allowing the user to interact in 3D with all the elements of each system in order to teach students human anatomy. Some organs, such as the brain, lungs, liver, and heart, carry a great deal of anatomical detail and need to be described accurately and minutely. These organs need to carry all the detailed medical information required to learn how to operate on them, and should allow the user to add careful and precise markings indicating the operative landmarks at the surgery location. Attaching so many different items of information is challenging when the target area is very detailed and overlaps with other medically annotated areas. Existing tagging methods did not give us enough locations to attach the information to. Our solution combines several tagging methods: it marks regions by selecting RGB color areas drawn in the texture of the complex 3D object, then uses those RGB color codes as tag IDs and builds relational tables that store the information related to each specific anatomical area. With this marking method, the entire set of (R, G, B) color values can be used to identify anatomical regions, which also makes it possible to define multiple overlapping regions.
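The RGB-tagging idea can be sketched as a lookup from painted texel colors to a relational table keyed by color code; all region names and values below are illustrative, not from the paper:

```python
# Hypothetical sketch: each painted (R, G, B) value in the texture acts
# as a region ID, and a table keyed by that color stores the attached
# medical information. Region names here are purely illustrative.
region_table = {
    (255, 0, 0): {"region": "left ventricle", "notes": "operative landmark A"},
    (0, 255, 0): {"region": "aortic arch",    "notes": "operative landmark B"},
}

def lookup_region(texture, u, v):
    """Return the tagged info for the texel at (u, v), or None if unpainted."""
    return region_table.get(texture[v][u])
```

Because the key is the full (R, G, B) triple, up to 2^24 distinct regions are addressable, and overlapping regions can be encoded by painting them in separate texture layers.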
This document summarizes a research paper on background subtraction techniques for motion detection in video. It describes a proposed technique that stores and compares past pixel values to the current value to determine if a pixel belongs to the background or foreground. It also discusses using a k-means algorithm and Gaussian mixture model to build a probabilistic background model and classify pixels. The paper evaluates different shadow detection approaches and finds RGB color spaces perform best for segmentation and shadow removal.
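The stored-pixel comparison can be illustrated with a minimal sketch, assuming a running-average background model as a simplified stand-in for the paper's k-means/GMM variants (the threshold value is illustrative):

```python
def update_background(bg, frame, alpha=0.05):
    """Exponential running average of past pixel values: a simplified
    stand-in for the k-means / Gaussian-mixture background models."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30):
    """A pixel is foreground when it deviates from the background model
    by more than the threshold."""
    return [[abs(f - b) > thresh for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]
```

Each new frame both yields a foreground mask and nudges the background model, so slow lighting changes are absorbed while fast-moving objects are flagged.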
This document describes MMRetrieval.net, a multimodal search engine that allows retrieval of information from multiple modalities including text and images. It discusses challenges in multimodal retrieval like defining modality, fusing scores across modalities, and improving efficiency through two-stage retrieval. MMRetrieval.net was developed by researchers at DUTH to participate in the ImageCLEF 2010 Wikipedia retrieval task, where it achieved the second best performance overall by combining text and visual features through various fusion techniques. The system demonstrates the promise of multimodal search but also identifies open challenges around modality definitions, fusion methods, and scaling to large databases.
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA... (cscpconf)
Image inpainting derives from restoration of art works, and has been applied to repair ancient
art works. Inpainting is a technique of restoring a partially damaged or occluded image in an
undetectable way. It fills the damaged part of an image by employing information of the
undamaged part according to some rules to make it look “reasonable” to human eyes. Digital
image inpainting is a relatively new area of research, but numerous different approaches to
the inpainting problem have been proposed since the concept was first introduced. This
paper analyzes and compares the recent exemplar-based inpainting algorithms by Minqin Wang
and by Hao Guo et al. A number of examples on real images are presented to evaluate the
results of the algorithms using Peak Signal-to-Noise Ratio (PSNR).
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open-access journal, available online and in print, that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic, and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published within 20 days after acceptance, and the peer-review process takes only 7 days. All articles published in Research Inventy are peer-reviewed.
Outlier detection is an interesting, useful, and challenging problem in the field of data mining. Because
spatial data is sparse, clustering algorithms based on distance alone will not find its outliers, so the
problem of finding irregular features in spatial data needs to be explored. Many approaches have been
proposed to address outlier detection in spatial geographic data. In this paper, an efficient clustering-
and density-based outlier detection framework is proposed. The detection process has two steps: first,
the data is clustered with a density-based DBSCAN algorithm; second, outlier detection is performed using
the Local Outlier Factor (LOF). The purpose is to perform clustering and outlier mining simultaneously to
improve the feasibility of the framework. To verify the efficiency and robustness of the proposed method,
a detailed comparative study of the proposed approach and several existing approaches is presented, and
various simulation results demonstrate the effectiveness of the proposed approach.
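The two-step idea can be illustrated with the DBSCAN noise criterion alone; this toy sketch flags points whose eps-neighbourhood is too sparse, and a full pipeline would then rank the flagged candidates with LOF (parameter values are illustrative):

```python
from math import dist

def density_outliers(points, eps=1.0, min_pts=3):
    """Flag points whose eps-neighbourhood holds fewer than min_pts other
    points (the DBSCAN 'noise' criterion). A full pipeline would then
    score the flagged candidates with the Local Outlier Factor."""
    flagged = []
    for i, p in enumerate(points):
        neighbours = sum(1 for j, q in enumerate(points)
                         if j != i and dist(p, q) <= eps)
        if neighbours < min_pts:
            flagged.append(p)
    return flagged
```

Restricting the (more expensive) LOF scoring to DBSCAN noise points is what lets the framework run clustering and outlier mining in one pass.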
Segmentation-based Historical Handwritten Word Spotting using document-spec... (Konstantinos Zagoris)
Many word spotting strategies for modern documents are not directly applicable to historical handwritten documents due to the variety of writing styles and intense degradation. In this paper, a new method that permits effective word spotting in handwritten documents is presented; it relies upon document-specific local features that take into account texture information around representative keypoints. Experimental work on two historical handwritten datasets using standard evaluation measures shows the improved performance achieved by the proposed methodology.
MINIMIZING DISTORTION IN STEGANOGRAPHY BASED ON IMAGE FEATURE (ijcsit)
There are two defects in WOW. First, image features are not considered when hiding information along the minimal-distortion path, which leads to high total distortion. Second, the total distortion grows too rapidly as the hidden capacity increases, which leads to poor anti-detection performance when the hidden capacity is large. To solve these two problems, a new algorithm named MDIS is proposed. MDIS is also based on the additive-distortion-minimizing framework of STC and uses the same distortion function as WOW. MDIS exploits the fact that a large number of pixels share a value with one of their eight neighbouring pixels, together with a secret-sharing mechanism, which reduces the total distortion, improves anti-detection, and increases the PSNR. Experimental results showed that MDIS has better invisibility, smaller distortion, and stronger anti-detection than WOW.
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014) (Konstantinos Zagoris)
H-KWS 2014 is the Handwritten Keyword Spotting Competition organized in conjunction with ICFHR 2014 conference. The main objective of the competition is to record current advances in keyword spotting algorithms using established performance evaluation measures frequently encountered in the information retrieval literature. The competition comprises two distinct tracks, namely, a segmentation-based and a segmentation- free track. Five (5) distinct research groups have participated in the competition with three (3) methods for the segmentation- based track and four (4) methods for the segmentation-free track. The benchmarking datasets that were used in the contest contain both historical and modern documents from multiple writers. In this paper, the contest details are reported including the evaluation measures and the performance of the submitted methods along with a short description of each method.
The efficiency and quality of a feature descriptor are critical to the user experience of many computer vision applications. However, the existing descriptors are either too computationally expensive to achieve real-time performance, or not sufficiently distinctive to identify correct matches from a large database with various transformations. In this paper, we propose a highly efficient and distinctive binary descriptor, called local difference binary (LDB). LDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. A multiple-gridding strategy and a salient bit-selection method are applied to capture the distinct patterns of the patch at different spatial granularities. Experimental results demonstrate that compared to the existing state-of-the-art binary descriptors, primarily designed for speed, LDB has similar construction efficiency, while achieving a greater accuracy and faster speed for mobile object recognition and tracking tasks.
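A simplified sketch of the LDB difference test, assuming a square grayscale patch given as a list of rows; the gradient tests, multi-gridding, and salient-bit selection of the paper are omitted:

```python
def ldb_bits(patch, grid=2):
    """Simplified LDB sketch: average the intensity of each grid cell,
    then emit one bit per ordered cell pair from an intensity-difference
    test. Only the intensity channel of LDB is modelled here."""
    n = len(patch)
    step = n // grid
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            block = [patch[y][x]
                     for y in range(gy * step, (gy + 1) * step)
                     for x in range(gx * step, (gx + 1) * step)]
            cells.append(sum(block) / len(block))
    return [1 if cells[i] > cells[j] else 0
            for i in range(len(cells)) for j in range(i + 1, len(cells))]
```

Because each bit is a single comparison of precomputed cell averages, matching two descriptors reduces to a Hamming distance, which is what makes binary descriptors like LDB fast on mobile hardware.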
Computer Vision: Visual Extent of an Object (IOSR Journals)
The document discusses analyzing the visual extent of objects in images using computer vision techniques. It performs two analyses: 1) Without knowing object locations, it determines which image parts contribute most to object classification. 2) Assuming known object locations, it evaluates the potential of object vs. surround and object interior vs. border. The analyses find that object classification performance improves significantly when the object location is known. Descriptors from the object interior contribute more than those from the border. Knowing object locations allows separating relevant from irrelevant image regions.
Textual information in images constitutes a very rich source of high-level semantics for retrieval and indexing. In this paper, a new approach is proposed using Cellular Automata (CA) which strives towards identifying scene text on natural images. Initially, a binary edge map is calculated. Then, taking advantage of the CA flexibility, the transition rules are changing and are applied in four consecutive steps resulting in four time steps CA evolution. Finally, a post-processing technique based on edge projection analysis is employed for high density edge images concerning the elimination of possible false positives. Evaluation results indicate considerable performance gains without sacrificing text detection accuracy.
Text extraction using document structure features and support vector machines (Konstantinos Zagoris)
This document presents a bottom-up text localization technique that uses document structure features and support vector machines. The technique detects and extracts text from document images through several steps: preprocessing, locating and merging blocks, extracting features from blocks, and using support vector machines trained on the features to classify blocks as containing text or not. The technique uses a flexible feature descriptor based on structural elements that can adapt to different document image types. Experimental results on document images show a 98.5% success rate in classifying blocks.
This document presents a novel method for recognizing two-dimensional QR barcodes using texture feature analysis and neural networks. It first extracts texture features like mean, standard deviation, smoothness, skewness and entropy from divided blocks of barcode images. These features are then used to train a neural network to classify blocks as containing a barcode or not. The trained neural network can then be used to locate barcodes in unknown images by classifying each block. The method is implemented and evaluated using MATLAB on a database of QR code images, showing satisfactory recognition results.
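The per-block texture features can be sketched as follows (skewness omitted for brevity); the resulting vectors would feed the neural-network block classifier:

```python
from math import log2, sqrt

def block_features(block):
    """Texture features for one image block, as listed above: mean,
    standard deviation, smoothness, and entropy. The block is a flat
    list of grayscale pixel values."""
    n = len(block)
    mean = sum(block) / n
    var = sum((p - mean) ** 2 for p in block) / n
    smoothness = 1 - 1 / (1 + var)          # R = 1 - 1/(1 + sigma^2)
    counts = {}
    for p in block:
        counts[p] = counts.get(p, 0) + 1
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return mean, sqrt(var), smoothness, entropy
```

A high-contrast checkerboard-like QR block yields high standard deviation and entropy, which is exactly the signature the classifier learns to separate from smooth background blocks.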
This document summarizes an article that proposes a new image steganography technique using discrete wavelet transform. The technique applies an adaptive pixel pair matching method from the spatial domain to the frequency domain. Data is embedded in the middle frequencies of the discrete wavelet transformed image because they are more robust to attacks than high frequencies. The coefficients in the low frequency sub-band are preserved unchanged to improve image quality. The experimental results showed better performance with discrete wavelet transform compared to the spatial domain.
A system was developed that is able to retrieve specific documents from a document collection. In this system, the query is given as text by the user and then transformed into an image. Appropriate features were selected in order to capture the general shape of the query and to ignore details due to noise or different fonts. To demonstrate the effectiveness of our system, we used a collection of noisy documents and compared our results with those of a commercial OCR package.
International Journal of Research in Engineering and Science is an open access peer-reviewed international forum for scientists involved in research to publish quality and refereed papers. Papers reporting original research or experimentally proved review work are welcome. Papers for publication are selected through peer review to ensure originality, relevance, and readability.
This document discusses automatic image annotation using weakly supervised graph propagation. It begins with an abstract describing the method, which assigns annotated labels to semantically derived image regions using training images, pre-assigned labels, and input images. Graph construction considers consistency and incongruity relationships between image patches. Label propagation aims to propagate labels from images to patches using these relationships. Evaluation shows the proposed method achieves higher accuracy than baselines on several datasets by leveraging contextual information between image regions.
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Binarization of Degraded Text documents and Palm Leaf Manuscripts (IRJET Journal)
This document proposes a technique for binarizing degraded text documents and palm leaf manuscripts. It involves taking the average pixel value of the image as a threshold to distinguish foreground from background. The algorithm first computes the average value of the original image and uses it to set pixels above the threshold to black, removing background. It then computes the average of the remaining image, excluding black pixels, and uses that value as a new threshold to set remaining pixels above it to white, extracting the foreground. The technique is tested on old documents and manuscripts, showing improvement over existing methods based on metrics like peak signal-to-noise ratio. While effective for documents, it needs improvement for palm leaf manuscripts with non-uniform degradation.
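A hedged sketch of the two-pass average-threshold idea described above, under the simplifying reading that the first global average separates out the bright background and the second average of the remaining darker pixels gives the final threshold (the palm-leaf handling is omitted):

```python
def binarize(image):
    """Two-pass average-threshold binarization sketch. Pixel values are
    0-255; the output uses 0 for foreground (text) and 255 for
    background."""
    pixels = [p for row in image for p in row]
    t1 = sum(pixels) / len(pixels)      # pass 1: global average
    dark = [p for p in pixels if p <= t1]  # candidate foreground pixels
    t2 = sum(dark) / len(dark)          # pass 2: average of the remainder
    return [[0 if p <= t2 else 255 for p in row] for row in image]
```

Using the second, local average rather than the global one is what keeps faint but genuinely dark strokes from being merged into a bright, degraded background.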
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND... (IAEME Publication)
Multiple human tracking based on object detection is challenging due to its complexity: errors in
object detection propagate into tracking errors. In this paper, we propose a tracking method that
minimizes the error produced by the object detector. We use RetinaNet as the object detector and the
Hungarian algorithm for tracking. The cost matrix for the Hungarian algorithm is calculated from the
RetinaNet features, the distances between bounding-box centers, and the intersection over union of the
bounding boxes. Missing detections are interpolated in a final step. The proposed method yields 43.2
MOTA on the MOT16 benchmark.
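The IoU term of the cost matrix above can be sketched as follows, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the RetinaNet-feature and center-distance terms are omitted from this illustration:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def cost_matrix(tracks, detections):
    # Lower cost = better match; in practice this matrix would be fed to
    # a Hungarian solver such as scipy.optimize.linear_sum_assignment.
    return [[1.0 - iou(t, d) for d in detections] for t in tracks]
```

Each row is an existing track, each column a new detection; the Hungarian algorithm then picks the assignment with minimum total cost.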
AN ENHANCED SEPARABLE REVERSIBLE DATA HIDING IN ENCRYPTED IMAGES USING SIDE M... (Editor IJMTER)
This paper proposes a scheme for Enhanced Separable Reversible Data Hiding in
Encrypted images Using Side Match. In the first step the original image is encrypted using an
encryption key. Then additional data is embedded into the image by modifying a small portion of the
encrypted image using a data hiding key. With an encrypted image containing additional data, if a
receiver has the data hiding key, he can extract the additional data. If the receiver has the encryption
key, he can decrypt the image, but cannot extract the additional data. If the receiver has both the data
hiding key and encryption key, he can extract the additional data and recover the original content by
exploiting the spatial correlation in natural images. The accuracy of data extraction is improved by
using a better scheme for measuring the smoothness of the received image and by applying the side-match
scheme to further decrease the error rate of the extracted bits.
IRJET - Direct Me-Nevigation for Blind People (IRJET Journal)
This document describes a system for direct navigation assistance for blind people using object detection and audio cues. It uses a convolutional neural network model called You Only Look Once (YOLO) to perform real-time object detection on camera images and then describes the detected objects and their locations to the blind user using 3D spatialized sound. The system aims to allow blind users to independently navigate environments by audibly identifying surrounding objects. It analyzes previous works on sensory substitution and assistive technologies for the blind, as well as research on using 3D sound for navigation assistance. The document outlines the object detection methods used, including YOLO and anchor boxes to improve accuracy at detecting multiple objects within each image grid.
This document summarizes a research paper on background subtraction techniques for motion detection in video. It describes a proposed technique that stores and compares past pixel values to the current value to determine if a pixel belongs to the background or foreground. It also discusses using a k-means algorithm and Gaussian mixture model to build a probabilistic background model and classify pixels. The paper evaluates different shadow detection approaches and finds RGB color spaces perform best for segmentation and shadow removal.
This document describes MMRetrieval.net, a multimodal search engine that allows retrieval of information from multiple modalities including text and images. It discusses challenges in multimodal retrieval like defining modality, fusing scores across modalities, and improving efficiency through two-stage retrieval. MMRetrieval.net was developed by researchers at DUTH to participate in the ImageCLEF 2010 Wikipedia retrieval task, where it achieved the second best performance overall by combining text and visual features through various fusion techniques. The system demonstrates the promise of multimodal search but also identifies open challenges around modality definitions, fusion methods, and scaling to large databases.
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...cscpconf
Image inpainting derives from restoration of art works, and has been applied to repair ancient
art works. Inpainting is a technique of restoring a partially damaged or occluded image in an
undetectable way. It fills the damaged part of an image by employing information of the
undamaged part according to some rules to make it look “reasonable” to human eyes. Digital
image inpainting is relatively new area of research, but numerous and different approaches to
tackle the inpainting problem have been proposed since the concept was first introduced. This
paper analyzes and compares the recent exemplar based inpainting algorithms by Minqin Wang
and Hao Guo et al. A number of examples on real images are demonstrated to evaluate the
results of algorithms using Peak Signal to Noise Ratio (PSNR)
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Outlier detection is an interesting, useful and challenging problem in the field of data mining. Because
spatial data are sparse, clustering algorithms based on distance alone do not work well for finding outliers
in them, so the problem of finding irregular features in spatial data needs to be explored. Many existing
approaches have been proposed to overcome the problem of outlier detection in spatial geographic data. In
this paper an efficient clustering- and density-based outlier detection framework is proposed. The process
of outlier detection is divided into two steps: in the first step the data are clustered using a density-based
DBSCAN algorithm, and in the second step outlier detection is performed using the Local Outlier Factor (LOF).
The purpose is to perform clustering and outlier mining simultaneously to improve the feasibility of the
framework. To verify the efficiency and robustness of the proposed method, a detailed comparative study of
the proposed approach and several existing approaches is presented; various simulation results demonstrate
the effectiveness of the proposed approach.
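The second stage above scores each point with the Local Outlier Factor. A compact pure-Python sketch of the standard LOF computation (the DBSCAN clustering stage is omitted, and the helper names are illustrative):

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    d = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in d[:k]]

def lof(points, k=3):
    """Local Outlier Factor per point: scores well above 1 flag outliers."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], math.dist(points[i], points[j])) for j in neigh[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[j] for j in neigh[i]) / (k * lrds[i]) for i in range(n)]
```

A point far from a tight cluster receives an LOF score far above 1, while cluster members score near 1.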
Segmentation - based Historical Handwritten Word Spotting using document-spec...Konstantinos Zagoris
Many word spotting strategies for modern documents are not directly applicable to historical handwritten documents due to the variety of writing styles and intense degradation. In this paper, a new method that permits effective word spotting in handwritten documents is presented; it relies upon document-specific local features which take into account texture information around representative keypoints. Experimental work on two historical handwritten datasets using standard evaluation measures shows the improved performance achieved by the proposed methodology.
MINIMIZING DISTORTION IN STEGANOGRAPHY BASED ON IMAGE FEATUREijcsit
There are two defects in WOW. One is that image features are not considered when hiding information along the minimal-distortion path, which leads to high total distortion. The other is that total distortion grows too rapidly as the hidden capacity increases, which leads to poor anti-detection performance when the hidden capacity is large. To solve these two problems, a new algorithm named MDIS is proposed. MDIS is also based on the minimizing-additive-distortion framework of STC and has the same distortion function as WOW. MDIS exploits the fact that a large number of pixels share their value with one of their eight neighbour pixels, together with a secret-sharing mechanism, which reduces the total distortion, improves anti-detection and increases the PSNR value. Experimental results showed that MDIS has better invisibility, smaller distortion and stronger anti-detection than WOW.
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)Konstantinos Zagoris
H-KWS 2014 is the Handwritten Keyword Spotting Competition organized in conjunction with the ICFHR 2014 conference. The main objective of the competition is to record current advances in keyword spotting algorithms using established performance evaluation measures frequently encountered in the information retrieval literature. The competition comprises two distinct tracks, namely, a segmentation-based and a segmentation-free track. Five (5) distinct research groups participated in the competition, with three (3) methods for the segmentation-based track and four (4) methods for the segmentation-free track. The benchmarking datasets that were used in the contest contain both historical and modern documents from multiple writers. In this paper, the contest details are reported, including the evaluation measures and the performance of the submitted methods, along with a short description of each method.
The efficiency and quality of a feature descriptor are critical to the user experience of many computer vision applications. However, the existing descriptors are either too computationally expensive to achieve real-time performance, or not sufficiently distinctive to identify correct matches from a large database with various transformations. In this paper, we propose a highly efficient and distinctive binary descriptor, called local difference binary (LDB). LDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. A multiple-gridding strategy and a salient bit-selection method are applied to capture the distinct patterns of the patch at different spatial granularities. Experimental results demonstrate that compared to the existing state-of-the-art binary descriptors, primarily designed for speed, LDB has similar construction efficiency, while achieving a greater accuracy and faster speed for mobile object recognition and tracking tasks.
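The pairwise grid-cell tests can be sketched as follows. This is a simplified, intensity-only illustration of the idea; LDB itself also uses gradient tests, multiple grid sizes, and salient bit selection, none of which are shown here:

```python
def cell_means(patch, grid):
    """Average intensity of each grid x grid cell of a square patch.
    patch is a list of rows; the patch size must be divisible by grid."""
    n = len(patch)
    step = n // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [patch[y][x]
                    for y in range(gy * step, (gy + 1) * step)
                    for x in range(gx * step, (gx + 1) * step)]
            means.append(sum(cell) / len(cell))
    return means

def binary_descriptor(patch, grid=2):
    """Bit string from pairwise intensity tests between grid cells,
    in the spirit of LDB's difference tests (intensity only here)."""
    m = cell_means(patch, grid)
    return "".join(
        "1" if m[i] > m[j] else "0"
        for i in range(len(m)) for j in range(i + 1, len(m))
    )
```

Two patches are then compared by the Hamming distance between their bit strings, which is what makes binary descriptors so cheap to match.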
Computer Vision: Visual Extent of an ObjectIOSR Journals
The document discusses analyzing the visual extent of objects in images using computer vision techniques. It performs two analyses: 1) Without knowing object locations, it determines which image parts contribute most to object classification. 2) Assuming known object locations, it evaluates the potential of object vs. surround and object interior vs. border. The analyses find that object classification performance improves significantly when the object location is known. Descriptors from the object interior contribute more than those from the border. Knowing object locations allows separating relevant from irrelevant image regions.
Textual information in images constitutes a very rich source of high-level semantics for retrieval and indexing. In this paper, a new approach is proposed using Cellular Automata (CA) which strives towards identifying scene text on natural images. Initially, a binary edge map is calculated. Then, taking advantage of the CA flexibility, the transition rules are changing and are applied in four consecutive steps resulting in four time steps CA evolution. Finally, a post-processing technique based on edge projection analysis is employed for high density edge images concerning the elimination of possible false positives. Evaluation results indicate considerable performance gains without sacrificing text detection accuracy.
Text extraction using document structure features and support vector machinesKonstantinos Zagoris
This document presents a bottom-up text localization technique that uses document structure features and support vector machines. The technique detects and extracts text from document images through several steps: preprocessing, locating and merging blocks, extracting features from blocks, and using support vector machines trained on the features to classify blocks as containing text or not. The technique uses a flexible feature descriptor based on structural elements that can adapt to different document image types. Experimental results on document images show a 98.5% success rate in classifying blocks.
This document presents a novel method for recognizing two-dimensional QR barcodes using texture feature analysis and neural networks. It first extracts texture features like mean, standard deviation, smoothness, skewness and entropy from divided blocks of barcode images. These features are then used to train a neural network to classify blocks as containing a barcode or not. The trained neural network can then be used to locate barcodes in unknown images by classifying each block. The method is implemented and evaluated using MATLAB on a database of QR code images, showing satisfactory recognition results.
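The block statistics listed above are standard first-order texture features. A hedged sketch of how they might be computed for one block of grey values; the formulas for smoothness and skewness follow common textbook definitions, not necessarily the paper's exact choices:

```python
import math

def texture_features(block):
    """Mean, standard deviation, smoothness, skewness and entropy of a
    block of grey-level pixel values (given as a flat list)."""
    n = len(block)
    mean = sum(block) / n
    var = sum((p - mean) ** 2 for p in block) / n
    std = math.sqrt(var)
    smoothness = 1 - 1 / (1 + var)  # 0 for flat blocks, approaches 1 for busy ones
    skew = (sum((p - mean) ** 3 for p in block) / n) / std ** 3 if std else 0.0
    counts = {}
    for p in block:
        counts[p] = counts.get(p, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, std, smoothness, skew, entropy
```

Each block's feature tuple would then serve as one training sample for the barcode/non-barcode classifier.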
This document summarizes an article that proposes a new image steganography technique using discrete wavelet transform. The technique applies an adaptive pixel pair matching method from the spatial domain to the frequency domain. Data is embedded in the middle frequencies of the discrete wavelet transformed image because they are more robust to attacks than high frequencies. The coefficients in the low frequency sub-band are preserved unchanged to improve image quality. The experimental results showed better performance with discrete wavelet transform compared to the spatial domain.
A system was developed that is able to retrieve specific documents from a document collection. In this system the query is given as text by the user and then transformed into an image. Appropriate features were selected in order to capture the general shape of the query and ignore details due to noise or different fonts. In order to demonstrate the effectiveness of our system, we used a collection of noisy documents and we compared our results with those of a commercial OCR package.
International Journal of Research in Engineering and Science is an open access peer-reviewed international forum for scientists involved in research to publish quality and refereed papers. Papers reporting original research or experimentally proved review work are welcome. Papers for publication are selected through peer review to ensure originality, relevance, and readability.
This document discusses automatic image annotation using weakly supervised graph propagation. It begins with an abstract describing the method, which assigns annotated labels to semantically derived image regions using training images, pre-assigned labels, and input images. Graph construction considers consistency and incongruity relationships between image patches. Label propagation aims to propagate labels from images to patches using these relationships. Evaluation shows the proposed method achieves higher accuracy than baselines on several datasets by leveraging contextual information between image regions.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Binarization of Degraded Text documents and Palm Leaf ManuscriptsIRJET Journal
This document proposes a technique for binarizing degraded text documents and palm leaf manuscripts. It involves taking the average pixel value of the image as a threshold to distinguish foreground from background. The algorithm first computes the average value of the original image and uses it to set pixels above the threshold to black, removing background. It then computes the average of the remaining image, excluding black pixels, and uses that value as a new threshold to set remaining pixels above it to white, extracting the foreground. The technique is tested on old documents and manuscripts, showing improvement over existing methods based on metrics like peak signal-to-noise ratio. While effective for documents, it needs improvement for palm leaf manuscripts with non-uniform degradation.
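One literal reading of the two-pass average-threshold scheme can be sketched as below. The pass order and the black/white conventions are an interpretation of the summary above, not the paper's exact algorithm:

```python
def binarize(image):
    """Two-pass average-threshold binarisation sketch.
    image: flat list of grey values (0-255).
    Pass 1: the global mean marks bright background pixels as black (0).
    Pass 2: the mean of the remaining non-black pixels sets the rest
    above it to white (255), leaving the extracted foreground."""
    t1 = sum(image) / len(image)
    stage = [0 if p > t1 else p for p in image]
    rest = [p for p in stage if p != 0]
    if not rest:
        return stage
    t2 = sum(rest) / len(rest)
    return [255 if p > t2 else p for p in stage]
```

On uniformly degraded documents a global mean can work; the paper's remaining difficulty with palm leaf manuscripts comes precisely from degradation that a single global threshold cannot capture.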
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...IAEME Publication
Multiple human tracking based on object detection has been a challenge due to its
complexity. Errors in object detection would be propagated to tracking errors. In this
paper, we propose a tracking method that minimizes the error produced by the object
detector. We use RetinaNet as the object detector and the Hungarian algorithm for tracking.
The cost matrix for Hungarian algorithm is calculated using the RetinaNet features,
bounding box center distances, and intersection of unions of bounding boxes. We
interpolate the missing detections in the last step. The proposed method yields 43.2
MOTA on the MOT16 benchmark.
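The cost matrix for the Hungarian step combines the cues listed above. A sketch of the two geometric terms only; the RetinaNet appearance-feature term is omitted, and the weights are illustrative assumptions, not the paper's values:

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def center_dist(a, b):
    """Euclidean distance between two bounding-box centers."""
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    return math.dist(ca, cb)

def cost_matrix(tracks, detections, w_iou=1.0, w_dist=0.01):
    """Assignment cost between existing tracks and new detections; a
    Hungarian solver then picks the minimum-cost matching."""
    return [[w_iou * (1 - iou(t, d)) + w_dist * center_dist(t, d)
             for d in detections] for t in tracks]
```

Overlapping, nearby boxes get low cost, so the solver tends to keep each track attached to its own detection.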
AN ENHANCED SEPARABLE REVERSIBLE DATA HIDING IN ENCRYPTED IMAGES USING SIDE M...Editor IJMTER
This paper proposes a scheme for Enhanced Separable Reversible Data Hiding in
Encrypted images Using Side Match. In the first step the original image is encrypted using an
encryption key. Then additional data is embedded into the image by modifying a small portion of the
encrypted image using a data hiding key. With an encrypted image containing additional data, if a
receiver has the data hiding key, he can extract the additional data. If the receiver has the encryption
key, he can decrypt the image, but cannot extract the additional data. If the receiver has both the data
hiding key and encryption key, he can extract the additional data and recover the original content by
exploiting the spatial correlation in natural images. The accuracy of data extraction is improved by
using a better scheme for measuring the smoothness of the received image; the Side Match
scheme further decreases the error rate of the extracted bits.
IRJET - Direct Me-Navigation for Blind PeopleIRJET Journal
This document describes a system for direct navigation assistance for blind people using object detection and audio cues. It uses a convolutional neural network model called You Only Look Once (YOLO) to perform real-time object detection on camera images and then describes the detected objects and their locations to the blind user using 3D spatialized sound. The system aims to allow blind users to independently navigate environments by audibly identifying surrounding objects. It analyzes previous works on sensory substitution and assistive technologies for the blind, as well as research on using 3D sound for navigation assistance. The document outlines the object detection methods used, including YOLO and anchor boxes to improve accuracy at detecting multiple objects within each image grid.
Colour Object Recognition using Biologically Inspired Modelijsrd.com
This document presents a biologically inspired model for color object recognition. The model extracts features in the YCbCr color space and classifies objects using a support vector machine classifier. The model consists of four computational layers (S1, C1, S2, C2) that mimic the visual cortex. Features are extracted by convolving images with log-Gabor filters and pooling responses. Prototype image patches are also used. The model was tested on image datasets and achieved a classification accuracy of 91.3% for objects in the Cb color plane.
Fuzzy Logic based Edge Detection Method for Image Processing IJECEIAES
Edge detection is the first step in image recognition systems in digital image processing. An effective way to recover much of the information in an image, such as depth, curves and surfaces, is to analyze its edges, because edges elucidate these characteristics even when colour, texture, shade or light changes only slightly; a faulty edge-detection method can therefore lead to a misinterpreted image or scene. This work presents a new fuzzy logic method together with an implementation. The objective of the method is to improve the edge detection task. The results are comparable to those of similar techniques, in particular for medical images, because it does not take the uncertain parts into account.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Semi-Supervised Method of Multiple Object Segmentation with a Region Labeling...sipij
Effective and efficient multiple object segmentation is an important task in computer vision and object recognition. In this work, we address a method to effectively discover a user’s concept when multiple objects of interest are involved in content-based image retrieval. The proposed method incorporates a framework for multiple object retrieval using a semi-supervised method of similar-region merging and flood fill, which models the spatial and appearance relations among image pixels. To improve the effectiveness of similarity-based region merging we propose a new similarity-based object retrieval. The user only needs to roughly indicate the objects, after which the desired object contours are obtained during the automatic merging of similar regions. A novel similarity-based region merging mechanism is proposed to guide the merging process with the help of the mean shift technique and object detection using region labeling and flood fill. A region R is merged with an adjacent region Q if Q has the highest similarity with R (using the Bhattacharyya descriptor) among all of R’s adjacent regions. The proposed method automatically merges the regions that are initially segmented by the mean shift technique, and then effectively extracts the object contour by merging all similar regions. Extensive experiments performed on 12 object classes (224 images in total) show promising results.
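The similarity test driving the merging can be illustrated with the Bhattacharyya coefficient between region histograms. The histogram representation and the merge threshold below are illustrative assumptions, not the paper's exact parameters:

```python
import math

def bhattacharyya(hist_p, hist_q):
    """Bhattacharyya coefficient between two histograms (normalised
    internally): 1.0 for identical distributions, 0.0 for disjoint ones."""
    sp, sq = sum(hist_p), sum(hist_q)
    return sum(math.sqrt((p / sp) * (q / sq)) for p, q in zip(hist_p, hist_q))

def should_merge(region_a_hist, region_b_hist, threshold=0.9):
    """Merge-rule sketch: two adjacent regions merge when their colour
    histograms are close enough under the Bhattacharyya coefficient."""
    return bhattacharyya(region_a_hist, region_b_hist) >= threshold
```

In the described framework, a region would be merged with whichever adjacent region maximises this coefficient, rather than with every region that clears a fixed threshold.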
Implementation of Object Tracking for Real Time VideoIDES Editor
Real-time tracking of object boundaries is an
important task in many vision applications. Here we propose
an approach to implement the level set method. This approach
does not need to solve any partial differential equations (PDEs),
thus reducing the computation dramatically compared with
optimized narrow band techniques proposed before. With our
approach, real-time level-set based video tracking can be
achieved.
This document summarizes Derek Foster's honors thesis titled "The Role of Spatial Vs Spectral Information in Hyperspectral Image Classification". The thesis examines using both spectral dimensionality reduction and spatial regularization algorithms to achieve high classification accuracy while reducing computational costs when classifying pixels in hyperspectral images. Dimensionality reduction techniques like band selection can reduce data dimensionality but also lower classification accuracy. The thesis aims to show this reduction in accuracy can be offset by also applying a spatial regularization algorithm that considers neighboring pixel information before classification. Two such algorithms, EMP and HSeg, are evaluated on two datasets to test the combination of spectral and spatial techniques. The results and conclusions of this evaluation are discussed.
This document summarizes a research paper that proposes an algorithm for detecting brain tumors in MRI images based on analyzing bilateral symmetry. The algorithm first performs preprocessing like smoothing and contrast enhancement. It then identifies the bilateral symmetry axis of the brain. Next, it segments the image into symmetric regions, enhancing asymmetric edges that may indicate a tumor. Experiments showed the algorithm can automatically detect tumor positions and boundaries. The algorithm leverages the fact that brain MRI of a healthy person is nearly bilaterally symmetric, while a tumor disrupts this symmetry.
This document is the presentation slides for a technical seminar on Self Organizing Maps. It introduces self-organizing maps as an unsupervised neural network technique. The presentation covers topics such as statistical pattern recognition, unsupervised pattern classification, advantages and disadvantages of self-organizing maps, and concludes that self-organizing maps provide a structural representation of input data.
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
This document provides a comprehensive analysis of imbalance problems in object detection. It presents a taxonomy to classify different types of imbalances and discusses solutions proposed in literature. The analysis highlights significant gaps including existing imbalances that require further attention, as well as entirely new imbalances that have never been addressed before. A survey of imbalance problems caused by weather conditions and common object imbalances is conducted. Methods for addressing imbalances include data augmentation using GANs and balancing training based on class performance.
This document discusses clustering of uncertain data objects. It first provides background on clustering uncertain data and challenges in doing so. It then proposes combining k-means clustering with Voronoi diagrams to improve the performance of k-means when clustering uncertain data. Specifically, it suggests using k-means to generate clusters and Voronoi diagrams to answer nearest neighbor queries, in order to minimize computation time. Finally, it concludes that integrating clustering algorithms with indexing methods can effectively cluster uncertain data objects.
This document discusses clustering of uncertain data objects. It first provides background on clustering uncertain data and challenges involved. It then reviews various existing approaches for clustering uncertain data, including using soft classifiers and probabilistic databases. The document proposes combining k-means clustering with Voronoi diagrams and indexing techniques to improve the performance and efficiency of clustering uncertain datasets. It outlines a plan to integrate k-means with Voronoi diagrams and indexing to reduce execution time and increase clustering performance and results for uncertain data. Finally, it concludes that combining clustering with indexing approaches can better handle uncertain data clustering challenges.
A UTILIZATION OF CONVOLUTIONAL MATRIX METHODS ON SLICED HIPPOCAMPAL NEURON RE...ijscai
Present methodologies for cell segmentation on hippocampal neuron regions contain excess information
leading to the creation of unwanted noise. To distinctly draw boundaries around the cells in each of the
channels like DAPI, Cy5, TRITC, FITC, it is pertinent to start off by denoising the present data and
cropping the relevant ROI for analysis to remove excess background information. Present edge detection
methodologies like Canny Edge Detection create black and white outputs. It is difficult to accurately do
edge detection with color throughout an entire image. As such, we utilized a more involved approach that
uses pixel-level comparisons to determine the existence of edge points. By extrapolating all the available
edge points, our algorithms are able to detect general edges throughout an image. To streamline the
process, it has been accompanied by a GUI interface which allows for freehand crops. This information
is stored in a downloadable txt file, which provides the necessary input for the thresholding and final
cropping. Together, the interface works to create clean data which is ready for further analysis with
algorithms like FRCNN and YOLOv3.
A UTILIZATION OF CONVOLUTIONAL MATRIX METHODS ON SLICED HIPPOCAMPAL NEURON RE...ijscai
The document describes algorithms developed for cell segmentation in hippocampal neuron images. A GUI was created to allow cropping of regions of interest from images. Convolutional matrix methods were used for edge detection without converting images to grayscale. Algorithms were also developed for denoising images by thinning borders, removing small points, and strengthening pixel intensities. The algorithms successfully generated contours around neurons in sample images stained with DAPI, Cy5, TRITC, and FITC dyes.
IRJET - Object Detection using Hausdorff DistanceIRJET Journal
This document proposes using Hausdorff distance for object detection as it can better handle noise compared to other methods like Euclidean distance. The document discusses preprocessing images using Gaussian filtering for noise cancellation. It then represents shapes as point sets for feature extraction before using Hausdorff distance to match shapes between reference and test images for object recognition. Encouraging results were obtained when testing on MNIST, COIL and private handwritten digit datasets.
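The matching step can be sketched directly from the definition of the Hausdorff distance between two point sets. Pure Python; the Gaussian preprocessing and shape extraction described above are not shown:

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point of set a to its closest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets; small values
    mean the reference and test shapes nearly coincide."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

A test shape is then recognised as the reference object whose point set gives the smallest Hausdorff distance.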
Similar to Detection of Dense, Overlapping, Geometric Objects (20)
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Artificial Intelligence & Applications (IJAIA). It is an international journal intended for professionals and researchers in all fields of AI for researchers, programmers, and software and hardware manufacturers. The journal also aims to publish new attempts in the form of special issues on emerging areas in Artificial Intelligence and applications.
PhotoQR: A Novel ID Card with an Encoded Viewgerogepatton
There is an increasing interest in developing techniques to identify and assess data to allow an easy and
continuous access to resources, services or places that require thorough ID control. Usually, in order to
give access to these resources, different kinds of documents are mandatory. In order to avoid forgeries
without the need for extra credentials, a new system, named photoQR, is proposed here. This system is
based on an ID card having two objects: one person's picture (pre-processed via blur and/or swirl
techniques) and one QR code containing embedded data related to the picture. The idea is that the picture
and the QR code can assess each other by a proper hash value in the QR. The QR without the picture
cannot be assessed and vice versa. An open source prototype of the photoQR system has been implemented
in Python and can be used both in offline and real-time environments, which effectively combines security
concepts and image processing algorithms to obtain data assessment.
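The mutual assessment of picture and QR code can be illustrated with a simple hash check. The payload layout and function names below are hypothetical, since the paper's exact format is not given here:

```python
import hashlib

def picture_hash(picture_bytes):
    """Digest of the (pre-processed) picture, to be embedded in the QR payload."""
    return hashlib.sha256(picture_bytes).hexdigest()

def make_qr_payload(picture_bytes, holder_data):
    """Hypothetical QR payload: card-holder data plus the picture hash."""
    return {"data": holder_data, "hash": picture_hash(picture_bytes)}

def verify(picture_bytes, payload):
    """Picture and QR assess each other: a swapped or edited picture
    no longer matches the hash stored in the QR."""
    return payload["hash"] == picture_hash(picture_bytes)
```

Neither element is useful alone: the QR's hash is meaningless without the picture, and the picture cannot be validated without the QR.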
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI ...gerogepatton
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI & FL 2024) provides a
forum for researchers who address this issue and to present their work in a peer-reviewed forum. Authors
are solicited to contribute to the conference by submitting articles that illustrate research results, projects,
surveying works and industrial experiences that describe significant advances in the following areas, but
are not limited to these topics only.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and Long Short-Term Memory (LSTM) algorithms. We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Detection of Dense, Overlapping, Geometric Objects
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.4, July 2020
DOI: 10.5121/ijaia.2020.11403
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTS
Adele Peskin1, Boris Wilthan1 and Michael Majurski2
1 NIST, 325 Broadway, Boulder, CO, USA
2 NIST, 100 Bureau Dr., Gaithersburg, MD, USA
ABSTRACT
Using a unique data collection, we are able to study the detection of dense geometric objects in image data
where object density, clarity, and size vary. The data is a large set of black and white images of
scatterplots, taken from journals reporting thermophysical property data of metal systems, whose plot
points are represented primarily by circles, triangles, and squares. We built a highly accurate single class
U-Net convolutional neural network model to identify 97 % of image objects in a defined set of test images,
locating the centers of the objects to within a few pixels of the correct locations. We found an optimal way
in which to mark our training data masks to achieve this level of accuracy. The optimal markings for object
classification, however, required more information in the masks to identify particular types of geometries.
We show a range of different patterns used to mark the training data masks, and how they help or hurt our
dual goals of location and classification. Altering the annotations in the segmentation masks can increase
both the accuracy of object classification and localization on the plots, more than other factors such as
adding loss terms to the network calculations. However, localization of the plot points and classification of
the geometric objects require different optimal training data.
KEYWORDS
Object detection, Classification, Convolutional Neural Network, Geometric shapes.
1. INTRODUCTION
We were given a collection of thousands of images of plots from old journal articles whose data
points were published in pictorial form only, and we were tasked with extracting the locations of
the data points so that a facsimile of the raw data could be recovered for additional analysis.
The images portray plot points with a variety of geometric markers: circles, triangles, squares,
etc., in both filled and open versions. Capturing the precise locations of these markers is
essential, because the underlying data is not available in any other form.
Accurately detecting and localizing these plot markers differs from other object detection
problems in several ways. The images are black and white, so there is no subtle gray-scale
information for a network to exploit. The plot marks are inconsistent in their clarity and exact
shape; for example, an open circle may have a well-defined circular outer boundary while its
inner boundary is more distorted. On the other hand, the shapes of the objects are identifiable
and repeatable, so we are dealing with a very limited geometric scope, and the sizes of the
objects are well defined within a range of 30-60 pixels. Finally, many of the plots contain very
dense patches of these geometric objects, so any method we use must be robust enough to handle
small, overlapping circles, squares, and triangles.
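To make the setting concrete, images of this kind are easy to mimic. The sketch below is an illustrative assumption on our part, not the paper's data pipeline; `draw_marker` is a hypothetical helper that renders dense, possibly overlapping open and filled markers in the paper's 30-60 pixel size range onto a white background (triangles could be added analogously).

```python
import numpy as np

def draw_marker(img, cy, cx, size, shape, filled=False):
    """Draw one black marker (circle or square), open or filled, onto a
    white (255) uint8 image, centered at (cy, cx)."""
    r = size // 2
    ys, xs = np.ogrid[-r:r + 1, -r:r + 1]
    if shape == "circle":
        d = np.sqrt(ys ** 2 + xs ** 2)
        mask = d <= r if filled else np.abs(d - r) <= 1.5
    else:  # square: use the Chebyshev distance instead of the Euclidean one
        m = np.maximum(np.abs(ys), np.abs(xs))
        mask = m <= r if filled else (m >= r - 2) & (m <= r)
    img[cy - r:cy + r + 1, cx - r:cx + r + 1][mask] = 0

# Scatter 30 markers with random centers, sizes, shapes, and fill styles;
# the centers are kept far enough from the border that every marker fits.
rng = np.random.default_rng(0)
img = np.full((400, 400), 255, dtype=np.uint8)
for _ in range(30):
    cy, cx = (int(v) for v in rng.integers(40, 360, size=2))
    size = int(rng.integers(30, 61))  # the paper's 30-60 pixel marker range
    draw_marker(img, cy, cx, size, rng.choice(["circle", "square"]),
                filled=bool(rng.integers(2)))
```

Because the markers are placed independently, dense patches with overlaps arise naturally, reproducing the main difficulty described above.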
2. RELATED WORK
The detection of dense objects in images has been handled using two basic approaches. The first
approach generates large sets of possible object locations, in the form of boundary boxes or
sliding windows. The bounding boxes can be defined sparsely, classified as either foreground or
background, and then used to classify objects in a two-step approach (as in R-CNN [1] or Fast
R-CNN [2]), or defined densely where the network simultaneously assigns a probability of
containing an object and the classification of the object in a single step (YOLO [3,4], and SSD
[5]). These methods provide a series of solutions which maximize accuracy and/or speed of
computation. The output of the network contains information about the location, size, and
classification of objects in images.
For dense object detection, different approaches have been used to deal with the problem of
overlapping bounding boxes. One approach, discussed in [6], is to add an extra layer to the
network (a soft intersection-over-union layer) to provide information on the quality of the
detection boxes.
A second path for object detection produces a semantic segmentation of an image, where the
output is an image, with each pixel labeled as to its classification. U-Net is an example of this
type of convolutional network, which has been used very successfully for a wide range of
biomedical image segmentations. U-Net architecture consists of a contracting path, in which
spatial information is reduced while feature information is increased, and an expanding path,
which combines feature and spatial information, using up-convolutions and concatenations with
features of the contracting path [7]. For these networks, the locations of the objects in the
images are either output by the network in addition to the pixel labels [8] or obtained by
post-processing the network output, as in panoptic segmentation, which combines both semantic
and instance segmentations [9].
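The contracting/expanding structure can be summarized by tracing tensor shapes level by level. The sketch below assumes same-padded convolutions, 2x2 max-pooling, a depth of four, and 64 base channels (the original U-Net [7] used unpadded convolutions, which shrink each feature map slightly); `unet_shapes` is our illustrative helper, not code from the paper.

```python
def unet_shapes(input_hw, base_channels=64, depth=4):
    """Trace (height, width, channels) through a same-padded U-Net:
    the contracting path halves the spatial size and doubles the channel
    count at each level; the expanding path mirrors it exactly."""
    h, w = input_hw
    down, c = [], base_channels
    for _ in range(depth):
        down.append((h, w, c))           # feature map saved for the skip connection
        h, w, c = h // 2, w // 2, c * 2  # 2x2 max-pool halves H, W; channels double
    shapes = {"contracting": down, "bottleneck": (h, w, c), "expanding": []}
    for sh, sw, sc in reversed(down):
        # An up-convolution doubles H and W and halves the channels; concatenating
        # the skip features gives 2*sc channels, which two convolutions reduce to sc.
        shapes["expanding"].append((sh, sw, sc))
    return shapes
```

For a 512x512 input this yields a 32x32x1024 bottleneck and a final 512x512x64 feature map, from which a 1x1 convolution produces the per-pixel class scores.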
Adding different loss functions to the evaluation of each forward pass of these
networks has improved the quality of their outcomes. Lin [10] used a focal loss term derived for
dense object detection, which adds weight to the relative loss for objects that are harder to
classify; in our work, classification problems were not tied to specific geometries. Ribera [8]
used a loss term that adds weight to the relative loss for pixels based on their distance to objects
within the image, reducing contributions from unwanted background pixels.
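The focal loss of [10] scales the cross-entropy by a modulating factor that shrinks for well-classified examples, so hard examples dominate the loss. A minimal sketch for a single positive example, using the defaults gamma=2 and alpha=0.25 suggested in [10]:

```python
import math

def focal_loss(p, gamma=2.0, alpha=0.25):
    """Focal loss for one positive example with predicted probability p.

    With gamma=0 and alpha=1 this reduces to the ordinary
    cross-entropy -log(p); larger gamma down-weights easy examples.
    """
    return -alpha * (1.0 - p) ** gamma * math.log(p)
```

A confidently correct prediction (p=0.9) contributes far less loss than a marginal one (p=0.6), which is the mechanism that focuses training on hard objects.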
We approached this problem from the two directions described above, using a YOLOv3 (you
only look once) model, which uses bounding boxes [3,4], and a U-Net model, which performs
semantic segmentation [7]. Our required accuracy for detecting plot points had to be at least as
high as manual detection: within 5 pixels of the actual location on plots several thousand
pixels per side.
The YOLO model we used was limited to only a small set of bounding box anchor sizes, given
the size range of the objects. However, even when reducing the size of the sub-image for the
network search to an 8x8 pixel region, which increased the accuracy of locating the points, it was
not enough to meet the 5-pixel requirement. Our best results with the YOLO model located the
objects within 10-15 pixels of the manually selected locations, not close enough to satisfy our goal.
In addition, we saw a consistently high false positive rate for object detection. The output of a
YOLO model contains probabilities that an object lies within each bounding box, and we could
not find an adequate threshold to separate the true and false positives. Our U-Net model gave
much higher location point accuracy and had a very low false positive detection rate, and we will
focus on that model in this paper.
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.4, July 2020
2.1. TRC Data
Our data collection is supplied by the Thermodynamics Research Center (TRC) at the National
Institute of Standards and Technology, which curates a database of experimental thermophysical
property data for metal systems (https://trc.nist.gov/metals_data). All data are derived from
experiments reported in the open literature, dating from the early 1900s to very recent
publications. While it would be most convenient if the authors had reported tabulated data, often
the results are presented only in graphical format, and the data must be extracted manually with
digitization tools.
This has been done for thousands of plots of varying quality – from figures in recent PDF papers
to hand-drawn graphs on grid paper, with many levels in between, often further distorted and
degraded by low-quality scans and photocopies. TRC staff categorized the plots into
different groups (single data sets, multiple data sets, with or without a legend, separated or
overlapping data points, etc.) to provide a comprehensive training data set. The varying
clarity of the figures posed many challenges. Some circles appear extremely circular, others do
not have enough pixels to fully define their shape. Many of the plots have objects that are
separated, and others have masses of overlapping, dense sets of objects. Our approach for object
detection successfully covered a wide range of image qualities.
In addition to the actual data points, the plots contain many other symbols that an algorithm can
mistake for data points when not seen in broader context. Text, numbers, and symbols in a legend can cause
many false positive results. To eliminate those issues, a subset of figures was selected to undergo
an extra cleanup step in which all misleading symbols were manually removed. Along with each
plot, we were given the coordinates of the points in picture coordinates.
The raw images are RGB images, mostly black and white, but some have yellowed backgrounds.
We applied the following operations to create training and test sets for the
networks: (1) convert to grayscale (gs) intensity using the formula gs = 0.2989 * red +
0.5870 * green + 0.1140 * blue; (2) binarize the grayscale images with a manually defined threshold
of 140: all pixels < 140 are set to 0 and all others to 255; (3) crop
images between the minimum and maximum x and y values (as read from the plot points in the
images) with a 200-pixel buffer in both directions; (4) break each plot into overlapping sub-
images of 512x512 pixels to feed into the network. Sub-images overlap by 30 %. We chose 512
so that a group of overlapping objects on the larger side can appear in the same sub-
image.
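Steps (1), (2), and (4) can be sketched in pure Python (function names are ours, pixels equal to the threshold are treated as background here, and a real pipeline would operate on whole pixel arrays with a library such as Pillow or NumPy):

```python
def to_grayscale(r, g, b):
    """Luminance formula used in the preprocessing step."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def binarize(gs, threshold=140):
    """Pixels below the threshold become 0 (ink); the rest become 255."""
    return 0 if gs < threshold else 255

def tile_origins(length, tile=512, overlap=0.30):
    """Top-left coordinates of overlapping tiles along one image axis."""
    stride = int(tile * (1.0 - overlap))  # ~358 px stride for 30 % overlap
    origins = list(range(0, max(length - tile, 0) + 1, stride))
    # Ensure the final tile reaches the image edge
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins
```

For a 1024-pixel-wide image this yields tiles starting at 0, 358, and 512, so every pixel is covered and neighbouring tiles share roughly 30 % of their area.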
2.2. U-Net model and segmentation masks
Our initial U-Net network was modeled after the approach of [8], which adds a loss function to
the network to remove output other than the objects of interest. An advantage of this model is that
it is designed for small objects and for overlapping objects. We found that the accuracy of
locating the points was similar with or without the loss function given in [8].
Our U-Net model is shown in Figure 1. Since U-Net requires labeled masks as input, we convert
each plot mark location into a set of non-zero pixels in the reference mask. We evaluated
several different sets of pixels to represent each plot mark in the mask. Section 3 describes
our attempts to optimize the localization of point objects on the plots in our data using different-
sized mask labels (3.1) and then to optimize the classification of different geometric objects
within the plots in addition to finding the points (3.2).
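Building such a reference mask amounts to placing a small filled disc of a chosen radius at each known point location. A pure-Python illustration (the names and the list-of-lists image representation are our assumptions):

```python
def make_mask(height, width, centers, radius=5, label=1):
    """Segmentation mask with a filled disc of the given radius at each point.

    centers holds (row, col) point locations; label allows one value
    per object class in multi-class masks.
    """
    mask = [[0] * width for _ in range(height)]
    r2 = radius * radius
    for (cy, cx) in centers:
        # Visit only the bounding square of the disc, clipped to the image
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r2:
                    mask[y][x] = label
    return mask
```

Varying the radius argument reproduces the different mask sizes compared in Section 3.1.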
Figure 1. U-Net model architecture showing the different convolutional layers (blue arrows) and their
respective levels. Each blue box is a multi-channel feature map with the channel count denoted on top of
the box and the spatial dimension at the lower left edge
3. OBJECT LOCATION AND CLASSIFICATION MODELS
3.1. Plot point locations
The most frequent geometry contained in our data set is an open circle, and we have many plots
containing only this one type of icon. To first look just at locating plot points, separate from
classifying objects by geometries, we collected a set of these plots and annotated the training data
to see how accurately we can locate the open circles on the plots. The center of each circle was
marked on the training data by a small circle that varied in diameter. We used a set of 30 of these
plots, breaking them into overlapping sub-images for training and validation, and an additional 20
plots for testing. The fact that each image (and its accompanying spreadsheet of data points) had
to be individually edited and checked limited the number of images we included in our set. We
performed some data cleaning on these raw images: using the brush functionality in ImageJ [11],
we removed any axis numbers or other written information that looked like any of the geometric
objects. All that remained on the plots were the plot axes, plot points, and any lines or curves in
the plot that represented interpolation of the data. The data is augmented for training by flipping
each sub-image that contains plot points first horizontally and then vertically. (Axes markers will
be recognized as objects in the final output of our model.)
Figure 2 shows an example of marking the training masks and the resulting inferenced mask
marks. In the figure, the outcome of the network is overlaid onto the plot. The outcome of the
network resembles the labeling of the training data: it contains clusters of pixels of approximately
the same size as the markings on the input mask. Test location points are found by collecting the overall mask
from network outcomes, finding the clusters of marked pixels in the outcomes, and finding the
centroid point of each cluster, using Python image processing libraries. Pixel clusters smaller than 5
pixels are eliminated. The remaining center point locations are tested against the original
manually collected list. Table 1 shows our results. Using a smaller mask marking helps to
localize the points, but too small a marking results in a model that cannot find all the plot points.
The 5-pixel radius circle marking gave the best overall results.
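This post-processing (cluster labeling, small-cluster rejection, centroid extraction) can be sketched in pure Python; the function name and the 4-connectivity choice are our assumptions, and a real pipeline would use scipy.ndimage or scikit-image instead:

```python
from collections import deque

def find_centers(mask, min_size=5):
    """Label 4-connected clusters of non-zero pixels and return centroids.

    Clusters smaller than min_size pixels are discarded.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centers = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # Breadth-first flood fill collects one cluster
                queue, cluster = deque([(sy, sx)]), []
                seen[sy][sx] = True
                while queue:
                    y, x = queue.popleft()
                    cluster.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(cluster) >= min_size:
                    cy = sum(p[0] for p in cluster) / len(cluster)
                    cx = sum(p[1] for p in cluster) / len(cluster)
                    centers.append((cy, cx))
    return centers
```

The returned centroids are the predicted point locations that are compared against the manually collected list.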
Table 1. Two-class model results for 275 plot points from 30 different plots. Columns correspond with
numbers of objects found or not found by the model, and the sample mean and standard deviation (mean
dist. and std dist.) of the distances in pixels from the manually labeled locations.
Mask radius, pixels | Correctly identified | Not found | Mean dist. | Stdv dist.
2 | 232 | 43 | 2.05 | 0.83
5 | 262 | 13 | 2.07 | 1.01
10 | 226 | 49 | 2.89 | 1.04
Figure 2. a: Outcome of the two-class model shown by overlaying the output masks on the original plot.
Taken from [12]. The insert shows a closeup of a portion of the center of the plot (marked on the plot). The
dots in the center of each circle display the outcome of the model. b: Example of training data: left: a
512x512 section of the plot from [12]; right: the corresponding annotation for network training.
Figure 3. Four different types of segmentation masks used: A.) small center marks, B.) larger center marks,
C.) small center mark with a 1-pixel outline of the object, D.) small center mark with a 2-pixel outline of
the object. All markings are shown in gray, with circles shown in black.
3.2. Geometric Object Classification
3.2.1. Three-class model: open and filled circles
The segmentation masks that work well for locating plot points result in large classification errors
when we extend our model to several different geometric objects. As the simplest case, we show a
three-class model: open circles, filled circles, and background. We selected 45 plots that
contained open circles, filled circles, or both. Of the 45 plots, 14 had only open circles, 12
had only filled circles, and 19 had both. The mean height of the images was 3133 pixels and the mean
width 3420 pixels. The mean number of plot points per image was 76.4. They were edited as before to
exclude axis numbers resembling circles and divided into overlapping 512x512 sub-images. In
most cases, the fraction of sub-images with no geometric objects (plot points) is more than half.
The total number of training and validation sub-images was 6033, 80 % of which were used for
training. Training masks were created by placing a circle of radius 5 (the optimal size for point
location) at the known positions on the images. In a sample test image with 107 circles, 30 filled
and 77 open, the locations of the points were found with high accuracy (mean error 1.15 pixels),
but the classification errors were high: 16 of the 77 open circles were classified as filled circles.
All of the filled circles were classified correctly.
We designed a set of different segmentation markings for the training data, in order to see
how much more, or what different, information our network needed to classify the
different circles. Figure 3 shows some of the masks that we used. In terms of post-processing, a
simpler mask is preferable if it can achieve our goals. Smaller center markings are easier to
separate in the inferenced images. But to provide more information to the network about the
location of each object, we created more extensive markings, with larger center marks or outlines
of each geometry at different thicknesses, 1 pixel and 2 pixels. A very thin outline can be
eroded away in the inferenced image and might be helpful for supplying extra information to the
network to differentiate between the geometric objects. When post-processing inferenced images
annotated with mask B, we included a method to erode the outlines and to find
multiple center points when those larger center marks overlapped.
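The erosion step can be illustrated with a single pass of 4-neighbour binary erosion (a simplification of what scipy.ndimage or scikit-image provide; the function name is ours):

```python
def erode(mask):
    """One step of binary erosion: a pixel survives only if it and all
    four of its direct neighbours are set.

    A 1-pixel-thick outline disappears after a single step, while a
    filled centre mark several pixels wide survives.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (mask[y][x] and mask[y - 1][x] and mask[y + 1][x]
                    and mask[y][x - 1] and mask[y][x + 1]):
                out[y][x] = 1
    return out
```

Applying this repeatedly strips thin outlines first, leaving only the solid centre marks to be split into individual points.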
Using the same 45 plots as above, we created four different classification models to classify open
vs. filled circles, with the four different types of masks shown in Figure 3. In general, smaller
masks (a smaller central mark and a thinner outline) led to higher correct
classification rates; however, the addition of the outer marking did not change the classification
rates in separating open from filled circles. The larger segmentation masks were not needed to
distinguish between open and filled geometries and reduced the accuracy of finding the
location points. Table 2 shows outcomes from three sample test images: one with open circles
only, one with filled circles only, and one with open and filled circles together. The test images
are pictured together in Figure 4.
Figure 4. Test images for open vs. filled circle classifications. Top: open circles only [12], Middle: filled
circles only [13], Bottom: both open and filled circles [14].
The first test image is the same image as in Figure 2, with 112 open circles. All circles found in
the image were classified correctly as open circles, except in the case of the larger center
markings (5-pixel radius, mask B in Figure 3), where 77 of the 112 open circles were incorrectly
classified as filled circles. The second test image contains 121 filled circles. The classification
worked well: all circles that were found were classified correctly. The third image contains 78
open and 30 filled circles. The network trained with the larger mask marks (B in Figure 3)
misclassified a large number of the open circles as filled circles (52 of the 78). All of the other
models classified the open and filled circles correctly. Smaller markings were essential to
classify the open circles, with little difference from the size or presence of the outer marking. It was
only the open circles that did not classify correctly; the varied shape of the inner white space
makes those objects more complex than the filled circles.
Table 2. Results from classification of open/filled circles with four different segmentation masks: A=small
dots, B=larger dots, C=small dots with 1-pixel outline, D=small dots with 2-pixel outline (as shown in Figure 3).
Shown are the number of objects correctly and incorrectly classified, the number of objects not found by the
model, and the sample mean and standard deviation (mean dist. and stdv dist.) of the distances in pixels from
the manual locations. For the third image, we also show the number of false positives, which did not occur on
the other images.
a. Inferencing on image with 112 open circles.
Mask | Correct | Incorrect | Not found | Mean dist. | Stdv dist.
A | 109 | 0 | 3 | 1.32 | 0.68
B | 34 | 77 | 1 | 1.40 | 0.91
C | 109 | 0 | 6 | 1.84 | 2.20
D | 106 | 0 | 3 | 1.35 | 1.16
b. Inferencing on image with 121 filled circles.
Mask | Correct | Incorrect | Not found | Mean dist. | Stdv dist.
A | 117 | 0 | 4 | 1.71 | 1.43
B | 117 | 0 | 4 | 2.41 | 2.12
C | 117 | 0 | 4 | 1.95 | 1.56
D | 114 | 0 | 7 | 2.22 | 2.29
c. Inferencing on image with 78 open and 30 filled circles
Mask | Correct | Incorrect | Not found | False pos. | Mean dist. | Stdv dist.
A | 107 | 0 | 1 | 2 | 1.19 | 0.85
B | 55 | 52 | 1 | 0 | 1.63 | 1.93
C | 105 | 2 | 1 | 1 | 3.55 | 0.54
D | 107 | 0 | 1 | 1 | 7.65 | 2.45
3.2.2. Three-class model: open circles and open squares
To classify objects of different shapes, we started with the example of open circles and open
squares. The training set for this group is smaller, as there are fewer examples. We used 14
images of plots with circles and squares, which produced 2676 sub-images, and ran four models
using the four different types of masks. Here the results varied depending upon which mask was
used for training: more information in the mask (using outer boundaries) led to better object
classification. We show here our best and worst results: one with very sharp objects in the images,
for which the classifications were good, and one with circles and squares that are more
ambiguous, which led to less accurate classification. Figure 5 shows both plots [15, 16].
Figure 5. Two different images of plots used to test our circle/square classifications. Below each plot is a
magnified section showing the ambiguity of some of the circles and squares in the second plot.
Table 3a displays the results of inferencing the models on the top image in Figure 5, which
contains 43 circles and squares. Overall, different segmentation masks lead to the best point
localization and the best object classification. For object location, the best model is again the one
with the smallest marking on the masks. For classification, however, the best model has a small
marking at the object center but includes an outer marking in the shape of either the circle or the
square. Mask A, with the very small 2-pixel-radius marking, has the highest error rate in
distinguishing between the circles and the squares, even in a very clear plot image, although it
was useful for finding the point locations. A combination of models is needed here to achieve both
optimal point localization and object classification.
Table 3. Results from classification of open circles and open squares, with four different segmentation
masks: A=small dots, B=larger dots, C=small dots with 1-pixel outline, D=small dots with 2-pixel outline (as
shown in Figure 3). Shown are the number of objects correctly and incorrectly classified, the number of
objects not found by the model, and the sample mean and standard deviation (mean dist. and stdv dist.) of the
distances in pixels from the manual locations.
a. Results from the best image, top of Figure 5
Mask | Correct | Incorrect | Missing | Mean dist. | Stdv dist.
A | 26 | 16 | 1 | 1.98 | 1.03
B | 43 | 0 | 0 | 2.05 | 1.77
C | 43 | 0 | 0 | 3.50 | 4.18
D | 42 | 1 | 0 | 6.01 | 5.13
b. Results from the worst image, bottom of Figure 5
Mask | Correct | Incorrect | Missing | Mean dist. | Stdv dist.
A | 41 | 10 | 18 | 2.43 | 1.17
B | 44 | 11 | 14 | 2.28 | 1.01
C | 42 | 13 | 14 | 4.97 | 5.36
D | 59 | 6 | 4 | 5.47 | 5.31
3.2.3. Four-class model: open circles, open squares, open triangles
Using each of the types of segmentation masks in Figure 3, we trained multi-class U-Net models
with four classes: open circles, open triangles, open squares, and background. We took 32 different plot
images containing open circles, open triangles, and open squares and created a set of training and
validation data. It contained 6615 512x512 sub-images, 80 % (5292) of which were used for training. Of the 32
plots, 11 contained circles, triangles, and squares; 18 contained only 2 of the 3 geometries; and 2 plots
contained only squares, which were the least frequent geometry in the other plots. The mean height of the
images was 3737 pixels and the mean width 4164 pixels. The mean number of plot points per image was 48.8.
The results show that our more complicated segmentation masks greatly increased the accuracy
of our object classification, although the accuracy of locating the plot points decreased somewhat.
The output was tested against several test images containing all three types of geometric objects,
with groups of overlapping objects in each test image. An example test image is shown in Figure
6, taken from [17]. It contains 27 circles, 43 triangles, and 36 squares. Using our original marking
on the segmentation masks (the radius=2 circles), only 70 of the 106 objects were successfully
classified; 16 were not found in the output and 20 were classified incorrectly (see Table 4). However,
for each object that was located, the location was found precisely; the average location error was
2.37 pixels, with a standard deviation of 1.61.
Figure 6. Test image for the four-class model, taken from [17]. The upper right corner shows a magnification
of the bottom left corner of the plot, as marked, displaying some of the overlapping geometric objects.
Enlarging the central mark of each object in the masks of the training set decreased the number of
objects not found at all by the models. Adding an outer circle, triangle, or square
increased the number correctly classified. The results of the classifications from all four models,
derived from the four different types of masks, inferencing the image in Figure 6, are also given in
Table 4. Some of the other test images we used gave better results than those in Table 4, but the
general trends were the same: the original masks with the small dots consistently had numerous
objects that were not found and incorrect classifications. The number of objects not found
decreased consistently using any of the masks except the original one, and the addition of the
outlines in the segmentation masks increased the number of correct classifications (although they
made the post-processing a little harder). Errors in locating the centers of the geometric objects,
however, were lower with masks that did not contain the outlines of the geometries. The insert of
the magnified plot in Figure 6 shows examples of the overlapping geometric objects and the
small deviations from pure geometric shapes.
Table 4. Four-class model results for 106 objects on a plot from [17]. Columns correspond with numbers of
objects correctly and incorrectly classified, numbers not found by the model, and the sample mean and
standard deviation (mean dist. and stdv dist.) of the distances in pixels from the manual locations.
Mask | Correctly classified | Incorrectly classified | Not found | Mean dist. | Stdv dist.
Small dot | 70 | 20 | 16 | 2.37 | 1.61
Larger dot | 88 | 13 | 5 | 2.60 | 2.32
Thin outline | 82 | 18 | 6 | 4.95 | 4.49
Thicker outline | 94 | 8 | 4 | 5.76 | 4.81
Because the training sets with different types of masks led to different outcomes in the inferenced
images, we can combine the best aspects of several of the resulting models to get the best classification
and the smallest errors in locating the points of the plots. Using the thicker outlines for the circles,
triangles, and squares in general led to the highest rates of correct classification, but also the
highest location errors. The best results were obtained by using the models with segmentation
outlines for classification in conjunction with the outcomes of the models without
outlines for locations; good location estimates and good classifications
are both achieved.
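Such a combination can be sketched as pairing each classified object from an outline-mask model with the nearest location predicted by a small-dot model. The pairing rule, the 10-pixel cutoff, and the fallback to the outline model's own location are our assumptions, not the paper's exact procedure:

```python
import math

def combine(outline_preds, dot_preds, max_dist=10.0):
    """Merge two model outputs: class labels from the outline-mask model,
    point locations from the small-dot model.

    outline_preds: list of ((y, x), class_label) from the outline model
    dot_preds:     list of (y, x) locations from the small-dot model
    """
    merged = []
    for (oy, ox), label in outline_preds:
        # Nearest small-dot location within max_dist pixels, if any
        best, best_d = None, max_dist
        for (dy, dx) in dot_preds:
            d = math.hypot(oy - dy, ox - dx)
            if d < best_d:
                best, best_d = (dy, dx), d
        # Fall back to the outline model's location when no dot is close
        merged.append((best if best is not None else (oy, ox), label))
    return merged
```

Each merged record then carries the more reliable classification together with the more precise location.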
4. CONCLUSIONS
We experimented with a U-Net network to find the locations of geometric objects in plot images.
Our goal was to locate the plot points within 5 pixels of the ground truth data, and we were
successful in locating objects with this accuracy. The model was sufficient to detect the variety of
geometric objects in our collection, but not sufficient in every case to distinguish between some
of our geometries. Putting more information into the image masks that the network uses for
learning improved the performance of our classification of different types of geometric objects.
The more complicated masks led to less accurate point locations on the plots, however.
Distinguishing between filled and open versions of the same type of object, filled vs. open circles
for example, did not benefit from marking the outer boundaries of the circles in the training
masks. Filled circles had a much higher correct classification rate than open circles, since they do
not have the ambiguity of an inner circle shape.
We had hoped that the classification of objects in these plots would have a very high accuracy,
but we only see this when inferencing plots whose plot points are represented by clear circles,
triangles, and squares, with shapes that are unambiguous due to the clarity of the image.
There is quite a bit of variation in the shapes of individual circles, triangles, and squares. As usual
with these models, the more we understood about the variation in our training data, the better the
model we were able to produce. Perhaps because of the simple geometry of the objects and the
two-toned gray scale of the images, we found that our localization was as accurate as required
without the use of loss functions previously used to add more weight to the background
of images or to classes that are hard to classify.
REFERENCES
[1] R. Girshick, J. Donahue, T. Darrell & J. Malik, (2014) “Rich feature hierarchies for accurate object
detection and semantic segmentation”, IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp580-587.
[2] S. Ren, K. He, R. Girshick & J. Sun, (2015) “Faster R-CNN: Towards real-time object detection with
region proposal networks”, Advances in Neural Information Processing Systems 28, Curran
Associates, Inc., pp91-99.
[3] J. Redmon, S. Divvala, R. Girshick & A. Farhadi, (2016) “You Only Look Once: Unified, real-time
object detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp779-788.
[4] J. Redmon & A. Farhadi, (2018) “YOLOv3: An incremental improvement”, arXiv preprint
arXiv:1804.02767.
[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy & S. Reed, (2016) “SSD: Single shot multibox
detector”, European Conference on Computer Vision (ECCV), pp21-37.
[6] E. Goldman, R. Herzig, A. Eisenschtat, J. Goldberger & T. Hassner, (2019) “Precise detection in
densely packed scenes”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
pp5227-5236.
[7] O. Ronneberger, P. Fischer & T. Brox, (2015) “U-Net: Convolutional networks for biomedical image
segmentation”, Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp234-241.
[8] J. Ribera, D. Guera, Y. Chen & E. Delp, (2019) “Locating objects without bounding boxes”, IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp6479-6489.
[9] A. Kirillov, K. He, R. Girshick, C. Rother & P. Dollar, (2019) “Panoptic segmentation”, IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp9404-9413.
[10] T. Lin, P. Goyal, R. Girshick, K. He & P. Dollar, (2017) “Focal loss for dense object detection”, IEEE
International Conference on Computer Vision (ICCV), pp2980-2988.
[11] C. Schneider, W. Rasband & K. Eliceiri, (2012) “NIH Image to ImageJ: 25 years of image analysis”,
Nature Methods, Vol. 9, No. 7, pp671-675.
[12] S. Novikova, (1960) “Thermal expansion of germanium at low temperatures”, Soviet
Physics - Solid State, Vol. 2, pp37-38.
[13] M. Kehr, W. Hoyer & I. Egry, (2007) “A new high-temperature oscillating cup viscometer”, Intl. J.
Thermophysics, Vol. 28, pp1017-1025.
[14] D. Detwiler & H. Fairbank, (1952) “The thermal resistivity of superconductors”, Phys. Rev., Vol. 88,
pp1049-1052.
[15] J. Rayne, (1956) “The heat capacity of copper below 4.2K”, Australian J. Physics, Vol. 9, pp189-197.
[16] K. Aziz, A. Schmon & G. Pottlacher, (2015) “Measurement of surface tension of liquid nickel by the
oscillating drop technique”, High Temperatures – High Pressures, Vol. 44, pp475-481.
[17] G. White & S. Woods, (1955) “Thermal and electrical conductivities of solids at low temperatures”,
Canadian J. Physics, Vol. 33, pp58-73.