The evolution of visual object recognition architectures based on the Convolutional Neural Network and Convolutional Deep Belief Network paradigms has revolutionized artificial vision science. These architectures learn hierarchical visual features from the real world using supervised and unsupervised learning approaches, respectively. Neither approach, however, scales realistically to recognition over a very large number of objects, as high as 10K. We propose a two-level hierarchical deep learning architecture, inspired by the divide-and-conquer principle, that decomposes the large-scale recognition architecture into root-level and leaf-level model architectures. Each root and leaf model is trained exclusively on its subproblem, providing better results than possible with any single-level deep learning architecture prevalent today. The proposed architecture classifies objects in two steps. In the first step, the root-level model classifies the object into a high-level category. In the second step, the leaf-level recognition model for the recognized high-level category is selected from among all leaf models; this leaf model is presented with the same input image, which it classifies into a specific category. We also propose a blend of leaf-level models trained with either supervised or unsupervised learning approaches; unsupervised learning is suitable whenever labelled data is scarce for a particular leaf model. Training of the leaf-level models is in progress: we have trained 25 of the 47 leaf models so far, with a best-case top-5 error rate of 3.2% on the validation data set for particular leaf models. We also observe that the validation error of the leaf-level models saturates towards this value as the number of epochs increases beyond sixty. The top-5 error rate for the entire two-level architecture must be computed from the error rates of the root model and all the leaf models combined. The realization of this two-level visual recognition architecture will greatly enhance accuracy in the large-scale object recognition scenarios demanded by use cases as diverse as drone vision, augmented reality, retail, image search and retrieval, robotic navigation, and targeted advertisements.
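The two-step inference described above can be sketched in a few lines. This is a purely hypothetical illustration: the stand-in "models" below are trivial functions, where a real system would use the trained root CNN and per-category leaf networks.

```python
def root_model(image):
    """Step 1: classify the image into a high-level category.
    Stand-in rule: route by a coarse feature (here, mean pixel intensity)."""
    return "animal" if sum(image) / len(image) > 0.5 else "vehicle"

# One leaf model per high-level category (step 2); each is trained
# exclusively on the fine-grained classes of its own category.
leaf_models = {
    "animal":  lambda image: "dog" if image[0] > 0.5 else "cat",
    "vehicle": lambda image: "car" if image[0] > 0.5 else "bicycle",
}

def classify(image):
    """Two-step inference: the root picks the category, then the
    matching leaf model refines it to a specific class."""
    category = root_model(image)      # step 1: high-level category
    leaf = leaf_models[category]      # select the leaf model for that category
    return category, leaf(image)      # step 2: fine-grained label

print(classify([0.9, 0.8, 0.7]))  # routed to the "animal" leaf
```

The key design point the sketch shows is that only one leaf model runs per image, so each leaf only ever needs to discriminate within its own high-level category.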
Image Captioning Generator using Deep Machine Learning (ijtsrd)
Technology's scope has evolved into one of the most powerful tools for human development in a variety of fields. AI and machine learning have become among the most powerful tools for completing tasks quickly and accurately without the need for human intervention. This project demonstrates how deep machine learning can be used to create a caption or a sentence for a given picture. This can be used for visually impaired persons, for self-identification in automobiles, and for various applications requiring quick and easy verification. In this model, a Convolutional Neural Network (CNN) is used to extract image features, and a Long Short-Term Memory (LSTM) network is used to organize them into meaningful sentences. The Flickr 8k and Flickr 30k datasets were used for training. Sreejith S P | Vijayakumar A, "Image Captioning Generator using Deep Machine Learning", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd42344.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/42344/image-captioning-generator-using-deep-machine-learning/sreejith-s-p
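The CNN-plus-LSTM pipeline above generates the caption one word at a time. A heavily simplified, hypothetical sketch of that greedy decoding loop follows; a real model would score the full vocabulary from the image features and the LSTM hidden state, while here a toy lookup table stands in for both.

```python
# Toy "language model": the most likely next word given the previous one.
# In the real architecture this distribution comes from the LSTM conditioned
# on CNN image features; this table is an assumption for illustration only.
next_word = {
    "<start>": "a",
    "a": "dog",
    "dog": "running",
    "running": "<end>",
}

def greedy_caption(image_features, max_len=10):
    """Generate a caption word by word until <end> (greedy decoding)."""
    words, current = [], "<start>"
    for _ in range(max_len):
        current = next_word.get(current, "<end>")  # pick the top-scoring word
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)

print(greedy_caption(image_features=None))  # -> "a dog running"
```

The `max_len` cap mirrors the practical limit captioning systems place on sentence length so decoding always terminates.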
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
The advents in this technological era have resulted into enormous pool of information. This information is
stored at multiple places globally, in multiple formats. This article highlights a methodology for extracting
the video lectures delivered by experts in the domain of Computer Science by using Generalized Gamma
Mixture Model. The feature extraction is based on the DCT transformations. In order to propose the model,
the data set is pooled from the YouTube video lectures in the domain of Computer Science. The outputs
generated are evaluated using Precision and Recall.
Deep learning algorithms have drawn the attention of researchers working in the field of computer vision, speech
recognition, malware detection, pattern recognition and natural language processing. In this paper, we present an overview of
deep learning techniques like Convolutional neural network, deep belief network, Autoencoder, Restricted Boltzmann machine
and recurrent neural network. With this, current work of deep learning algorithms on malware detection is shown with the
help of literature survey. Suggestions for future research are given with full justification. We also showed the experimental
analysis in order to show the importance of deep learning techniques.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET... (sipij)
The resolution of an image is a very important criterion for evaluating its quality. Higher resolution is always preferable, as low-resolution images are unsuitable due to their fuzzy quality. High resolution matters in fields such as medical imaging and astronomy, where low-resolution images become unclear and indistinct when enlarged. In recent times, various research works have aimed to generate a higher-resolution image from its lower-resolution counterpart. In this paper, we propose a technique for generating higher-resolution images from lower-resolution ones using a Residual-in-Residual Dense Block network architecture with a deep network. We also compare our method with other methods to show that it produces images of better visual quality.
A Novel Visual Cryptographic Scheme Using Floyd Steinberg Half Toning and Block Replacement Algorithms Nisha Menon K – PG Scholar,
Minu Kuriakose – Assistant Professor,
Department of Electronics and Communication,
Federal Institute of Science and Technology, Ernakulam, India
PREVENTING COPYRIGHTS INFRINGEMENT OF IMAGES BY WATERMARKING IN TRANSFORM DOM... (ijistjournal)
Images are undoubtedly the most efficacious and easiest means of communicating an idea, and they are surely an indispensable part of human life. The trend of sharing images of all kinds on the internet, for example technical figures, an exceptional modern masterpiece from an artist, or photos from a recent picnic to a hill station, is spreading virally. There is a mandatory requirement to check the privacy and security of our personal digital images before making them public via the internet, as there is always a threat of original images being illegally reproduced or distributed elsewhere. To prevent misuse and protect copyrights, an efficient solution is given that can withstand many attacks. This paper aims at encoding the host image prior to watermark embedding to enhance security. A fast and effective full counter-propagation neural network enables successful watermark embedding without deteriorating image perception. Earlier techniques embedded the watermark in the image itself, but it has been observed that the synapses of a neural network provide a better platform for reducing distortion and increasing message capacity.
MEANINGFUL AND UNEXPANDED SHARES FOR VISUAL SECRET SHARING SCHEMES (ijiert)
In today's internet world it is essential to secretly share biometric data stored in a central database. There are many options for secretly sharing biometric data using cryptographic computation. This work reviews and applies a perfectly secure method, based on the concept of visual cryptography, to secretly share biometric data for possible use in biometric authentication and protection. The basic concept of the proposed approach is to split a private image into two meaningful and unexpanded shares (sheets) stored on two separate database servers, such that decryption can be performed only when both shares are simultaneously available; at the same time, an individual share does not reveal the identity of the private image. Previous research, such as that of Arun Ross et al. in 2011, used pixel expansion for encryption, which wastes storage space and transmission time. Furthermore, some research, such as Hou and Quan's in 2011, produced meaningless shares, which visually reveal the existence of a secret image. In this work, we review visual cryptography schemes and apply them to secretly share biometric data such as fingerprint and face images for user authentication. Using this technique, biometric data can be shared secretly over the internet, and only an authorized user can decrypt the information.
Image Steganography Based On Non-Linear Chaotic Algorithm (IJARIDEA Journal)
Abstract— Recent research on image steganography has been increasingly based on chaotic systems, yet the drawbacks of small key space and weak security in one-dimensional chaotic cryptosystems are evident. This paper presents steganography with a nonlinear chaotic algorithm (NCA) which uses a power function and a tangent function instead of a linear function. Its structural parameters are obtained by experimental analysis. The message, together with the key, is then combined with the cover image using LSB embedding and the discrete cosine transform.
Keywords— Discrete Cosine Transform, LSB Embedding, Nonlinear Chaotic Algorithm, Steganography, Stego Image.
Robust Malware Detection using Residual Attention Network (Shamika Ganesan)
In this paper, we explore the use of an attention-based mechanism known as Residual Attention for malware detection and compare it with existing CNN-based methods and conventional machine learning algorithms using GIST features. The proposed method outperformed traditional malware detection methods based on machine learning and CNN-based deep learning, demonstrating an accuracy of 99.25%.
This paper has been accepted at the International Conference on Consumer Electronics (ICCE 2021).
IMPACT OF ERROR FILTERS ON SHARES IN HALFTONE VISUAL CRYPTOGRAPHY (cscpconf)
Visual cryptography encodes a secret binary image (SI) into shares of random binary patterns. If the shares are xeroxed onto transparencies, the secret image can be visually decoded by superimposing a qualified subset of transparencies, but no secret information can be obtained from the superposition of a forbidden subset. The binary patterns of the shares, however, have no visual meaning and hinder the objectives of visual cryptography. Halftone visual cryptography encodes a secret binary image into n halftone shares (images) carrying significant visual information. When secrecy is a more important factor than the quality of the recovered image, the shares must be of better visual quality. Different filters such as Floyd-Steinberg, Jarvis, Stucki, Burkes, Sierra, and Stevenson-Arce are used and their impact on the visual quality of the shares is examined. The simulation shows that the error filter used in error diffusion has a great impact on the visual quality of the shares.
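Floyd-Steinberg, the first of the error filters compared above, illustrates how error diffusion works: each pixel is thresholded to black or white, and the quantization error is distributed to the not-yet-processed neighbours with fixed weights. A minimal sketch on a plain 2-D list of grayscale values:

```python
def floyd_steinberg(gray):
    """Halftone a 2-D list of grayscale values (0-255) in place.
    Error weights: 7/16 right; 3/16, 5/16, 1/16 on the next row."""
    h, w = len(gray), len(gray[0])
    for y in range(h):
        for x in range(w):
            old = gray[y][x]
            new = 255 if old >= 128 else 0    # threshold to black/white
            gray[y][x] = new
            err = old - new                    # quantization error
            if x + 1 < w:
                gray[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    gray[y + 1][x - 1] += err * 3 / 16
                gray[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    gray[y + 1][x + 1] += err * 1 / 16
    return gray

halftone = floyd_steinberg([[100, 150], [200, 50]])
# every output pixel is now either 0 or 255
```

The other filters (Jarvis, Stucki, etc.) differ only in the neighbourhood size and weights, which is exactly the variable whose impact on share quality the paper studies.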
A NOVEL PERCEPTUAL IMAGE ENCRYPTION SCHEME USING GEOMETRIC OBJECTS BASED KERNEL (ijcsit)
The wide use of digital images and videos in various applications warrants serious attention to security and privacy issues today. Several encryption techniques have been proposed in recent years as feasible solutions for protecting digital images and videos. In many applications, such as pay-per-view videos, pay-TV, and video on demand, one required feature is that the quality of the video data be degraded only partially by the encryption technique, so that the encrypted data remain partially perceptible. This feature, referred to as 'perceptual encryption', describes encryption algorithms that degrade the quality of media content according to security or quality requirements. In this work we propose a simple yet efficient technique for realizing perceptual encryption using geometric objects as kernels, based on which the pixels are permuted. The required confusion is realized by placing the kernel on the image and performing a transposition of pixels based on the kernel formed from the geometric objects. The parameters of the geometric objects, the number of objects, and the position of the objects/kernel in the image are used as the key for encryption and later for decryption. Further, a choice of image quality, i.e., different levels of degradation, is provided by adjusting the above parameters of the objects/kernel. The results show that the proposed method, which is well suited for perceptual encryption, can also be used effectively for full-image encryption with an acceptable level of security.
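The core idea, permuting only the pixels covered by a geometric kernel so the rest of the image stays perceptible, can be sketched as follows. This is an illustrative simplification, not the paper's exact scheme: a single circle (centre, radius, and seed acting as the key) defines the kernel, and a key-seeded shuffle performs the transposition.

```python
import random

def _kernel_region(img, cx, cy, r):
    """Pixel coordinates inside a circular kernel (the geometric object)."""
    h, w = len(img), len(img[0])
    return [(y, x) for y in range(h) for x in range(w)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]

def perceptual_encrypt(img, cx, cy, r, seed):
    inside = _kernel_region(img, cx, cy, r)
    perm = inside[:]
    random.Random(seed).shuffle(perm)        # key-driven permutation
    out = [row[:] for row in img]
    for (sy, sx), (dy, dx) in zip(inside, perm):
        out[dy][dx] = img[sy][sx]            # transpose pixels inside the kernel
    return out

def perceptual_decrypt(img, cx, cy, r, seed):
    inside = _kernel_region(img, cx, cy, r)
    perm = inside[:]
    random.Random(seed).shuffle(perm)        # rebuild the same permutation
    out = [row[:] for row in img]
    for (sy, sx), (dy, dx) in zip(inside, perm):
        out[sy][sx] = img[dy][dx]            # invert the mapping
    return out

img = [[r * 4 + c for c in range(4)] for r in range(4)]
enc = perceptual_encrypt(img, cx=1, cy=1, r=1, seed=7)
assert perceptual_decrypt(enc, 1, 1, 1, 7) == img   # round trip
```

Enlarging the radius or adding more objects scrambles more pixels, which is how the scheme's adjustable degradation level would be realized.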
Steganography using Coefficient Replacement and Adaptive Scaling based on DTCWT (CSCJournals)
Steganography is an authenticated technique for maintaining the secrecy of embedded data. It makes the hidden data hard to detect and has the potential to conceal the very existence of confidential data. In this paper, we propose a novel steganography technique using coefficient replacement and adaptive scaling based on the Dual Tree Complex Wavelet Transform (DTCWT). The DTCWT and the LWT are applied to the cover image and payload, respectively, to convert the spatial domain into the transform domain. The HH sub-band coefficients of the cover image are replaced by the LL sub-band coefficients of the payload to generate an intermediate stego object, and an adaptive scaling factor is used to scale down the intermediate stego object's coefficient values to generate the final stego object. The adaptive scaling factor is determined from the entropy of the cover image. The security and capacity of the proposed method are high compared to existing algorithms.
A Novel Technique for Image Steganography Based on DWT and Huffman Encoding (CSCJournals)
Image steganography is the art of hiding information in a cover image. This paper presents a novel technique for image steganography based on the DWT, where the DWT is used to transform the original (cover) image from the spatial domain to the frequency domain. First, a two-dimensional Discrete Wavelet Transform (2-D DWT) is performed on a gray-level cover image of size M × N, and Huffman encoding is performed on the secret message/image before embedding. Each bit of the Huffman code of the secret message/image is then embedded in the high-frequency coefficients resulting from the DWT. Image quality is improved by preserving the wavelet coefficients in the low-frequency sub-band. The experimental results show that the algorithm has high capacity and good invisibility. Moreover, the PSNR between the cover image and the stego image shows better results in comparison with other existing steganography approaches. Furthermore, satisfactory security is maintained, since the secret message/image cannot be extracted without knowing the decoding rules and the Huffman table.
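The Huffman-encoding step that precedes embedding can be sketched in standard-library Python: build a prefix code from symbol frequencies, then turn the secret message into the bit string that would be written into the DWT coefficients. A minimal sketch (not the paper's implementation):

```python
import heapq
from collections import Counter

def huffman_code(message):
    """Return {symbol: bitstring} built by repeatedly merging the
    two lowest-frequency subtrees (classic Huffman construction)."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq)
            in enumerate(Counter(message).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)          # keeps heap comparisons away from the dicts
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}   # left branch
        merged.update({s: "1" + c for s, c in hi[2].items()})  # right branch
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

msg = "secret message"
code = huffman_code(msg)
# bit stream that would be embedded, one bit per high-frequency coefficient
bits = "".join(code[ch] for ch in msg)
```

Because the code is prefix-free, the extractor can decode the bit stream unambiguously, provided it knows the Huffman table, which is exactly why the paper treats the table as part of the secret.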
Steganography is the technique of hiding the fact that communication is taking place, by hiding data in other data. Many different carrier file formats can be used, but digital images are the most popular because of their prevalence on the Internet. A large variety of steganographic techniques exist for hiding secret information in images. Steganalysis, the detection of this hidden information, is an inherently difficult problem. In this paper, I cover different steganographic techniques researched by different researchers.
Keywords — Cryptography, Steganography, LSB, Hash-LSB, RSA Encryption-Decryption
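Plain LSB embedding, the baseline technique named in the keywords above, can be shown in a few lines: each secret bit replaces the least significant bit of one cover pixel, changing that pixel's value by at most 1. A minimal sketch on a list of 8-bit pixel values:

```python
def lsb_embed(pixels, bits):
    """Hide an iterable of bits (0/1) in the LSBs of `pixels`."""
    stego = pixels[:]
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | b   # clear the LSB, then set it to b
    return stego

def lsb_extract(pixels, n):
    """Read back the first n hidden bits."""
    return [p & 1 for p in pixels[:n]]

cover = [120, 121, 122, 123, 124, 125, 126, 127]
secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, secret)
assert lsb_extract(stego, 8) == secret   # the message survives embedding
```

The at-most-1 change per pixel is what makes plain LSB visually imperceptible, and variants like Hash-LSB differ mainly in how they choose which pixels carry the bits.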
COMPRESSION BASED FACE RECOGNITION USING DWT AND SVM (sipij)
Biometrics are used to identify a person effectively and are employed in almost all day-to-day applications. In this paper, we propose compression-based face recognition using the Discrete Wavelet Transform (DWT) and a Support Vector Machine (SVM). The novel concept of converting many images of a single person into one image using an averaging technique is introduced to reduce execution time and memory. The DWT is applied to the averaged face image to obtain the approximation (LL) and detail bands. The LL band coefficients are given as input to the SVM to obtain support vectors (SVs). The LL coefficients of the DWT and the SVs are fused by arithmetic addition to extract the final features. The Euclidean Distance (ED) is used to compare test image features with database image features to compute performance parameters. It is observed that the proposed algorithm performs better than existing algorithms.
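The averaging step described above, collapsing several images of one person into a single mean image before DWT/SVM feature extraction, is straightforward to sketch. This is an illustrative sketch with made-up pixel data, not the paper's code:

```python
def average_images(images):
    """Pixel-wise mean of equally sized images (each a list of rows)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]

person = [
    [[100, 110], [120, 130]],   # image 1 of the same person
    [[102, 108], [118, 134]],   # image 2
    [[104, 112], [122, 126]],   # image 3
]
mean_face = average_images(person)  # one image now represents the person
```

Storing and transforming one averaged image per person instead of many is where the execution-time and memory savings claimed in the abstract come from.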
This paper describes a novel system for vectorizing 2D raster cartoons. The output videos are resolution-independent and smaller in file size. As a first step, the input video is segmented into scenes; thereafter, all processing is done for each scene separately. Every scene contains foreground and background objects, so foreground/background classification is performed for each scene. Background details can be occluded by foreground objects, but when a foreground object moves from its previous position, the occluded details are exposed in one of the subsequent frames; using that frame, the occluded area can be filled in and a static background generated. The classified foreground objects are identified and their motion is tracked; simple user assistance is required for this, and from the motion details the foreground objects' animation is generated. The static background and foreground objects are segmented using K-means clustering, and each cluster is vectorized using Potrace. Using the vectorized background, foreground objects, and animation paths, the vector video is regenerated.
Image steganography using least significant bit and secret map techniques (IJECEIAES)
In steganography, secret data are hidden invisibly in cover media such as text, audio, video, and images. Hence, attackers have no knowledge of the original message contained in the media or of the algorithm used to embed or extract it. Image steganography is a branch of steganography in which secret data are hidden in host images. In this study, image steganography using least significant bit and secret map techniques is performed by applying 3D chaotic maps, namely the 3D Chebyshev and 3D logistic maps, to obtain high security. The technique is based on randomly selecting the host-image pixels into which bits are inserted. The proposed algorithm is comprehensively evaluated against different criteria, such as correlation coefficient, information entropy, homogeneity, contrast, image histogram, key sensitivity, hiding capacity, quality index, mean square error (MSE), peak signal-to-noise ratio (PSNR), and image fidelity. Results show that the proposed algorithm satisfies all the aforementioned criteria and is superior to previous methods. Hence, it is efficient in hiding secret data and preserving the good visual quality of stego images. The proposed algorithm is resistant to attacks such as differential and statistical attacks, and yields good results in terms of key sensitivity, hiding capacity, quality index, MSE, PSNR, and image fidelity.
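The "secret map" idea, letting a key-seeded chaotic orbit decide which pixel receives each secret bit, can be illustrated with the classic one-dimensional logistic map x ← r·x·(1−x). This is a deliberate simplification: the paper uses 3D Chebyshev and 3D logistic maps, which the 1D map stands in for here.

```python
def chaotic_positions(key_x0, n_pixels, n_bits, r=3.99):
    """Derive n_bits distinct pixel indices from a logistic-map orbit.
    key_x0 in (0, 1) acts as the secret key; r=3.99 keeps the map chaotic."""
    x, chosen, order = key_x0, set(), []
    while len(order) < n_bits:
        x = r * x * (1 - x)                  # iterate the logistic map
        idx = int(x * n_pixels) % n_pixels   # map the state to a pixel index
        if idx not in chosen:                # skip collisions
            chosen.add(idx)
            order.append(idx)
    return order

positions = chaotic_positions(key_x0=0.3141, n_pixels=256, n_bits=16)
# the same key reproduces the same positions; a slightly different key
# yields a different embedding order (the key-sensitivity property)
```

The extractor, given the same key, regenerates the identical position sequence, which is what makes the random-looking insertion reversible.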
A new cryptosystem with four levels of encryption and parallel programming (csandit)
The evolution of communication systems has changed the paradigm of human life on this planet. Growing network facilities for the masses have turned the world into a village (or perhaps an even smaller unit of human accommodation), in the sense that every part of the world is reachable by everyone in almost no time. But this coin, too, has two sides: with the increasing use of communication networks, the threats to the privacy, integrity, and confidentiality of data sent over the network are also increasing, demanding ever newer security measures. The ancient techniques of coded messages are recreated in modern software environments under the domain of cryptography. Cryptosystems provide a means for the secure transmission of data over an unsecured channel by supplying encoding and decoding functionalities. This paper proposes a new cryptosystem based on four levels of encryption. The system is suitable for communication within trusted groups.
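The layered structure of a multi-level cryptosystem can be illustrated generically: each level transforms the output of the previous one, and decryption inverts the levels in reverse order. The abstract does not disclose the paper's actual four levels, so the four transformations below are illustrative stand-ins only (and the final base64 step is an encoding layer, not encryption).

```python
import base64

def xor_bytes(data, key):
    """XOR each byte with a repeating key stream."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt(plaintext, key):
    data = plaintext.encode()
    data = data[::-1]                          # level 1: transposition (reversal)
    data = bytes((b + 7) % 256 for b in data)  # level 2: byte shift
    data = xor_bytes(data, key)                # level 3: XOR with key stream
    return base64.b64encode(data)              # level 4: encoding layer

def decrypt(ciphertext, key):
    data = base64.b64decode(ciphertext)        # invert level 4
    data = xor_bytes(data, key)                # invert level 3 (XOR is its own inverse)
    data = bytes((b - 7) % 256 for b in data)  # invert level 2
    return data[::-1].decode()                 # invert level 1

ct = encrypt("trusted group message", b"k3y")
assert decrypt(ct, b"k3y") == "trusted group message"
```

An independent-levels design like this is also what makes parallelization attractive: blocks of the message can pass through the pipeline concurrently.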
A comparative analysis of retrieval techniques in content based image retrieval (csandit)
A basic group of visual features such as color, shape, and texture is used in Content-Based Image Retrieval (CBIR) to match a query image, or a sub-region of an image, against similar images in an image database. To improve query results, relevance feedback is often used in CBIR to help users express their preferences. In this paper, a new approach for image retrieval is proposed based on features such as the color histogram, eigenvalues, and match points. Images from various databases are first identified using edge detection techniques. Once an image is identified, it is searched for in the particular database and all related images are displayed, which saves retrieval time. Further, to retrieve the precise query image, any of the three techniques can be used, and a comparison is made with respect to average retrieval time. The eigenvalue technique was found to be the best compared with the other two techniques.
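The color-histogram technique compared above works by reducing each image to a normalized histogram and ranking database images by a similarity measure. A minimal sketch using histogram intersection (one common choice; the paper does not specify its measure), with made-up pixel data:

```python
def histogram(pixels, bins=4, max_val=256):
    """Normalized intensity histogram of a flat pixel list."""
    h = [0] * bins
    for p in pixels:
        h[p * bins // max_val] += 1
    total = len(pixels)
    return [c / total for c in h]

def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1]; higher = more similar."""
    return sum(min(a, b) for a, b in zip(h1, h2))

query = histogram([10, 20, 200, 210, 30, 220])
db = {
    "img_a": histogram([15, 25, 205, 215, 35, 225]),    # similar distribution
    "img_b": histogram([120, 130, 125, 135, 128, 122]), # different one
}
best = max(db, key=lambda name: intersection(query, db[name]))
```

Because the histogram discards spatial layout, this measure is fast but coarse, which is why the paper compares it against eigenvalue and match-point features.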
SDL Based Validation of a Node Monitoring Protocol (csandit)
A mobile ad hoc network is a wireless, self-configured, infrastructure-less network of mobile nodes. The nodes are highly mobile, which makes the applications running on them face network-related problems such as node failure, link failure, network-level disconnection, scarcity of resources, buffer degradation, and intermittent disconnection. Node failures and network faults need to be monitored continuously by supervising the network status. A node monitoring protocol is crucial, so the protocol must be tested exhaustively to verify and validate the functionality and accuracy of its design. This paper presents a validation model for a Node Monitoring Protocol (NMP) using the Specification and Description Language (SDL) with both a Static Agent (SA) and a Mobile Agent (MA). We have verified properties of the NMP, based on the global states with no exits, deadlock states, and proper termination states, using a reachability graph. A Message Sequence Chart (MSC) gives an intuitive understanding of the described system behaviour with varying node density and complex behaviour.
A mathematical model of access control in big data using confidence interval ... (csandit)
Nowadays, the concept of big data grows incessantly; recent research shows that 90% of all the data existing on the web was created in the last two years. However, this growth is hampered by many critical challenges residing generally at the security level: users care about how providers protect the privacy of their data. Access control, cryptography, and de-identification are the main research areas grouped under a specific domain known as Privacy-Preserving Data Publishing. In this paper, we suggest a new model for access control over big data using digital signatures and confidence intervals. We first introduce our work by presenting some general concepts used to build our approach, then present the idea of this report, and finally evaluate our system by conducting several experiments and discussing the results obtained.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET...sipij
The resolution of an image is a very important criterion for evaluating the quality of the image. Higher resolution of image is always preferable as images of lower resolution are unsuitable due to fuzzy quality. Higher resolution of image is important for various fields such as medical imaging; astronomy works and so on as images of lower resolution becomes unclear and indistinct when their sizes are enlarged. In recent times, various research works are performed to generate higher resolution of an image from its lower resolution. In this paper, we have proposed a technique of generating higher resolution images form lower resolution using Residual in Residual Dense Block network architecture with a deep network. We have also compared our method with other methods to prove that our method provides better visual quality images.
A Novel Visual Cryptographic Scheme Using Floyd Steinberg Half Toning and Block Replacement Algorithms Nisha Menon K – PG Scholar,
Minu Kuriakose – Assistant Professor,
Department of Electronics and Communication,
Federal Institute of Science and Technology, Ernakulam, India
PREVENTING COPYRIGHTS INFRINGEMENT OF IMAGES BY WATERMARKING IN TRANSFORM DOM...ijistjournal
Images are undoubtedly the most efficacious and easiest means of communicating an idea. They are surely an indispensable part of human life .The trend of sharing images of various kinds for example typical technical figures, modern exceptional masterpiece from an artist, photos from the recent picnic to hill station etc, on the internet is spreading like a viral. There is a mandatory requirement for checking the privacy and security of our personal digital images before making them public via the internet. There is always a threat of our original images being illegally reproduced or distributed elsewhere. To prevent the misuse and protect the copyrights, an efficient solution has been given that can withstand many attacks. This paper aims at encoding of the host image prior to watermark embedding for enhancing the security. The fast and effective full counter propagation neural network helps in the successful watermark embedding without deteriorating the image perception. Earlier techniques embedded the watermark in the image itself but is has been observed that synapses of neural network provide a better platform for reducing the distortion and increasing the message capacity.
MEANINGFUL AND UNEXPANDED SHARES FOR VISUAL SECRET SHARING SCHEMESijiert bestjournal
In today�s internet world it is very essential to secretly share biometric data stored in the central database. There are so many options to secretly share biometri c data using cryptographic computation. This work reviews and applies a perfectly secure method to secretly share biometric data,for possible use in biometric authentication and protection based on conc ept of visual cryptography. The basic concept of proposed approach is to secretly share private imag e into two meaningful and unexpanded shares (sheets) that are stored in two separate database servers such that decryption can be performed only when both shares are simultaneously available;at the same ti me,the individual share do not open identity of the private image. Previous research,such as Arun Ross et al. in 2011,was using pixel expansion for encryption,which causes the waste of storage space and transmission time. Furthermore,some researcher such as Hou and Quan�s research in 2011,producing m eaningless shares,which causes visually revealing existence of secret image. In this work,we review visual cryptography scheme and apply them to secretly share biometric data such as fingerprint,face images for the purpose of user authentication. So,using this technique we can secretly share biometric data over internet and only authorized user can decrypt the information.
Image Steganography Based On Non Linear Chaotic Algorithm (IJARIDEA Journal)
Abstract— Recent research on image steganography has increasingly been based on chaotic systems, but the drawbacks of small key space and weak security in one-dimensional chaotic cryptosystems are evident. This paper presents steganography with a nonlinear chaotic algorithm (NCA) which uses a power function and a tangent function instead of a linear function. Its structural parameters are obtained by experimental analysis. The message with the key is then combined with the cover image using LSB embedding and the discrete cosine transform.
Keywords— Discrete Cosine Transform, LSB Embedding, Nonlinear Chaotic Algorithm, Steganography, Stego Image.
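The LSB embedding step mentioned in the abstract can be illustrated with a minimal Python sketch. The function names are illustrative, and the chaotic key scheduling and DCT stage of the NCA scheme are omitted:

```python
def embed_lsb(pixels, bits):
    """Hide a bit string in the least significant bits of pixel values."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b   # clear the LSB, then set it to the message bit
    return out

def extract_lsb(pixels, n):
    """Read back the first n hidden bits."""
    return [p & 1 for p in pixels[:n]]

cover = [200, 13, 97, 54, 120, 33, 7, 250]
msg = [1, 0, 1, 1]
stego = embed_lsb(cover, msg)
assert extract_lsb(stego, 4) == msg
# each embedded pixel changes by at most one intensity level
assert all(abs(a - b) <= 1 for a, b in zip(cover, stego))
```

Because only the lowest bit of each carrier value changes, the distortion stays below the threshold of visual perception, which is why LSB embedding is the usual spatial-domain baseline.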
Robust Malware Detection using Residual Attention Network (Shamika Ganesan)
In this paper, we explore the use of an attention-based mechanism known as Residual Attention for malware detection and compare it with existing CNN based methods and conventional machine learning algorithms with the help of GIST features. The proposed method outperformed traditional malware detection methods which use machine learning and CNN based deep learning algorithms, demonstrating an accuracy of 99.25%.
This paper has been accepted in the International Conference of Consumer Electronics (ICCE 2021).
IMPACT OF ERROR FILTERS ON SHARES IN HALFTONE VISUAL CRYPTOGRAPHY (cscpconf)
Visual cryptography encodes a secret binary image (SI) into shares of random binary patterns.
If the shares are xeroxed onto transparencies, the secret image can be visually decoded by
superimposing a qualified subset of transparencies, but no secret information can be obtained
from the superposition of a forbidden subset. The binary patterns of the shares, however, have
no visual meaning and hinder the objectives of visual cryptography. Halftone visual
cryptography encodes a secret binary image into n halftone shares (images) carrying significant
visual information. When secrecy is a more important factor than the quality of the recovered image,
the shares must be of better visual quality. Different filters such as Floyd-Steinberg, Jarvis,
Stucki, Burkes, Sierra, and Stevenson-Arce are used and their impact on the visual quality of the
shares is observed. The simulation shows that the error filters used in error diffusion have a great impact on the visual quality of the shares.
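For reference, the Floyd-Steinberg filter named above can be sketched in plain Python. This is an illustrative implementation with our own names; real halftoning pipelines typically add serpentine scanning and use image libraries:

```python
def floyd_steinberg(img, w, h):
    """Halftone a grayscale image (values 0..255, row-major list) by
    thresholding each pixel and diffusing the quantization error to
    unprocessed neighbours with the weights 7/16, 3/16, 5/16, 1/16."""
    px = [float(v) for v in img]
    out = [0] * (w * h)
    for y in range(h):
        for x in range(w):
            i = y * w + x
            out[i] = 255 if px[i] >= 128 else 0
            err = px[i] - out[i]
            if x + 1 < w:
                px[i + 1] += err * 7 / 16          # right neighbour
            if y + 1 < h:
                if x > 0:
                    px[i + w - 1] += err * 3 / 16  # below-left
                px[i + w] += err * 5 / 16          # below
                if x + 1 < w:
                    px[i + w + 1] += err * 1 / 16  # below-right
    return out

# a flat mid-gray patch halftones to roughly half black, half white dots
gray = [128] * (8 * 8)
dots = floyd_steinberg(gray, 8, 8)
```

The other filters in the list (Jarvis, Stucki, etc.) differ only in the neighbourhood and weights used to spread `err`, which is precisely what changes the visual quality of the resulting shares.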
A NOVEL PERCEPTUAL IMAGE ENCRYPTION SCHEME USING GEOMETRIC OBJECTS BASED KERNEL (ijcsit)
The wide use of digital images and videos in various applications warrants serious attention to security and privacy issues today. Several encryption techniques have been proposed in recent years as feasible solutions for the protection of digital images and videos. In many applications, such as pay-per-view videos, pay-TV and video on demand, one of the required features is that the quality of the video data be degraded only partially by some encryption technique, with the encrypted data remaining partially perceptible. This feature, referred to as 'perceptual encryption', denotes encryption that degrades the quality of media content according to security or quality requirements. In this work we propose a simple yet efficient technique for realizing perceptual encryption using geometric objects as kernels based on which the pixels are permuted. The required confusion aspect is realized by inserting the kernel on the image and thereby performing transposition of pixels based on the kernel formed out of geometric objects. The various parameters of the geometric objects, the number of objects and the position of the objects/kernel in the image are used as the key for encryption and later for decryption. Further, a choice of the required image quality, i.e., different levels of degradation, is provided by adjusting the above parameters of the objects/kernel. From the results obtained it is evident that the proposed method, which is most apt for perceptual encryption, can also be used effectively for full image encryption with an acceptable level of security.
Steganography using Coefficient Replacement and Adaptive Scaling based on DTCWT (CSCJournals)
Steganography is an authenticated technique for maintaining the secrecy of embedded data. Steganography makes the hidden data hard to detect and has a potential capacity to hide the existence of confidential data. In this paper, we propose a novel steganography technique using coefficient replacement and adaptive scaling based on the Dual Tree Complex Wavelet Transform (DTCWT). The DTCWT and the 2-level LWT are applied to the cover image and payload respectively to convert the spatial domain into the transform domain. The HH sub-band coefficients of the cover image are replaced by the LL sub-band coefficients of the payload to generate an intermediate stego object, and an adaptive scaling factor is used to scale down the intermediate stego object coefficient values to generate the final stego object. The adaptive scaling factor is determined based on the entropy of the cover image. The security and the capacity of the proposed method are high compared to existing algorithms.
A Novel Technique for Image Steganography Based on DWT and Huffman Encoding (CSCJournals)
Image steganography is the art of hiding information in a cover image. This paper presents a novel technique for image steganography based on the DWT, where the DWT is used to transform the original image (cover image) from the spatial domain to the frequency domain. Firstly, the two-dimensional Discrete Wavelet Transform (2-D DWT) is performed on a gray level cover image of size M × N, and Huffman encoding is performed on the secret message/image before embedding. Then each bit of the Huffman code of the secret message/image is embedded in the high frequency coefficients resulting from the Discrete Wavelet Transform. Image quality is improved by preserving the wavelet coefficients in the low frequency sub-band. The experimental results show that the algorithm has a high capacity and a good invisibility. Moreover, the PSNR of the cover image with the stego-image shows better results in comparison with other existing steganography approaches. Furthermore, satisfactory security is maintained since the secret message/image cannot be extracted without knowing the decoding rules and the Huffman table.
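The Huffman encoding applied to the secret message before embedding can be sketched with the standard library; this is an illustrative stdlib-only implementation, and the DWT embedding stage itself is omitted:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table (symbol -> bit string) for `data`."""
    # heap entries: [frequency, unique tiebreaker, partial code table]
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        # merge the two rarest subtrees, prefixing 0 / 1 to their codes
        table = {s: "0" + c for s, c in lo[2].items()}
        table.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], n, table])
        n += 1
    return heap[0][2]

msg = "abracadabra"
codes = huffman_codes(msg)
bits = "".join(codes[ch] for ch in msg)
# frequent symbols get shorter codes, so 'a' is never longer than 'd'
assert len(codes["a"]) <= len(codes["d"])
```

The resulting bit string `bits` is what such a scheme would write into the high-frequency DWT coefficients; without the Huffman table, the extracted bits alone do not reveal the message, which is the security property the abstract notes.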
Steganography is the technique of hiding the fact that communication is taking place,
by hiding data in other data. Many different carrier file formats can be used, but digital images
are the most popular because of their frequency on the Internet. For hiding secret information in
images, there exists a large variety of steganographic techniques. Steganalysis, the detection of this
hidden information, is an inherently difficult problem. In this paper, I am going to cover different
steganographic techniques researched by different researchers.
Keywords — Cryptography, Steganography, LSB, Hash-LSB, RSA Encryption –Decryption
COMPRESSION BASED FACE RECOGNITION USING DWT AND SVM (sipij)
Biometrics are used to identify a person effectively and are employed in almost all day-to-day
applications. In this paper, we propose compression based face recognition using the Discrete Wavelet Transform
(DWT) and Support Vector Machine (SVM). The novel concept of converting many images of a single person
into one image using an averaging technique is introduced to reduce execution time and memory. The DWT is
applied on averaged face image to obtain approximation (LL) and detailed bands. The LL band coefficients
are given as input to SVM to obtain Support vectors (SV’s). The LL coefficients of DWT and SV’s are fused
based on arithmetic addition to extract final features. The Euclidean Distance (ED) is used to compare test
image features with database image features to compute performance parameters. It is observed that
the proposed algorithm performs better than existing algorithms.
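The averaging step that converts many images of one person into a single image can be sketched as follows; this is a plain-Python illustration with hypothetical names, and the DWT/SVM stages are omitted:

```python
def average_images(images):
    """Average several equally-sized grayscale images (lists of pixel
    rows) of one subject into a single representative image."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]

# two tiny 2x2 "face" images of the same person
faces = [[[100, 110], [120, 130]],
         [[102, 108], [118, 134]]]
mean_face = average_images(faces)
assert mean_face == [[101.0, 109.0], [119.0, 132.0]]
```

Replacing several training images with their pixel-wise mean is what lets the pipeline run the DWT and SVM only once per subject, which is the source of the execution time and memory savings claimed above.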
This paper describes a novel system for vectorizing 2D raster cartoons. The output videos are resolution independent and smaller in file size. As a first step, the input video is segmented into scenes; thereafter all processing is done for each scene separately. Every scene contains foreground and background objects, so foreground/background classification is performed in each scene. Background details can be occluded by foreground objects, but when a foreground object moves from its previous position, the occluded details are exposed in one of the subsequent frames; using that frame, the occluded area can be filled and a static background generated. The classified foreground objects are identified and their motion is tracked, for which simple user assistance is required; from those motion details, the foreground objects' animation is generated. The static background and foreground objects are segmented using K-means clustering, and each cluster is vectorized using Potrace. Using the vectorized background, foreground objects and animation paths, the vector video is regenerated.
Image steganography using least significant bit and secret map techniques (IJECEIAES)
In steganography, secret data are invisible in cover media, such as text, audio, video and images. Hence, attackers have no knowledge of the original message contained in the media or of which algorithm is used to embed or extract such a message. Image steganography is a branch of steganography in which secret data are hidden in host images. In this study, image steganography using least significant bit and secret map techniques is performed by applying 3D chaotic maps, namely, 3D Chebyshev and 3D logistic maps, to obtain high security. The technique is based on performing random insertion and selecting a pixel from the host image. The proposed algorithm is comprehensively evaluated on the basis of different criteria, such as correlation coefficient, information entropy, homogeneity, contrast, image histogram, key sensitivity, hiding capacity, quality index, mean square error (MSE), peak signal-to-noise ratio (PSNR) and image fidelity. Results show that the proposed algorithm satisfies all the aforementioned criteria and is superior to previous methods. Hence, it is efficient in hiding secret data and preserving the good visual quality of stego images. The proposed algorithm is resistant to different attacks, such as differential and statistical attacks, and yields good results in terms of key sensitivity, hiding capacity, quality index, MSE, PSNR and image fidelity.
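The idea of selecting embedding pixels with a chaotic map can be illustrated with a short sketch. For brevity this uses a 1D logistic map rather than the 3D Chebyshev/logistic maps of the paper, and all names and parameter values are illustrative:

```python
def chaotic_indices(n_pixels, n_bits, x0=0.7134, r=3.99):
    """Derive a pseudo-random, key-dependent embedding order from the
    logistic map x -> r*x*(1-x); the pair (x0, r) acts as the secret key."""
    x, seen, order = x0, set(), []
    while len(order) < n_bits:
        x = r * x * (1 - x)
        i = int(x * n_pixels) % n_pixels
        if i not in seen:            # skip collisions so indices stay unique
            seen.add(i)
            order.append(i)
    return order

idx = chaotic_indices(n_pixels=256, n_bits=16)
assert len(set(idx)) == 16 and all(0 <= i < 256 for i in idx)
```

The sender embeds message bits into the LSBs of the pixels at `idx` in order; the receiver regenerates the same sequence from the shared key (x0, r), which is why a tiny change to the key scrambles the extraction entirely, the key-sensitivity property evaluated above.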
A new cryptosystem with four levels of encryption and parallel programming (csandit)
Evolution in the communication systems has changed the paradigm of human life on this planet.
The growing network facilities for the masses have converted this world to a village (or may be
even smaller entity of human accommodation) in a sense that every part of the world is
reachable for everyone in almost no time. But, like a coin, this development has two
sides. With the increasing use of communication networks, the various threats to the privacy,
integrity and confidentiality of the data sent over the network are also increasing, demanding
ever newer security measures to be applied. The ancient techniques of coded
messages are imitated in terms of new software environments under the domain of
cryptography. Cryptosystems provide a means for the secure transmission of data over an
unsecured channel by providing encoding and decoding functionalities. This paper proposes a
new cryptosystem based on four levels of encryption. The system is suitable for communication
within the trusted groups.
A comparative analysis of retrieval techniques in content based image retrieval (csandit)
A basic group of visual features such as color, shape and texture is used in Content Based Image
Retrieval (CBIR) to take a query image, or a sub-region of an image, and find similar images in an
image database. To improve query results, relevance feedback is often used in CBIR to
help users express their preference. In this paper, a new approach
for image retrieval is proposed which is based on features such as the Color Histogram, Eigen
Values and Match Points. Images from various types of databases are first identified by using
edge detection techniques. Once the image is identified, it is searched for in the
particular database and all related images are displayed. This saves retrieval time.
Further, to retrieve the precise query image, any of the three techniques is used and
a comparison is made w.r.t. average retrieval time. The Eigen value technique was found to be the best
compared with the other two techniques.
SDL Based Validation of a Node Monitoring Protocol (csandit)
A mobile ad hoc network is a wireless, self-configured, infrastructure-less network of mobile
nodes. The nodes are highly mobile, which makes the applications running on them face network-
related problems like node failure, link failure, network level disconnection, scarcity of
resources, buffer degradation, and intermittent disconnection. Node failures and network
faults need to be monitored continuously by supervising the network status. The node monitoring
protocol is crucial, so it is required to test the protocol exhaustively to verify and validate the
functionality and accuracy of the designed protocol. This paper presents a validation model for
a Node Monitoring Protocol using the Specification and Description Language (SDL) with both
a Static Agent (SA) and a Mobile Agent (MA). We have verified properties of the Node Monitoring
Protocol (NMP) based on the global states with no exits, deadlock states, or proper termination
states using a reachability graph. Message Sequence Charts (MSC) give an intuitive
understanding of the described system behaviour with varying node density, complex
behaviour, etc.
A mathematical model of access control in big data using confidence interval ... (csandit)
Nowadays, the concept of big data grows incessantly; recent research proved that 90% of the
whole data existing on the web had been created in the last two years. However, this growth
is bumped by many critical challenges residing generally at the security level; users care about
how providers could protect the privacy of their data. Access control, cryptography, and de-
identification are the main research areas grouped under a specific domain known as Privacy
Preserving Data Publishing. In this paper, we suggest a new model for access
control over big data using a digital signature and confidence interval; we first introduce our
work by presenting some general concepts used to build our approach, then present the idea
of this report, and finally we evaluate our system by conducting several experiments and
showing and discussing the results that we obtained.
Association rule discovery for student performance prediction using metaheuri... (csandit)
With the increase in the use of data mining techniques for improving educational system
operations, Educational Data Mining has been introduced as a new and fast-growing research
area. Educational Data Mining aims to analyze data in educational environments in order to
solve educational research problems. In this paper a new associative classification technique
has been proposed to predict students' final performance. In contrast to several machine learning
approaches such as ANNs, SVMs, etc., associative classifiers maintain interpretability along
with high accuracy. In this research work, we have employed Honeybee Colony Optimization
and Particle Swarm Optimization to extract association rules for student performance prediction
as a multi-objective classification problem. Results indicate that the proposed swarm-based
algorithm outperforms well-known classification techniques on the student performance prediction
classification problem.
ICU Patient Deterioration Prediction: A Data-Mining Approach (csandit)
A huge amount of medical data is generated every day, which presents a challenge in analysing
these data. The obvious solution to this challenge is to reduce the amount of data without
information loss. Dimension reduction is considered the most popular approach for reducing
data size and also for reducing noise and redundancies in data. In this paper, we investigate the
effect of feature selection in improving the prediction of patient deterioration in ICUs. We
consider lab tests as features. Thus, choosing a subset of features would mean choosing the
most important lab tests to perform. If the number of tests can be reduced by identifying the
most important tests, then we could also identify the redundant tests. By omitting the redundant
tests, observation time could be reduced and early treatment could be provided to avoid the risk.
Additionally, unnecessary monetary cost would be avoided. Our approach uses state-of-the-art
feature selection for predicting ICU patient deterioration using the medical lab results. We
apply our technique on the publicly available MIMIC-II database and show the effectiveness of
the feature selection. We also provide a detailed analysis of the best features identified by our
approach.
Efficient dispatching rules based on data mining for the single machine sched... (csandit)
In manufacturing, the solutions found for scheduling problems and the human experts'
experience are very important. They can be transformed using Artificial Intelligence techniques
into knowledge, and this knowledge can be used to solve new scheduling problems. In this
paper we use Decision Trees for the generation of new Dispatching Rules for a Single Machine
shop solved using a Genetic Algorithm. Two heuristics are proposed to use the new Dispatching
Rules, and a comparative study with other Dispatching Rules from the literature is presented.
Implementation of a bpsk modulation based cognitive radio system using the en... (csandit)
We present in this work an energy detection algorithm, based on spectral power estimation, in
the context of cognitive radio. The algorithm is based on the Neyman-Pearson test, where the
robustness of the identification of the appropriate spectral bands relies on the 'judicious'
choice of the probability of detection (P_D) and the false alarm probability (P_F). First, we
carry out a comparative study between two techniques for estimating the PSD (Power Spectral
Density): the periodogram and Welch methods. The interest is also focused on the choice of the
optimal duration of observation, where we can state that the latter should be inversely
proportional to the SNR level of the transmitted signal to be sensed. The developed
algorithm is applied in the context of cognitive radio. The algorithm aims to identify the free
spectral bands, reserved for the primary user, of the signal carrying information
issued from an ASCII-encoded alphanumeric message using BPSK modulation and
transmitted through an AWGN (Additive White Gaussian Noise) channel. The algorithm succeeds
in identifying the free spectral bands even for low SNR levels (e.g. down to -2 dB) and allocates them
to the informative signal representing the secondary user.
Real-Time Pedestrian Detection Using Apache Storm in a Distributed Environment (csandit)
In general, distributed processing is not suitable for dealing with image data streams due to the
network load caused by communicating frames. For this reason, image data stream
processing has commonly operated on just one node. However, in the big data era we need to
process image data streams in a distributed environment due to the increase in the quantity and
quality of multimedia data. In this paper, we present a real-time pedestrian detection methodology
for a distributed environment which processes image data streams in real time on the Apache Storm
framework. It achieves a sharp speed-up by distributing frames onto several nodes called bolts,
each of which processes different regions of the image frames. Moreover, it reduces the
synchronization overhead by having the computation bolts return only the processing
results to the merging bolts.
APPLICATION OF BICLUSTERING TECHNIQUE IN MACHINE MONITORING (csandit)
Nowadays we can observe a change in the structure of energy resources, which leads to an increasing fraction of renewable energy sources. Traditional underground coal mining is losing its overall significance, but there are countries, including Poland, whose economies are still coal-based.
Decreasing coal resources imply the exploitation of harder-to-access coal beds, which is connected with increased demands on operational safety. One of the most
important technical factors in the safety of underground coal mining is the diagnostic state of the longwall powered roof support. It consists of dozens (or hundreds) of units working in a row. The diagnostic state of a powered roof support depends on the diagnostic state of all its units. This
paper describes the possibility of unit diagnostic state analysis based on biclustering methods.
OPTIMIZATION IN ENGINE DESIGN VIA FORMAL CONCEPT ANALYSIS USING NEGATIVE ATTR... (csandit)
There is an exhaustive body of work in the area of engine design covering different methods that try to reduce production costs and optimize the performance of these engines.
Mathematical methods based on statistics, self-organizing maps and neural networks achieve the best results in these designs, but configuring these methods is
not an easy task due to the high number of parameters that have to be measured.
BLE-Based Accurate Indoor Location Tracking for Home and Office (csandit)
Nowadays the use of smart mobile devices and the accompanying needs for emerging services
relying on indoor location-based services (LBS) for mobile devices are rapidly increasing. For
more accurate location tracking using Bluetooth Low Energy (BLE), this paper proposes a
novel trilateration-based algorithm and presents experimental results that demonstrate its
effectiveness.
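A basic trilateration step of the kind such BLE systems rely on can be sketched as follows; this is an illustrative 2D solver with our own names, not the paper's proposed algorithm:

```python
def trilaterate(anchors, dists):
    """Estimate a 2D position from three (x, y) beacon anchors and the
    measured distances by linearizing the circle equations and solving
    the resulting 2x2 linear system."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    # Subtracting circle 1 from circles 2 and 3 gives two linear equations.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1          # non-zero when anchors are not collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# beacons at three corners; true position is (2, 1)
anchors = [(0, 0), (4, 0), (0, 4)]
dists = [5 ** 0.5, 5 ** 0.5, 13 ** 0.5]
x, y = trilaterate(anchors, dists)
assert abs(x - 2) < 1e-9 and abs(y - 1) < 1e-9
```

In practice the distances come from an RSSI-to-distance path-loss model and are noisy, which is exactly where a refined algorithm such as the one proposed above earns its accuracy.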
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION (cscpconf)
In this paper, we present and analyze different approaches implemented to address the pedestrian detection problem. Histograms of Oriented Laplacian (HOL) is a feature descriptor that
aims to highlight objects in digital images; the Discrete Cosine Transform (DCT), in both its global (GDCT) and local (LDCT) versions, converts image pixels into frequency coefficients, which we then use as features in the process. We implemented
these methods independently, then tried to combine them and used their outputs in a classifier; the newly generated classifier proved its efficiency in certain cases. The performance of these
methods and their combination is tested on the most popular datasets in pedestrian detection,
INRIA and Daimler.
Table-Based Identification Protocol of Computational RFID Tags (csandit)
Computational RFID (CRFID) expands the limits of traditional RFID by granting computational
capability to RFID tags. RFID tags are powered by Radio-Frequency (RF) energy harvesting.
However, CRFID tags need more energy and processing time than traditional RFID tags to
sense the environment and generate sensing data. Therefore, the Dynamic Framed Slotted ALOHA
(DFSA) protocol for traditional RFID is no longer the best solution for CRFID systems. In this
paper, we propose a table-based CRFID tag identification protocol that takes CRFID
operation into account. An RFID reader sends the message with the frame index. Using the frame index,
CRFID tags report their energy requirement and processing time information to the reader
along with their data transmission. The RFID reader records the received tag information in a
table. After table recording is completed, the optimal frame size and a proper time interval are
provided based on the table. The proposed CRFID tag identification protocol is shown via
simulations to enhance the identification rate and the delay.
ABOUT THE SUITABILITY OF CLOUDS IN HIGH-PERFORMANCE COMPUTING (csandit)
Cloud computing has become the ubiquitous computing and storage paradigm. It is also attractive for scientists, because they no longer have to care for their own IT
infrastructure, but can outsource it to a Cloud Service Provider of their choice. However, for the case of High-Performance Computing (HPC) in a cloud, as it is needed in simulations or for Big Data analysis, things get more intricate, because HPC codes must stay highly
efficient, even when executed by many virtual cores (vCPUs). Older clouds or new standard clouds can fulfil this only under special precautions, which are given in this article. The results
can be extrapolated to other cloud OSes than OpenStack and to other codes than OpenFOAM, which were used as examples.
Topic Modeling: Clustering of Deep Webpages (csandit)
The internet comprises a massive amount of information in the form of zillions of web pages. This information can be categorized into the surface web and the deep web. The existing search engines can effectively make use of surface web information, but the deep web remains unexploited. Machine learning techniques have been commonly employed to access deep web content.
Within machine learning, topic models provide a simple way to analyze large volumes of unlabeled text. A "topic" consists of a cluster of words that frequently occur together. Using
contextual clues, topic models can connect words with similar meanings and distinguish between words with multiple meanings. Clustering is one of the key solutions to organizing deep web databases. In this paper, we cluster deep web databases based on the relevance found among deep web forms by employing a generative probabilistic model called Latent Dirichlet
Allocation (LDA) for modeling content representative of deep web databases. This is implemented after preprocessing the set of web pages to extract page contents and form
contents. Further, we derive the distributions of "topics per document" and "words per topic"
using the technique of Gibbs sampling. Experimental results show that the proposed method clearly outperforms the existing clustering methods.
Delay Tolerant Network (DTN) is a promising technology which aims to provide efficient
communication between devices in a network with no guaranteed continuous connectivity. Most
of the existing routing schemes for DTNs achieve message delivery through message replication
and forwarding. However, due to the lack of contemporaneous end-to-end communication path,
designing routing protocols that can achieve high delivery rate with low communication
overhead is a challenging problem. Some routing protocols appear highly similar, yet
their performance is significantly different. In this paper, we evaluate several popular routing
protocols in DTNs, including Epidemic, Spray and Wait, PRoPHET, and 3R, through extensive
trace-driven simulations. The objective is to evaluate the performance of different routing
schemes using different data traces and investigate the optimal configuration setting for each
routing scheme. This paper provides important guidance on the design and selection of routing
protocols for given delay tolerant networks.
A comprehensive survey of link mining and anomalies detection (csandit)
This survey introduces the emergence of link mining and its relevant application to detect
anomalies which can include events that are unusual, out of the ordinary or rare, unexpected
behaviour, or outliers.
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM... (ijscai)
Object detection and recognition are important problems in computer vision and pattern recognition
domain. Human beings are able to detect and classify objects effortlessly but replication of this ability on
computer based systems has proved to be a non-trivial task. In particular, despite significant research
efforts focused on meta-heuristic object detection and recognition, robust and reliable object recognition
systems in real time remain elusive. Here we present a survey of one particular approach that has proved
very promising for invariant feature recognition and which is a key initial stage of multi-stage network
architecture methods for the high level task of object recognition.
In this study, an end-to-end human iris recognition system is presented to automatically identify individuals for high-level security purposes. A new deep-learning-based 2D convolutional neural network (CNN) model is introduced for extracting the features and classifying the iris patterns. Firstly, the iris dataset is collected, preprocessed and augmented. The dataset is expanded and enhanced using data augmentation, histogram equalization (HE) and contrast-limited adaptive histogram equalization (CLAHE) techniques. Secondly, the features of the iris patterns are extracted and classified using the CNN. The structure of the CNN comprises convolutional layers and ReLU layers for extracting the features, pooling layers for reducing the parameters, and a fully connected layer and Softmax layer for classifying the extracted features into N classes. For the training process and updating the weights, the backpropagation algorithm and the adaptive moment estimation (Adam) optimizer are used. The experiments were carried out on a graphics processing unit (GPU) using Matlab. The overall training accuracy of the introduced system was 95.33%, with a time consumption of 17.59 minutes for the training set, while the testing accuracy was 100%, with a time consumption of 12 seconds. The introduced iris recognition system has been successfully applied.
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION (cscpconf)
This paper aims at providing insight on the transferability of deep CNN features to
unsupervised problems. We study the impact of different pretrained CNN feature extractors on
the problem of image set clustering for object classification as well as fine-grained
classification. We propose a rather straightforward pipeline combining deep-feature extraction
using a CNN pretrained on ImageNet and a classic clustering algorithm to classify sets of
images. This approach is compared to state-of-the-art algorithms in image-clustering and
provides better results. These results strengthen the belief that supervised training of deep CNN
on large datasets, with a large variability of classes, extracts better features than most carefully
designed engineering approaches, even for unsupervised tasks. We also validate our approach
on a robotic application, consisting of sorting and storing objects smartly based on clustering.
Satellite Image Classification with Deep Learning Surveyijtsrd
Satellite imagery is important for many applications including disaster response, law enforcement and environmental monitoring etc. These applications require the manual identification of objects in the imagery. Because the geographic area to be covered is very large and the analysts available to conduct the searches are few, thus an automation is required. Yet traditional object detection and classification algorithms are too inaccurate and unreliable to solve the problem. Deep learning is a part of broader family of machine learning methods that have shown promise for the automation of such tasks. It has achieved success in image understanding by means that of convolutional neural networks. The problem of object and facility recognition in satellite imagery is considered. The system consists of an ensemble of convolutional neural networks and additional neural networks that integrate satellite metadata with image features. Roshni Rajendran | Liji Samuel ""Satellite Image Classification with Deep Learning: Survey"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-2 , February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30031.pdf
Paper Url : https://www.ijtsrd.com/engineering/computer-engineering/30031/satellite-image-classification-with-deep-learning-survey/roshni-rajendran
Content-based image retrieval based on corel dataset using deep learningIAESIJAI
A popular technique for retrieving images from huge and unlabeled image databases are content-based-image-retrieval (CBIR). However, the traditional information retrieval techniques do not satisfy users in terms of time consumption and accuracy. Additionally, the number of images accessible to users are growing due to web development and transmission networks. As the result, huge digital image creation occurs in many places. Therefore, quick access to these huge image databases and retrieving images like a query image from these huge image collections provides significant challenges and the need for an effective technique. Feature extraction and similarity measurement are important for the performance of a CBIR technique. This work proposes a simple but efficient deep-learning framework based on convolutional-neural networks (CNN) for the feature extraction phase in CBIR. The proposed CNN aims to reduce the semantic gap between low-level and high-level features. The similarity measurements are used to compute the distance between the query and database image features. When retrieving the first 10 pictures, an experiment on the Corel-1K dataset showed that the average precision was 0.88 with Euclidean distance, which was a big step up from the state-of-the-art approaches.
Scene recognition using Convolutional Neural NetworkDhirajGidde
Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.
End-to-end deep auto-encoder for segmenting a moving object with limited tra...IJECEIAES
Deep learning-based approaches have been widely used in various applications, including segmentation and classification. However, a large amount of data is required to train such techniques. Indeed, in the surveillance video domain, there are few accessible data due to acquisition and experiment complexity. In this paper, we propose an end-to-end deep auto-encoder system for object segmenting from surveillance videos. Our main purpose is to enhance the process of distinguishing the foreground object when only limited data are available. To this end, we propose two approaches based on transfer learning and multi-depth auto-encoders to avoid over-fitting by combining classical data augmentation and principal component analysis (PCA) techniques to improve the quality of training data. Our approach achieves good results outperforming other popular models, which used the same principle of training with limited data. In addition, a detailed explanation of these techniques and some recommendations are provided. Our methodology constitutes a useful strategy for increasing samples in the deep learning domain and can be applied to improve segmentation accuracy. We believe that our strategy has a considerable interest in various applications such as medical and biological fields, especially in the early stages of experiments where there are few samples.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
78 Computer Science & Information Technology (CS & IT)
1. INTRODUCTION
Deep learning based vision architectures learn to extract and represent visual features with models
composed of layers of non-linear transformations stacked on top of each other [1]. They learn
high-level abstractions from the low-level features extracted from images, utilizing supervised or
unsupervised learning algorithms. Recent advances in training CNNs with the gradient-descent
based backpropagation algorithm have produced very accurate results, owing to the inclusion of
rectified linear units as the non-linear transformation [2]. Extending the unsupervised learning
algorithms that train deep belief networks to convolutional networks has also shown promise in
scaling to realistic image sizes [4]. Both the supervised and the unsupervised approaches have
matured into architectures that successfully classify objects into 1000 and 100 categories
respectively. Yet neither approach can be scaled realistically to classify objects from 10K
categories.
The need for large-scale object recognition is ever more relevant today, given the explosion in the
number of individual objects that artificial-vision solutions are expected to comprehend. This
requirement is most pronounced in use cases such as drone vision, augmented reality, retail,
image search and retrieval, industrial robotic navigation and targeted advertisement. Large-scale
object recognition will enable recognition engines to cater to a wider spectrum of object
categories. Mission-critical use cases additionally demand a higher level of accuracy together
with this large scale of objects to be recognized.
In this paper, we propose a two-level hierarchical deep learning architecture that achieves
compelling results in classifying objects into 10K categories. To the best of our knowledge, the
proposed method is the first attempt to classify 10K objects utilizing a two-level hierarchical deep
learning architecture. We also propose a blend of supervised and unsupervised leaf-level models
to overcome the labelled-data scarcity problem. The proposed architecture offers a dual benefit: it
provides a solution for large-scale object recognition while achieving greater accuracy than is
possible with a 1-level deep learning architecture.
2. RELATED WORKS
We have not come across any prior work that uses a 2-level hierarchical deep learning
architecture to classify 10K objects in images. Object recognition at this scale using shallow
architectures based on SVMs is, however, discussed in [5]. That effort presents a study of
large-scale categorization with more than 10K image classes, using multi-scale spatial pyramids
(SPM) [14] on bags of visual words (BOW) [13] for feature extraction and Support Vector
Machines (SVM) for classification.

The work creates ten datasets derived from ImageNet, each with 200 to 10,000 categories. Based
on these datasets, it outlines how classification accuracy is influenced by factors such as the
number of labels in a dataset, the density of the dataset and the hierarchy of labels. Methods are
proposed that provide the classifier with extended information on the relationship between labels
by defining a hierarchical cost, calculated as the height of the lowest common ancestor in
WordNet. Classifiers trained on a loss function using the hierarchical cost can learn to
differentiate and predict between similar
categories, compared to those trained on a 0-1 loss. The error rate for classifying the full 10K
categories is not conclusively stated in that work.
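The hierarchical cost just described can be made concrete on a toy taxonomy. This is an illustrative sketch: the labels below are hypothetical stand-ins for WordNet synsets, and taking a node's height as its longest downward path to a leaf is one reasonable reading of [5].

```python
# Toy taxonomy: PARENT[child] = parent; the root maps to None.
PARENT = {
    "animal": None,
    "mammal": "animal", "bird": "animal",
    "cow": "mammal", "heifer": "cow",
    "sparrow": "bird",
}

def ancestors(label):
    """Path from a label up to the root, the label itself first."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path

def lca(a, b):
    """Lowest common ancestor: the deepest node on both root paths."""
    on_a_path = set(ancestors(a))
    for node in ancestors(b):          # walks upward from b, deepest first
        if node in on_a_path:
            return node
    raise ValueError("labels share no ancestor")

def node_height(node):
    """Height of a node: longest downward path to a leaf below it."""
    children = [c for c, p in PARENT.items() if p == node]
    return 0 if not children else 1 + max(node_height(c) for c in children)

def hierarchical_cost(a, b):
    """Cost of confusing label a with label b: height of their LCA."""
    return node_height(lca(a, b))
```

Confusing cow with heifer costs 1 (their LCA is cow itself), while confusing cow with sparrow costs 3 (their LCA is the root), so a classifier trained with this cost is penalized less for mistakes between semantically close categories.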
3. PROBLEM STATEMENT
Supervised deep visual recognition CNN architectures are composed of multiple convolutional
stages stacked on top of each other to learn hierarchical visual features [1], as captured in Figure
1. Regularization approaches such as stochastic pooling, dropout and data augmentation have
been utilized to enhance recognition accuracy. The faster convergence of these architectures has
recently been attributed to the inclusion of Rectified Linear Unit [ReLU] nonlinearities in each
layer with weights. The state-of-the-art top-5 error rate reported is 4.9% for classification into 1K
categories [6], utilizing the above generic architectural elements in 22 weight layers.

Unsupervised architectures such as the convolutional DBN learn the visual feature hierarchy by
greedily training layer after layer. These architectures have reported an accuracy of 65.4% for
classifying 101 objects [4].

Neither architecture has yet been scaled to classify 10K objects. We conjecture that scaling a
single architecture is not realistic, as the computations become intractable with deeper
architectures.
Figure 1. Learning hierarchy of visual features in CNN architecture
4. PROPOSED METHOD
We employ the divide & conquer principle to decompose the 10K classification problem into
distinct root-level and leaf-level challenges. The proposed architecture classifies objects in two
steps, as captured below:

1. Root Level Model Architecture: In the first step the root, i.e. the first level in the architectural
hierarchy, recognizes high-level categories. This very deep vision model with 14 weight layers
[3] is trained using stochastic gradient descent [2]. Its architectural elements are captured in
Table 1.

2. Leaf Level Model Architecture: In the second step, the leaf-level recognition model for the
recognized high-level category is selected among all the leaf models. This leaf model is presented
with the same input object image, which it classifies into a specific category. The leaf level of the
architectural hierarchy thus recognizes specific objects or finer categories. These models are
trained using stochastic gradient descent [2]; their architectural elements are captured in Table 2.

CDBN based leaf-level models can be trained with the unsupervised learning approach when
labelled images are scarce [4]. This yields a blend of leaf models trained with supervised and
unsupervised approaches. In all, one root-level model and 47 leaf-level models need to be trained.
We use the ImageNet10K dataset [5], compiled from the 10184 synsets of the Fall-2009 release of
ImageNet. Each leaf node has at least 200 labelled images, amounting to 9M images in total.
Figure 2. Two-Level Hierarchical Deep Learning Architecture
5. SUPERVISED TRAINING
In vision, low-level features [e.g. pixels, edge-lets] are assembled into high-level abstractions
[e.g. edges, motifs], and these are in turn assembled into still higher-level abstractions [e.g. object
parts, objects], and so on. A substantial number of the recognition models in our two-level
hierarchical architecture are trained with supervised training. The algorithms utilized for this
method are referred to as error-backpropagation
algorithms. These algorithms require a significantly high number of labelled training images per
object category in the dataset.
5.1. CNN based Architecture
CNN is a biologically inspired architecture in which multiple trainable convolutional stages are
stacked on top of each other. Each CNN layer learns feature extractors in the visual feature
hierarchy, mimicking the feature hierarchy of the human visual system manifested in brain areas
such as V1 & V2 [10]. The fully connected layers then act as a feature classifier, learning to
classify the features extracted by the convolutional layers into different categories or objects.
They can be likened to area V4 of the brain, which classifies the hierarchical features generated
by area V2.

The root-level and leaf-level CNN models in our architecture are trained with the supervised
gradient-descent based backpropagation method. In this method, a cross-entropy objective
function is minimized using the error-correction learning rule. The rule computes the gradients
for the weight updates of a hidden layer recursively, in terms of the error gradients of the next
connected layer of neurons. By correcting the synaptic weights of all the free parameters in the
network, the actual response of the network is gradually moved closer to the desired response in a
statistical sense.
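As a concrete illustration of the error-correction rule, the sketch below performs one stochastic-gradient step for a single linear layer with a softmax output and cross-entropy loss; it is a minimal toy, not the full network. With this loss the local error gradient at the output is simply the predicted probability minus the one-hot target:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sgd_step(W, x, target, lr=0.1):
    """W: one weight row per class; x: feature vector; target: true class index."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(z)
    for k in range(len(W)):
        err = p[k] - (1.0 if k == target else 0.0)   # dL/dz_k
        for i in range(len(x)):
            W[k][i] -= lr * err * x[i]               # error-correction update
    return -math.log(p[target])                      # cross-entropy loss

# Repeating the same example drives the loss down, as expected.
W = [[0.0, 0.0], [0.0, 0.0]]
losses = [sgd_step(W, [1.0, 2.0], target=0) for _ in range(20)]
```

In the full network the same local gradients are propagated backwards through the hidden layers, which is what the recursive computation above refers to.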
Table 1. CNN architecture layers [L] with architectural elements for root level visual recognition
architecture
Layer No. Type Input Size #kernels
1 Convolutional 225 x 225 x 3 3 x 3 x 3
2 Max Pool 223 x 223 x 64 3 x 3
3 Convolutional 111 x 111 x 64 3 x 3 x 64
4 Convolutional 111 x 111 x 128 3 x 3 x 128
5 Max Pool 111 x 111 x 128 3 x 3
6 Convolutional 55 x 55 x 128 3 x 3 x 128
7 Convolutional 55 x 55 x 256 3 x 3 x 256
8 Max Pool 55 x 55 x 256 3 x 3
9 Convolutional 27 x 27 x 256 3 x 3 x 256
10 Convolutional 27 x 27 x 384 3 x 3 x 384
11 Convolutional 27 x 27 x 384 3 x 3 x 384
12 Max Pool 27 x 27 x 384 3 x 3
13 Convolutional 13 x 13 x 384 3 x 3 x 384
14 Convolutional 13 x 13 x 512 3 x 3 x 512
15 Convolutional 13 x 13 x 512 3 x 3 x 512
16 Max Pool 13 x 13 x 512 3 x 3
17 Convolutional 7 x 7 x 512 1 x 1 x 512
18 Full-Connect 12544 4096
19 Full-Connect 4096 2048
20 Full-Connect 2048 128
128-Softmax 128
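The spatial sizes in Table 1 follow from the usual output-size arithmetic, sketched below assuming the convention out = floor((in + 2*pad - kernel) / stride) + 1. Strides and padding are not listed in the table, so the values used here are inferred assumptions rather than stated by the paper:

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# Layer 1: an unpadded 3x3 convolution shrinks 225 -> 223.
assert out_size(225, 3) == 223
# Layer 2: a 3x3 max pool with stride 2 takes 223 -> 111.
assert out_size(223, 3, stride=2) == 111
# Padded ("same") 3x3 convolutions preserve 111 -> 111.
assert out_size(111, 3, pad=1) == 111
# The later pools follow the same pattern, e.g. 55 -> 27.
assert out_size(55, 3, stride=2) == 27
```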
5.2. Architectural Elements
Architectural elements for the proposed architecture are:
• Enhanced Discriminative Function: We have chosen deeper architectures and smaller
kernels for the root and leaf models, as they make the objective function more
discriminative. This can be interpreted as making the training procedure harder by
forcing it to choose feature extractors from a higher-dimensional feature space.
• ReLU Non-linearity: We have utilized ReLU, i.e. non-saturating, nonlinearities in each
layer instead of saturating sigmoidal ones, as this reduces training time by converging on
the weights faster [2].
• Pooling: The output of the convolutional-ReLU combination is fed to a pooling layer
after alternate convolutional layers. The pooling output is invariant to small changes in
the location of features in the object. The pooling method used is either max pooling or
stochastic pooling. Max pooling takes the maximum output over a neighborhood of
neurons, where the pooling neighborhoods can be overlapping or non-overlapping. In the
majority of the leaf models we have used max pooling with overlapping neighborhoods.

Alternatively, we have used the stochastic pooling method when training a few models.
In stochastic pooling the output activation of each pooling region is randomly picked
from the activations within that region, following a multinomial distribution computed
from the neuron activations in the given region [12]. This approach is hyper-parameter
free. The CNN architecture for the stochastic pooling technique is captured in Table 3.
• Dropout: With this method the output of each neuron in a fully connected layer is set to
zero with probability 0.5. This ensures that the network samples a different architecture
each time a new training example is presented. The method also forces the neurons to
learn more robust features, as they cannot rely on the existence of particular neighboring
neurons.
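Stochastic pooling over a single pooling window can be sketched as below. This is an illustrative toy following [12], not the implementation used in training:

```python
import random

def stochastic_pool(region, rng=random):
    """Sample one activation from a pooling window, with probability
    proportional to its value (a multinomial over the region).
    region: non-negative activations inside one pooling window."""
    total = sum(region)
    if total == 0:
        return 0.0                    # an all-zero region pools to zero
    r = rng.uniform(0, total)
    acc = 0.0
    for a in region:
        acc += a                      # each a is chosen with prob a/total
        if r <= acc:
            return a
    return region[-1]                 # guard against floating-point rounding
```

The sketch covers only the training-time sampling; at test time [12] replaces the sample with a probability-weighted average of the activations.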
Table 2. CNN architecture layers [L] with architectural elements for leaf level visual recognition
architecture with Max pooling strategy
Layer No. Type Input Size #kernels
1 Convolutional 225 x 225 x 3 7 x 7 x 3
2 Max Pool 111 x 111 x 64 3 x 3
3 Convolutional 55 x 55 x 64 3 x 3 x 64
4 Convolutional 55 x 55 x 128 3 x 3 x 128
5 Max Pool 55 x 55 x 128 3 x 3
6 Convolutional 27 x 27 x 128 3 x 3 x 128
7 Convolutional 27 x 27 x 256 3 x 3 x 256
8 Max Pool 27 x 27 x 256 3 x 3
9 Convolutional 13 x 13 x 256 3 x 3 x 256
10 Convolutional 13 x 13 x 384 3 x 3 x 384
12 Max Pool 13 x 13 x 384 3 x 3
13 Convolutional 7 x 7 x 384 1 x 1 x 384
14 Full-Connect 6272 2048
15 Full-Connect 2048 2048
16 Full-Connect 2048 256
256-Softmax 256
Table 3. CNN architecture layers [L] with architectural elements for leaf level visual recognition
architecture with stochastic pooling strategy
Layer No. Type Input Size #kernels
1 Convolutional 225 x 225 x 3 7 x 7 x 3
2 Stochastic Pool 111 x 111 x 64 3 x 3
3 Convolutional 55 x 55 x 64 3 x 3 x 64
4 Convolutional 55 x 55 x 128 3 x 3 x 128
5 Stochastic Pool 55 x 55 x 128 3 x 3
6 Convolutional 27 x 27 x 128 3 x 3 x 128
7 Convolutional 27 x 27 x 256 3 x 3 x 256
8 Stochastic Pool 27 x 27 x 256 3 x 3
9 Convolutional 13 x 13 x 256 3 x 3 x 256
10 Convolutional 13 x 13 x 384 3 x 3 x 384
12 Stochastic Pool 13 x 13 x 384 3 x 3
13 Convolutional 7 x 7 x 384 1 x 1 x 384
14 Full-Connect 6272 2048
15 Full-Connect 2048 2048
16 Full-Connect 2048 256
256-Softmax 256
5.3. Training Process
We modified the libCCV open-source CNN implementation to realize the proposed architecture,
which is trained on NVIDIA GTX™ TITAN GPUs. The root and leaf level models are trained
using stochastic gradient descent [2].

The leaf models are trained in batches of 10 models per GPU, simultaneously on two GPU
systems. The first 4 leaf models were initialized and trained from scratch for 15 epochs with a
learning rate of 0.01 and momentum of 0.9. The remaining leaf models are initialized from
already trained leaf models and trained with a learning rate of 0.001. The root model was
initialized from a similar model trained on the ImageNet 1K dataset and then trained for 32
epochs with a learning rate of 0.001.

It takes 10 days for a batch of 20 leaf models to train for 15 epochs. Currently the root model and
25 of the 47 leaf models have been trained, over 5 weeks. Full realization of the architecture is in
progress and is estimated to conclude by the second week of September 2015.
6. UNSUPERVISED TRAINING
The concept of unsupervised training is fundamentally inspired by statistical mechanics: the study
of the macroscopic equilibrium properties of large systems of elements, starting from the motion
of atoms and electrons. The enormous degrees of freedom involved make probabilistic methods
the most suitable candidates for modelling the features that compose the training data sets [9].
6.1. CDBN based Architecture
Networks built on statistical-mechanics principles model the underlying training dataset using the
Boltzmann distribution. To obviate the painfully slow training of Boltzmann machines, multiple
variants have been proposed, of which the Restricted Boltzmann Machine [RBM] provides the
best modelling capability in minimal time. Stacks of RBM layers trained greedily layer by layer
[4] result in Deep Belief Networks [DBN], which have successfully been applied to image [1-4],
speech recognition [8] and document retrieval problems.

A DBN can be described as a multilayer generative model that learns a hierarchy of non-linear
feature detectors. The lower layers learn low-level features, which feed into the higher layers and
help them learn complex features. The resulting network maximizes the probability that the
training data is generated by the network. However, DBNs have limitations when scaling to
realistic image sizes [4]: the first difficulty is remaining computationally tractable as image size
grows; the second is the lack of translational invariance when modelling images.
To scale DBNs to realistically sized images, the Convolutional DBN [CDBN] was introduced. A
CDBN learns translation-invariant feature detectors, i.e. detectors that can find a feature at any
location in an image.

We perform block Gibbs sampling using the conditional distributions suggested in [4] to learn the
convolutional weights connecting the visible and hidden layers, where v and h are the activations
of the neurons in the visible and hidden layers respectively, bj are the hidden-unit biases, ci the
visible-unit biases, and W the weight matrix connecting the two layers. The Gibbs sampling is
conducted utilizing (1) & (2).
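The conditionals referenced above are the standard RBM forms P(h|v) and P(v|h). As a hedged sketch, a block Gibbs step for a plain (non-convolutional) binary RBM looks as follows; the convolutional variant of [4] additionally shares weights across image locations:

```python
import math
import random

# Block Gibbs sampling for a binary RBM: all hidden units are sampled
# in parallel given v, then all visible units in parallel given h.
# W[i][j] connects visible unit i to hidden unit j; b are hidden-unit
# biases and c are visible-unit biases, matching the symbols in the text.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b, rng=random):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W_ij)."""
    probs = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, c, rng=random):
    """P(v_i = 1 | h) = sigmoid(c_i + sum_j W_ij * h_j)."""
    probs = [sigmoid(c[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_step(v, W, b, c, rng=random):
    """One v -> h -> v' block Gibbs transition."""
    h = sample_hidden(v, W, b, rng)
    return sample_visible(h, W, c, rng), h
```

Because every hidden unit is conditionally independent given v (and vice versa), each half-step samples an entire layer at once, which is what makes the block scheme efficient.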
The weights thus learnt give us the layers of Convolutional RBMs [CRBM]. CRBMs can be
stacked on top of each other to form a CDBN. We place probabilistic max-pooling layers after the
convolutional layers [4], and an MLP is introduced at the top to complete the architecture. This
concept is captured in Figure 3.

We train the first two layers of the leaf architecture with unsupervised learning. We then use the
learnt weights of these two layers to initialize the corresponding weights in the CNN architecture,
which are fine-tuned using the backpropagation method. The architecture used for unsupervised
training is the same as captured in Table 2. A two-level hierarchical deep learning architecture
can also be constructed entirely with CDBNs, as depicted in Figure 4.
Figure 3. Convolutional DBN constructed by stacking CRBMs which are learnt one by one.
6.2. Training Process
We have used the Contrastive Divergence CD-1 mechanism to train the first two layers of the
architecture, as specified for unsupervised learning. The updates to the hidden units in the
positive phase of the CD-1 step were done by sampling rather than using the real-valued
probabilities. We used a mini-batch size of 10 during training.

We monitored the accuracy of the training using:
1. Reconstruction error: the squared error between the original data and the reconstructed
data. While it does not guarantee accurate training, it should generally decrease over the
course of training; any large increase suggests the training is going wrong.

2. Printing learned weights: the learned weights should eventually visualize as oriented,
localized edge filters. Printing the weights during training helps identify whether they are
approaching that filter-like shape.
When the ratio of the variance of the reconstructed data to the variance of the input image
exceeds 2, we decrease the learning rate by a factor of 0.9 and reset the values of the weight and
hidden-bias updates to ensure that the weights do not explode. The initial momentum was chosen
to be 0.5 and is finally increased to 0.9. The initial learning rate is chosen to be 0.1.
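The stabilization heuristics above can be captured in one helper. The variance-ratio threshold of 2, the 0.9 learning-rate decay, and the 0.5 to 0.9 momentum schedule come from the text; the epoch at which momentum switches is an assumption, since the paper does not state it.

```python
import numpy as np

def adjust_training_state(lr, dW, dhb, recon, data, epoch, warmup=5):
    """If var(reconstruction)/var(input) > 2, shrink the learning rate
    by a factor of 0.9 and reset the accumulated weight and hidden-bias
    updates so the weights don't explode. Momentum starts at 0.5 and is
    raised to 0.9 after `warmup` epochs (warmup is an assumed value)."""
    if np.var(recon) / np.var(data) > 2.0:
        lr *= 0.9
        dW = np.zeros_like(dW)    # reset weight updates
        dhb = np.zeros_like(dhb)  # reset hidden-bias updates
    momentum = 0.5 if epoch < warmup else 0.9
    return lr, dW, dhb, momentum
```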
Figure 4. Proposed 2 Level Hierarchical Deep Learning Architecture constructed entirely utilizing CDBNs
for classification/Recognition of 10K+ objects
7. TWO-LEVEL HIERARCHY CREATION
The 2-level hierarchy design for classification of 10K object categories requires deciding the
following parameters:
1. the number of leaf models to be trained, and
2. the number of output nodes in each leaf model.
To decide these parameters, we first build a hierarchy tree out of the 10184 synsets (classes) in
the ImageNet10K dataset (as described in Section 7.1). Then, using a set of thumb-rules (described
in Section 7.2), we split and organize all the classes into 64 leaf models, each holding a
maximum of 256 classes.
7.1. Building Hierarchical Tree
Using the WordNet IS-A relationship, all the synsets of the ImageNet10K dataset are organized
into a hierarchical tree. The WordNet IS-A relationship is a file that lists the parent-child
relationships between synsets in ImageNet. For example, the line “n02403454 n02403740” in the
relationship file states that the parent synset is n02403454 (cow) and the child synset is n02403740
(heifer). However, the IS-A file can relate a single child to multiple parents; e.g. heifer is also the
child of another category, n01321854 (young mammal). As the depth of a synset in the ImageNet
hierarchy has no relationship to its semantic label, we focused on building the deepest branch for a
synset. We utilized a simplified method that exploits the relationship between synset ID and depth:
the deeper a category nXXX, the larger its number XXX. Hence we used cow as the parent
category of heifer, instead of young mammal.
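The deepest-parent rule can be sketched directly from the file format described above. The function name is a hypothetical helper; the synset IDs are the cow/heifer example from the text.

```python
def build_parent_map(isa_pairs):
    """Each line of the IS-A file gives a (parent, child) pair. When a child
    has several parents, keep the one with the numerically largest synset ID,
    which the text uses as a proxy for the deepest branch."""
    parent_of = {}
    for parent, child in isa_pairs:
        cur = parent_of.get(child)
        # compare the numeric part of the nXXXXXXXX synset IDs
        if cur is None or int(parent[1:]) > int(cur[1:]):
            parent_of[child] = parent
    return parent_of

pairs = [("n02403454", "n02403740"),   # cow -> heifer
         ("n01321854", "n02403740")]   # young mammal -> heifer
parents = build_parent_map(pairs)
# parents["n02403740"] == "n02403454": cow, the deeper parent, is kept
```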
The algorithm HTree, as depicted in Figures 5a & 5b, is used to generate the hierarchy tree. In this
paper, the results from the ninth (9th) iteration are used as the base tree. A sample illustration is
captured in Figure 6.
7.2. Thumb-rules for Building Hierarchical Tree
From the hierarchy tree, it is evident that the dataset is skewed towards categories like flora,
animals, fungi, natural objects, instruments, etc. that are at levels closer to the tree root; i.e. 80%
of the synsets fall under 20% of the branches.
Figure 5a. Pseudo-code for Hierarchy Tree Generation (HTree) Algorithm
Figure 5b. Pseudo-code for Hierarchy Tree Generation (HTree) Algorithm
Figure 6. Hierarchy Tree Generated by HTree algorithm at iterations 8, 9 & 10 for ImageNet10K. The
categories at Depth-9 are part of leaf-1 model.
Taking into account the number of models to be trained and the time & resources required for
fine-tuning each model, the following thumb-rules were used to finalize the solution parameters:
1. The ideal hierarchy will have 64 leaf models, each capable of classifying 256 categories.
2. The synsets for the root and leaf level models have to be chosen such that the hierarchy tree
is as flat as possible.
3. The total number of synsets in a leaf model should be less than 256.
4. If a leaf level model has more than 256 sub-categories under it, the remaining
sub-categories will be split off or merged with another leaf.
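Thumb-rules 3 and 4 amount to a packing problem. A minimal greedy sketch (the function and branch names are hypothetical; the paper does not give its exact procedure) could look like:

```python
def pack_into_leaves(branch_sizes, max_classes=256):
    """Greedy sketch of thumb-rules 3 and 4: a branch whose synset count
    exceeds max_classes is split into full leaves; small remainders are
    merged with neighbouring branches. branch_sizes is a list of
    (branch name, number of synsets) pairs."""
    leaves, current, current_size = [], [], 0
    for name, size in branch_sizes:
        while size > max_classes:              # split oversized branches
            leaves.append([(name, max_classes)])
            size -= max_classes
        if current_size + size > max_classes:  # close the current leaf
            leaves.append(current)
            current, current_size = [], 0
        current.append((name, size))           # merge the remainder
        current_size += size
    if current:
        leaves.append(current)
    return leaves

# illustrative branch sizes, not the real ImageNet10K counts
leaves = pack_into_leaves([("flora", 600), ("fungus", 120), ("tools", 180)])
# flora yields two full 256-class leaves; its 88-synset remainder is merged
# with fungus, and tools starts a fresh leaf
```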
7.3. Final Solution Parameters
The final solution parameters are as follows: 47 leaf models, with each leaf model classifying
200 to 256 synsets.
8. RESULTS
We formulate the root & leaf models into a 2-level hierarchy. In all, a root level model & 47 leaf
level models need to be trained. Each leaf level model recognizes between 45 and 256 categories.
We use the ImageNet10K dataset [7], which is compiled from 10184 synsets of the Fall-2009
release of ImageNet.
Each leaf node has at least 200 labelled images, which amount to 9M images in total. The top-5
error rates for 25 out of the 47 leaf level models have been computed. The graph in Figure 7 plots
the top-5 errors of the leaf models against the training epochs. We observe that the top-5 error
decreases as the leaf models are trained for more epochs. The top-5 error rate for the complete
10K-object classification can be computed once all 47 models required by the 2-level hierarchical
deep learning architecture have been trained.
Figure 7. Graph capturing the decrease in top-5 error rate with increasing number of epochs
when training with the supervised training method
Training for classification of 10K objects with the proposed 2-level hierarchical architecture is in
progress and is estimated to be completed by mid-September 2015.
In this architecture, the root model & 46 of the 47 leaf models are based on the CNN architecture
and trained with supervised gradient descent.
Utilizing unsupervised learning, we have trained the leaf model Leaf-4, which consists of
man-made artifacts with 235 categories. The model for Leaf-4 is a CDBN-based architecture as
described in Section 6. We trained the first layer of this CDBN architecture with the contrastive
divergence (CD-1) algorithm. The first-layer weights are then used to initialize the weights of the
Leaf-4 model in a supervised setting and fine-tuned with back-propagation utilizing supervised
learning. The feature extractors, or kernels, learnt with supervised & unsupervised learning are
captured in Figure 8.
We intend to compute the top-5 error rates for categories using the algorithm captured in
Figure 9. Figures 10 to 13 depict top-5 classification results using this 2-level hierarchical deep
learning architecture. From the top-left, the first image is the original image used for testing. The
remaining images represent the top-5 categories predicted by the hierarchical model, in
descending order of confidence.
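The two-step prediction and the corresponding top-5 error can be sketched as below. The root and leaf networks are replaced by toy callables returning probability vectors; the function names and the error definition (a sample is correct only if the root picks the right leaf and the true class appears in that leaf's top 5) are assumptions consistent with the description, not the paper's exact Figure 9 algorithm.

```python
import numpy as np

def two_level_top5(image, root_model, leaf_models):
    """Step 1: the root model picks the high-level category (a leaf model).
    Step 2: the same image is presented to that leaf model, and its five
    highest-confidence classes are returned in descending order."""
    leaf_id = int(np.argmax(root_model(image)))
    leaf_probs = leaf_models[leaf_id](image)
    top5 = np.argsort(leaf_probs)[::-1][:5].tolist()
    return leaf_id, top5

def top5_error(predictions, labels):
    """Fraction of samples whose true (leaf, class) pair is missing from
    the predicted leaf's top-5 list."""
    wrong = sum(1 for (lid, top5), (true_lid, true_cls) in zip(predictions, labels)
                if lid != true_lid or true_cls not in top5)
    return wrong / len(labels)

# toy stand-ins for the trained root and leaf networks
root_model = lambda img: np.array([0.1, 0.9])
leaf_models = {1: lambda img: np.array([0.05, 0.1, 0.2, 0.3, 0.15, 0.2])}
leaf_id, top5 = two_level_top5(None, root_model, leaf_models)
# leaf 1 is selected, and class 3 (confidence 0.3) heads its top-5 list
```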
9. CONCLUSIONS
The top-5 error rate for the entire two-level architecture needs to be computed in conjunction
with the error rates of the root & all the leaf models. The realization of this two-level visual
recognition architecture will greatly simplify the object recognition scenario for the large scale
object recognition problem. At the same time, the proposed two-level hierarchical deep learning
architecture will significantly enhance the accuracy of complex object recognition scenarios that
would otherwise not be possible with just a 1-level architecture.
Figure 8. The left image depicts the first-layer 64 filter weights of size 7*7 learned by Leaf-4
utilizing CD-1. The right image depicts the same first-layer weights after fine-tuning with back
propagation.
The trade-off with the proposed 2-level architecture is the size of the hierarchical recognition
model. The total size of the 2-level recognition models, including the root & leaf models, amounts
to approximately 5 GB. This size might put constraints on executing the entire architecture
on low-end devices. The same size is not a constraint when executed on a high-end device or with
cloud-based recognition where the RAM size is higher. Besides, we can always split the 2-level
hierarchical model between device & cloud, which paves the way for object recognition utilizing
novel device-cloud collaboration architectures.
The proposed 10K architecture will soon be available for classifying large scale objects. This
breakthrough will help enable applications as diverse as drone vision, industrial robotic vision,
targeted advertisements, augmented reality, retail, robotic navigation, video surveillance, and search
& information retrieval from multimedia content. The hierarchical recognition models can be
deployed and commercialized in various devices like smartphones, TVs, home gateways and
VR headsets for various B2B and B2C use cases.
Figure 9. Algorithm to compute top-5 error rates for the 10K categories as evaluated by the 2-level
hierarchical deep learning algorithm
Figure 10. The test image belongs to Wilson's warbler. The predicted categories in order of confidence are
a) Yellow Warbler b) Wilson's Warbler c) Yellow Hammer d) Wren, Warbler e) Audubon's Warbler
Figure 11. The test image belongs to the fruit Kumquat. The predicted categories in order of confidence are
a) Apricot b) Loquat c) Fuyu Persimmon d) Golden Delicious e) Kumquat
Figure 12. The test image belongs to Racing Boat. The top-5 predicted categories are a) Racing Boat
b) Racing shell c) Outrigger Canoe d) Gig e) Rowing boat
Figure 13. The test image belongs to category Oak. The top-5 predicted categories are a) Live Oak b)
Shade Tree c) Camphor d) Spanish Oak
ACKNOWLEDGEMENTS
We take this opportunity to express gratitude and deep regards to our mentor Dr. Shankar M
Venkatesan for his guidance and constant encouragement. Without his support it would not have
been possible to materialize this paper.
REFERENCES
[1] Yann LeCun, Koray Kavukcuoglu and Clément Farabet, “Convolutional Networks and Applications
in Vision,” in Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010
[2] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional
neural networks", in NIPS, 2012
[3] K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image
Recognition”, in arXiv technical report, 2014
[4] H Lee, R Grosse, R Ranganath, AY Ng, "Convolutional deep belief networks for scalable
unsupervised learning of hierarchical representations" in Proceedings of the 26th International
Conference on Machine Learning, Montreal, Canada, 2009
[5] Jia Deng, Alexander C. Berg, Kai Li, and Li Fei-Fei, “What Does Classifying More Than 10,000
Image Categories Tell Us?” in Computer Vision–ECCV 2010
[6] Sergey Ioffe, Christian Szegedy, "Batch Normalization - Accelerating Deep Network Training by
Reducing Internal Covariate Shift", in arXiv:1502.03167v3
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei., “ImageNet: A Large-Scale Hierarchical
Image Database”, in CVPR09, 2009.
[8] George E. Dahl, Dong Yu, Li Deng, Alex Acero, “Context-Dependent Pre-Trained Deep Neural
Networks for Large-Vocabulary Speech Recognition” in IEEE Transactions on Audio, Speech &
Language Processing 20(1): 30-42 (2012)
[9] S. Haykin, "Neural Networks: A Comprehensive Foundation", 3rd ed., New Jersey, Prentice Hall,
1998.
[10] G. Orban, "Higher Order Visual Processing in Macaque Extrastriate Cortex" ,Physiological Reviews,
Vol. 88, No. 1, pp. 59-89, Jan. 2008, DOI: 10.1152/physrev.00008.2007.
[11] CCV-A Modern Computer Vision Library, "libccv.org", [Online] 2015, http://libccv.org/
[12] Matthew D. Zeiler and Rob Fergus,”Stochastic Pooling for Regularization of Deep Convolutional
Neural Networks”, in ICLR, 2013.
[13] C. R. Dance, J. Willamowski, L. Fan, C. Bray and G. Csurka, "Visual categorization with bags of
keypoints", in ECCV International Workshop on Statistical Learning in Computer Vision, 2004.
[14] S. Lazebnik, C. Schmid and J. Ponce, "Beyond bags of features: Spatial pyramid matching for
recognizing natural scene categories", in IEEE Computer Society Conf. Computer Vision and Pattern
Recognition, 2006.
AUTHORS
Atul Laxman Katole
Atul Laxman Katole has completed his M.E in Jan 2003 in Signal Processing from Indian
Institute of Science, Bangalore and is currently working at Samsung R&D Institute India
Bangalore. His technical areas of interest include Artificial Intelligence, Deep Learning,
Object Recognition, Speech Recognition, Application Development and Algorithms. He
has seeded & established teams in the domain of Deep Learning with applications to
Image & Speech Recognition.
Krishna Prasad Yellapragada
Krishna Prasad Yellapragada is currently working as a Technical Lead at Samsung R&D
Institute India-Bangalore. His interests include Deep Learning, Machine Learning &
Content Centric Networks.
Amish Kumar Bedi
Amish Kumar Bedi has completed his B.Tech in Computer Science from IIT Roorkee,
2014 batch. He is working in Samsung R&D Institute India-Bangalore since July 2014.
His technical areas of interest include Deep Learning/Machine Learning and Artificial
Intelligence.
Sehaj Singh Kalra
Sehaj Singh Kalra has completed his B.Tech in Computer Science from IIT Delhi, batch of
2014. He is working in Samsung R&D Institute India-Bangalore since June 2014. His
interest lies in machine learning and its applications in various domains, specifically
speech and image.
Mynepalli Siva Chaitanya
Mynepalli Siva Chaitanya has completed his B.Tech in Electrical Engineering from IIT
Bombay, batch of 2014. He is working in Samsung R&D Institute India-Bangalore
since June 2014. His areas of interest include Neural Networks and Artificial
Intelligence.