This document summarizes a paper describing a modular approach for retrieving moments from lifelogs. The approach combines textual analysis and semantic enrichment of queries with visual concept augmentation of images, and relies on an inverted index of visual concepts and metadata. The paper describes the main components: the data used, query interpretation methods, ranking approaches, filtering techniques, and clustering algorithms. It then outlines 12 runs using different combinations of these methods, focusing on query interpretation, subset selection, scoring, and refinement through weighting, thresholding, and visual or temporal clustering.
Web Image Retrieval Using Visual Dictionary (ijwscjournal)
In this research, we propose a semantic-based image retrieval system that retrieves a set of relevant images from the Web for a given query image. We use a global color space model and Dense SIFT feature extraction to generate a visual dictionary with the proposed quantization algorithm. Images are transformed into sets of features, which serve as inputs to the quantization algorithm that generates the codewords forming the visual dictionary. These codewords represent images semantically as visual labels using Bag-of-Features (BoF). The histogram intersection method measures the distance between the input image and the images in the database to retrieve similar ones. Experimental results over a collection of 1000 generic Web images demonstrate the effectiveness of the proposed system.
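As a sketch of the retrieval step described above, histogram intersection over normalized Bag-of-Features histograms can be computed as follows. The histogram values and image names are illustrative stand-ins, not data from the paper's collection:

```python
# Histogram intersection similarity: sum of bin-wise minima over normalized
# codeword histograms, used to rank database images against a query.

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for histograms that each sum to 1."""
    return sum(min(a, b) for a, b in zip(h1, h2))

query = [0.2, 0.5, 0.1, 0.2]           # normalized codeword histogram of the query
database = {
    "img_a": [0.25, 0.45, 0.1, 0.2],   # similar codeword distribution
    "img_b": [0.7, 0.1, 0.1, 0.1],     # dissimilar distribution
}

# Rank database images by decreasing intersection with the query.
ranked = sorted(database,
                key=lambda k: histogram_intersection(query, database[k]),
                reverse=True)
print(ranked)  # ['img_a', 'img_b']
```

A higher intersection means more shared mass per bin, so near-duplicates score close to 1 while unrelated images score near the histograms' chance overlap.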
An overview of Media Analytics outlining the evolution of image classification and knowledge extraction. The presentation offers insight into Big-Data Analytics for Media Management.
Steganography is gaining importance due to the exponential growth of the internet and the need for secret communication among computer users [5]. It can be defined as the study of invisible communication, which deals with ways of hiding the very existence of the communicated message. Data embedding is generally applied to image, text, voice, or multimedia content for copyright protection, military communication, authentication, and many other purposes [2]. In image steganography, secret communication is achieved by embedding a message into a cover image (the carrier) to generate a stego image (the image carrying the hidden message) [1]. In this paper we critically analyze various steganographic techniques and also give an overview of steganography, its major types, classification, and applications [3]. KEYWORDS: steganography, stego image, cover image, LSB
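The LSB scheme named in the keywords can be sketched in a few lines: message bits overwrite the least significant bit of each cover pixel, so intensities change by at most 1. The flat list of intensities below is a toy stand-in for a real image file:

```python
# Minimal LSB steganography sketch: hide message bits in the lowest bit of
# each cover "pixel" (a flat list of 0-255 intensities).

def embed(cover, message: bytes):
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit   # overwrite only the lowest bit
    return stego

def extract(stego, n_bytes: int):
    bits = [p & 1 for p in stego[:n_bytes * 8]]
    return bytes(
        sum(b << (7 - i) for i, b in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )

cover = list(range(64))                 # toy 64-pixel cover image
stego = embed(cover, b"Hi")
assert extract(stego, 2) == b"Hi"
# Each pixel changes by at most 1, which is what keeps the stego image
# visually indistinguishable from the cover.
assert all(abs(a - b) <= 1 for a, b in zip(cover, stego))
```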
Comparative Performance of Image Scrambling in Transform Domain using Sinusoi... (CSCJournals)
With the rapid development of technology and the popularization of the internet, communication has been greatly promoted. It is no longer limited to text but also includes multimedia such as digital images. The security of digital images has therefore become an important practical issue, and appropriate security technology is needed, especially for images containing confidential or private information. In this paper a novel image scrambling approach is proposed that operates in both the spatial and the transform domain. Experimental results show that the correlation obtained in scrambled images is much lower than that obtained in transformed images.
A systematic image compression in the combination of linear vector quantisati... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. Its aim and scope is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in Engineering and Technology, bringing together scientists, academicians, field engineers, scholars, and students of related fields.
A NOVEL IMAGE STEGANOGRAPHY APPROACH USING MULTI-LAYERS DCT FEATURES BASED ON... (ijma)
Steganography is the science of hiding data in a cover image without perceptibly altering it. Recent steganography research focuses on hiding large amounts of information within image and/or audio files. This paper proposes a novel approach for hiding a secret image using Discrete Cosine Transform (DCT) features and a linear Support Vector Machine (SVM) classifier. The DCT features are used to reduce redundant image information, and the DCT is used to embed the secret message in the least significant bits of the RGB channels. Each bit in the cover image is changed only to an extent imperceptible to the human eye. The SVM classifier speeds up the hiding process via the DCT features. The proposed method is implemented and the results show significant improvements. Performance is analyzed in terms of MSE, PSNR, NC, processing time, capacity, and robustness.
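The DCT features that this abstract builds on come from transforming pixel blocks into frequency coefficients. A direct (and deliberately slow) 2-D DCT-II of an 8x8 block can be written out as below; production systems use a fast library-backed DCT, and the constant test block is only there to make the expected coefficients obvious:

```python
import math

# Naive O(N^4) 2-D DCT-II of an 8x8 pixel block. A constant block puts all
# of its energy into the DC coefficient, which makes the result easy to check.

N = 8

def dct2(block):
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N)
            )
            out[u][v] = alpha(u) * alpha(v) * s
    return out

flat = [[100] * N for _ in range(N)]   # constant 8x8 block
coeffs = dct2(flat)
print(round(coeffs[0][0]))   # 800  (DC term: 8 * 100)
```

Embedding then perturbs selected coefficients (or, as in the abstract, the LSBs they point to) rather than raw pixels, which is what keeps the change imperceptible.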
Image fusion is a subfield of image processing in which multiple images are fused to create a single image in which all objects are in focus. Fusion is performed for multi-sensor and multi-focus images of the same scene: multi-sensor images are captured by different sensors, whereas multi-focus images are captured by the same sensor. In multi-focus images, objects closer to the camera are in focus while farther objects are blurred; conversely, when the farther objects are focused, the closer objects blur. To obtain an image in which all objects are in focus, fusion is performed either in the spatial domain or in a transform domain. Applications of image processing have grown immensely in recent times, yet due to the limited depth of field of optical lenses, especially at longer focal lengths, it is often impossible to capture a single image with all objects in focus. Fusion therefore plays an important role in enabling other image processing tasks such as segmentation, edge detection, stereo matching, and enhancement. Hence, a novel feature-level multi-focus image fusion technique is proposed. The results of extensive experimentation highlighting the efficiency and utility of the proposed technique are presented, and the work further compares fuzzy-based image fusion with a neuro-fuzzy fusion technique using quality evaluation indices.
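The core idea of multi-focus fusion can be sketched with a simple focus measure: for each region, keep the pixels from whichever source is sharper there. The 1-D "images" and the local-variance measure below are toy stand-ins; the paper's fuzzy and neuro-fuzzy rules are replaced by a hard winner-takes-all selection:

```python
# Multi-focus fusion sketch: per block, pick pixels from the source image
# with the higher local variance (in-focus regions have more contrast).

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def fuse(img_a, img_b, block=4):
    fused = []
    for i in range(0, len(img_a), block):
        a, b = img_a[i:i + block], img_b[i:i + block]
        fused.extend(a if variance(a) >= variance(b) else b)  # sharper block wins
    return fused

# Left half of A is in focus (high contrast), right half blurred (flat);
# B is the opposite.
img_a = [10, 200, 10, 200, 100, 100, 100, 100]
img_b = [100, 100, 100, 100, 10, 200, 10, 200]
print(fuse(img_a, img_b))  # [10, 200, 10, 200, 10, 200, 10, 200]
```

Real systems apply the same decision over 2-D windows and smooth the selection map to avoid block artifacts.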
Pso based optimized security scheme for image authentication and tamper proofing (csandit)
A hash function provides authentication and integrity for digital images. In this paper an innovative optimized security scheme based on Particle Swarm Optimization (PSO) for image authentication and tamper proofing is proposed. The scheme addresses robustness, security, and tamper detection with precise localization. Features are extracted in the Daubechies-4 wavelet transform domain with the help of PSO to generate the image hash. The scheme is moderately robust against attacks and can detect and locate tampered areas in an image. Experimental results are presented to demonstrate the effectiveness of the proposed scheme.
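PSO itself, the optimizer behind the feature selection above, reduces to a velocity/position update driven by each particle's personal best and the swarm's global best. The sketch below minimizes a toy 2-D sphere function rather than selecting wavelet features; the swarm size, inertia, and acceleration constants are illustrative defaults, not the paper's settings:

```python
import random

# Bare-bones particle swarm optimization: each particle is pulled toward its
# own best-seen position (pbest) and the swarm's best (gbest).

random.seed(0)

def pso(objective, dim=2, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=objective)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
                if objective(pos[i]) < objective(gbest):
                    gbest = pos[i][:]
    return gbest

# Sphere function: global minimum at the origin.
best = pso(lambda p: sum(x * x for x in p))
assert sum(x * x for x in best) < 0.1   # swarm converges near (0, 0)
```

In the hashing scheme, the objective would instead score a candidate feature subset, but the update loop is the same.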
Genetic Algorithm based Mosaic Image Steganography for Enhanced Security (IDES Editor)
The concept of mosaic steganography was proposed by Lai and Tsai [4] for information hiding and retrieval using techniques such as histogram values, a greedy search algorithm, and random permutation. In the present paper, a novel method for mosaic image steganography is attempted using a genetic algorithm (GA) and key-based random permutation (KBRP). The creation of a predefined database of target images is avoided; instead, a randomly selected image is used as the target, which reduces the memory load and hence the space complexity. The GA generates a mapping sequence for tile-image hiding, resulting in better clarity in the retrieved secret image as well as reduced computational complexity. The quality of the original cover image is preserved despite the embedded data image, ensuring better security and robustness. The mosaic image is produced by dividing the secret image into fragments, embedding these tile fragments into the target image according to the GA mapping sequence, and permuting the sequence again with KBRP using a key. The secret image is recovered using the same key and mapping sequence. This is found to be a lossless data hiding method.
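The key-based random permutation step can be sketched as deriving a repeatable tile ordering from a secret key, so that the receiver with the same key inverts the mapping exactly. The key string and tile labels below are illustrative; seeding a PRNG with the key is one simple way to realize a KBRP-style mapping, not necessarily the paper's construction:

```python
import random

# Key-based permutation sketch: the key seeds a PRNG, giving a deterministic
# permutation of tile indices that only the key holder can reproduce.

def key_permutation(key: str, n: int):
    rng = random.Random(key)            # same key -> same permutation
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def invert(perm):
    inv = [0] * len(perm)
    for src, dst in enumerate(perm):
        inv[dst] = src
    return inv

tiles = ["t0", "t1", "t2", "t3", "t4", "t5"]
perm = key_permutation("secret-key", len(tiles))
scrambled = [tiles[i] for i in perm]

# Receiver side: same key -> same permutation -> lossless recovery.
recovered = [scrambled[i] for i in invert(perm)]
assert recovered == tiles
```

Because the permutation is a bijection, no tile information is lost, which is what makes the overall hiding scheme lossless.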
A Comparative Study of Content Based Image Retrieval Trends and Approaches (CSCJournals)
Content-Based Image Retrieval (CBIR) is an important step in addressing image storage and management problems. Improvements in imaging technology, together with the growth of the Internet, have produced a huge amount of digital multimedia over recent decades. Various methods, algorithms, and systems have been proposed to solve the resulting problems; these studies developed the indexing and retrieval concepts that evolved into CBIR. CBIR systems often analyze image content via so-called low-level features for indexing and retrieval, such as color, texture, and shape. To achieve significantly higher semantic performance, recent systems seek to combine low-level features with high-level features that carry perceptual information for humans. The purpose of this review is to identify the set of methods that have been used for CBIR, to discuss key contributions in the current decade, and to outline the main challenges in adapting existing retrieval techniques to build useful systems that handle real-world data. Accurate, repeatable, quantitative data must be extracted efficiently in order to improve the retrieval accuracy of CBIR systems. In this paper, various CBIR approaches and available algorithms are reviewed; comparative results of various techniques are presented and their advantages, disadvantages, and limitations are discussed.
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements... (CSCJournals)
The traditional approach to object recognition first extracts image representations and then feeds them to a learning model such as an SVM. These representations are handcrafted and heavily engineered by running the object image through a pipeline of steps, which requires good prior knowledge of the problem domain. Moreover, since classification is done in a separate step, the handcrafted representations are not tuned by the learning model, preventing it from learning complex representations that would give it more discriminative power. In end-to-end deep learning models, by contrast, image representations and the classification decision boundary are learnt directly from raw data, requiring no prior knowledge of the problem domain. These models learn the object image representation hierarchically in multiple layers corresponding to multiple levels of abstraction, yielding representations that are more discriminative and give better results on challenging benchmarks. Unlike traditional handcrafted representations, deep representations improve with more data and more learning layers (more depth), and they perform well on large-scale machine learning problems.
The purpose of this study is sixfold: (1) review the literature on the pipeline processes used in the previous state-of-the-art codebook model approach to generic object recognition; (2) introduce several enhancements in the local feature extraction and normalization steps of the recognition pipeline; (3) compare the proposed enhancements across different encoding methods and contrast them with previous results; (4) experiment with current state-of-the-art deep model architectures for object recognition; (5) compare deep representations extracted from the deep learning model with shallow representations handcrafted through the recognition pipeline; and finally, (6) improve the results further by combining multiple different deep learning models into an ensemble and taking the maximum posterior probability.
System analysis and design for multimedia retrieval systems (ijma)
Due to the extensive use of information technology and recent developments in multimedia systems, the amount of multimedia data available to users has increased exponentially. Video is one example of multimedia data, containing several kinds of content such as text, images, metadata, visual streams, and audio. Content-based video retrieval is an approach to facilitate searching and browsing of large multimedia collections over the WWW. To create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique employing multiple features for indexing and retrieval would be more effective in video discrimination and search tasks. To validate this, content-based indexing and retrieval systems were implemented using color histograms, a texture feature (GLCM), edge density, and motion.
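The GLCM texture feature mentioned above tallies how often pairs of gray levels co-occur at a fixed offset; statistics of the normalized matrix (contrast, energy, homogeneity) then serve as index features. A minimal horizontal-offset version, with a toy 4x4 image quantized to 4 gray levels, might look like this:

```python
# Gray-level co-occurrence matrix (GLCM) sketch: count gray-level pairs that
# are horizontal neighbors, normalize, and derive a contrast statistic.

def glcm_horizontal(img, levels):
    counts = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):   # (left, right) neighbor pairs
            counts[a][b] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def contrast(glcm):
    # Weighted by squared gray-level difference: high for busy textures.
    return sum((i - j) ** 2 * p
               for i, row in enumerate(glcm)
               for j, p in enumerate(row))

img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
g = glcm_horizontal(img, levels=4)
print(round(contrast(g), 3))  # 0.333
```

Real systems compute GLCMs at several offsets and angles and concatenate the resulting statistics into one texture descriptor.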
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION (cscpconf)
This paper aims at providing insight into the transferability of deep CNN features to unsupervised problems. We study the impact of different pretrained CNN feature extractors on the problem of image-set clustering for object classification as well as fine-grained classification. We propose a rather straightforward pipeline combining deep-feature extraction using a CNN pretrained on ImageNet with a classic clustering algorithm to classify sets of images. This approach is compared to state-of-the-art image-clustering algorithms and provides better results. These results strengthen the belief that supervised training of deep CNNs on large datasets with high class variability extracts better features than most carefully engineered approaches, even for unsupervised tasks. We also validate our approach on a robotic application, consisting of sorting and storing objects based on clustering.
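The "classic clustering algorithm" half of that pipeline can be sketched with a bare-bones k-means. The tiny 2-D vectors below stand in for CNN embeddings (real ones would be, say, 2048-D pooled activations), so only the clustering step is shown:

```python
import random

# K-means sketch over stand-in "deep features": alternate assigning each
# vector to its nearest center and moving centers to their cluster means.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[idx].append(p)
        # Update step: move each center to its cluster mean (keep empty ones).
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Two well-separated groups of "embeddings", one per object class.
feats = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = kmeans(feats, k=2)
assert sorted(len(c) for c in clusters) == [3, 3]
```

The paper's point is that good pretrained features make the two groups this separable in embedding space, so even a simple clusterer recovers the classes.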
FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTERING AND QUERY SCHEME... (Editor IJMTER)
Web mining techniques are used to analyze web page contents and usage details. Human facial images are shared on the internet and tagged with additional information. Auto face annotation techniques annotate facial images automatically, and the annotations are used in online photo search and management. Classification techniques assign the facial annotations; supervised or semi-supervised machine learning techniques train the classification models using labeled facial images. Noisy and incomplete labels are referred to as weak labels. Search-based face annotation (SBFA) assigns labels by mining weakly labeled facial images available on the World Wide Web (WWW). An unsupervised label refinement (ULR) approach refines the labels of web facial images with machine learning techniques, enhancing label quality via a graph-based and low-rank learning approach. The training phase comprises facial image collection, facial feature extraction, feature indexing, and label refinement learning; similar-face retrieval and voting-based face annotation are carried out in the testing phase. A Clustering-Based Approximation (CBA) algorithm is applied to improve scalability, and a bisecting K-means clustering based algorithm (BCBA) and a divisive clustering based algorithm (DCBA) group the facial images. A multi-step gradient algorithm performs the label refinement. The web face annotation scheme is enhanced to improve label quality with low refinement overhead: a noise reduction method is integrated with the label refinement process, a duplicate name removal process is integrated with the system, the indexing scheme is enhanced with weight values for the labels, and social contextual information is used to manage query facial image relevancy issues.
International Journal of Engineering Research and Development (IJERD), IJERD Editor
Advances in this technological era have resulted in an enormous pool of information, stored at multiple places globally and in multiple formats. This article presents a methodology for extracting video lectures delivered by experts in the domain of Computer Science using a Generalized Gamma Mixture Model. Feature extraction is based on DCT transformations. To build the model, the dataset is pooled from YouTube video lectures in the domain of Computer Science, and the generated outputs are evaluated using precision and recall.
Image fusion is a sub field of image processing in which more than one images are fused to create an image where all the objects are in focus. The process of image fusion is performed for multi-sensor and multi-focus images of the same scene. Multi-sensor images of the same scene are captured by different sensors whereas multi-focus images are captured by the same sensor. In multi-focus images, the objects in the scene which are closer to the camera are in focus and the farther objects get blurred. Contrary to it, when the farther objects are focused then closer objects get blurred in the image. To achieve an image where all the objects are in focus, the process of images fusion is performed either in spatial domain or in transformed domain. In recent times, the applications of image processing have grown immensely. Usually due to limited depth of field of optical lenses especially with greater focal length, it becomes impossible to obtain an image where all the objects are in focus. Thus, it plays an important role to perform other tasks of image processing such as image segmentation, edge detection, stereo matching and image enhancement. Hence, a novel feature-level multi-focus image fusion technique has been proposed which fuses multi-focus images. Thus, the results of extensive experimentation performed to highlight the efficiency and utility of the proposed technique is presented. The proposed work further explores comparison between fuzzy based image fusion and neuro fuzzy fusion technique along with quality evaluation indices.
Pso based optimized security scheme for image authentication and tamper proofingcsandit
The hash function offers an authentication and an integrity to digital images. In this paper an
innovative optimized security scheme based on Particle swarm optimization (PSO) for image
authentication and tamper proofing is proposed. This scheme provide solutions to the issues
such as robustness, security and tamper detection with precise localization. The features are
extracted in Daubechies4 wavelet transform domain with help of PSO to generate the image
hash. This scheme is moderately robust against attacks and to detect and locate the tampered
areas in an image. The experimental results are presented to exhibit the effectiveness of the
proposed scheme.
Genetic Algorithm based Mosaic Image Steganography for Enhanced SecurityIDES Editor
The concept of mosaic steganography was proposed
by Lai and Tsai [4] for information hiding and retrieval using
techniques such as histogram value, greedy search algorithm,
and random permutation techniques. In the present paper, a
novel method is attempted in mosaic image steganography
using techniques such as Genetic algorithm, Key based
random permutation .The creation of a predefined database
of target images has been avoided. Instead, the randomly
selected image is used as the target image reduces the enforced
memory load results reduction in the space complexity .GA is
used to generate a mapping sequence for tile image hiding.
This has resulted in better clarity in the retrieved secret image
as well as reduction in computational complexity. The quality
of original cover image remains preserved in spite of the
embedded data image, thereby better security and robustness
is assured. The mosaic image is yielded by dividing the secret
image into fragments and embed these tile fragments into
the target image based on the mapping sequence by GA and
permuted the sequence again by KBRP with a key .The recovery
of the secret image is by using the same key and the mapping
sequence. This is found to be a lossless data hiding method.
A Comparative Study of Content Based Image Retrieval Trends and ApproachesCSCJournals
Content Based Image Retrieval (CBIR) is an important step in addressing image storage and management problems. Latest image technology improvements along with the Internet growth have led to a huge amount of digital multimedia during the recent decades. Various methods, algorithms and systems have been proposed to solve these problems. Such studies revealed the indexing and retrieval concepts, which have further evolved to Content-Based Image Retrieval. CBIR systems often analyze image content via the so-called low-level features for indexing and retrieval, such as color, texture and shape. In order to achieve significantly higher semantic performance, recent systems seek to combine low-level with high-level features that contain perceptual information for human. Purpose of this review is to identify the set of methods that have been used for CBR and also to discuss some of the key contributions in the current decade related to image retrieval and main challenges involved in the adaptation of existing image retrieval techniques to build useful systems that can handle real-world data. By making use of various CBIR approaches accurate, repeatable, quantitative data must be efficiently extracted in order to improve the retrieval accuracy of content-based image retrieval systems. In this paper, various approaches of CBIR and available algorithms are reviewed. Comparative results of various techniques are presented and their advantages, disadvantages and limitations are discussed.
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
The traditional approach for solving the object recognition problem requires image representations to be first extracted and then fed to a learning model such as an SVM. These representations are handcrafted and heavily engineered by running the object image through a sequence of pipeline steps which requires a good prior knowledge of the problem domain in order to engineer these representations. Moreover, since the classification is done in a separate step, the resultant handcrafted representations are not tuned by the learning model which prevents it from learning complex representations that might would give it more discriminative power. However, in end-to-end deep learning models, image representations along with the classification decision boundary are all learnt directly from the raw data requiring no prior knowledge of the problem domain. These models deeply learn the object image representation hierarchically in multiple layers corresponding to multiple levels of abstraction resulting in representations that are more discriminative and give better results on challenging benchmarks. In contrast to the traditional handcrafted representations, the performance of deep representations improves with the introduction of more data, and more learning layers (more depth) and they perform well on large-scale machine learning problems. 
The purpose of this study is six fold: (1) review the literature of the pipeline processes used in the previous state-of-the-art codebook model approach for tackling the problem of generic object recognition, (2) Introduce several enhancements in the local feature extraction and normalization steps of the recognition pipeline, (3) compare the enhancements proposed to different encoding methods and contrast them to previous results, (4) experiment with current state-of-the-art deep model architectures used for object recognition, (5) compare between deep representations extracted from the deep learning model and shallow representations handcrafted through the recognition pipeline, and finally, (6) improve the results further by combining multiple different deep learning models into an ensemble and taking the maximum posterior probability.
System analysis and design for multimedia retrieval systemsijma
Due to the extensive use of information technology and the recent developments in multimedia systems, the
amount of multimedia data available to users has increased exponentially. Video is an example of
multimedia data as it contains several kinds of data such as text, image, meta-data, visual and audio.
Content based video retrieval is an approach for facilitating the searching and browsing of large
multimedia collections over WWW. In order to create an effective video retrieval system, visual perception
must be taken into account. We conjectured that a technique which employs multiple features for indexing
and retrieval would be more effective in the discrimination and search tasks of videos. In order to validate
this, content based indexing and retrieval systems were implemented using color histogram, Texture feature
(GLCM), edge density and motion..
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION cscpconf
This paper aims at providing insight on the transferability of deep CNN features to
unsupervised problems. We study the impact of different pretrained CNN feature extractors on
the problem of image set clustering for object classification as well as fine-grained
classification. We propose a rather straightforward pipeline combining deep-feature extraction
using a CNN pretrained on ImageNet and a classic clustering algorithm to classify sets of
images. This approach is compared to state-of-the-art algorithms in image-clustering and
provides better results. These results strengthen the belief that supervised training of deep CNN
on large datasets, with a large variability of classes, extracts better features than most carefully
designed engineering approaches, even for unsupervised tasks. We also validate our approach
on a robotic application, consisting of sorting and storing objects smartly based on clustering.
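The pipeline the abstract describes can be sketched in a few lines. The feature matrix below is synthetic and merely stands in for embeddings from a CNN pretrained on ImageNet, and a plain 2-means implementation stands in for the classic clustering step:

```python
import numpy as np

def kmeans2(X, iters=30):
    """Plain 2-means with a deterministic farthest-point initialisation."""
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    for _ in range(iters):
        # assign each row to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its members
        for j in range(2):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# Synthetic stand-in for deep features (one row per image).
features = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 128)),  # images of one object class
    rng.normal(5.0, 0.5, size=(20, 128)),  # images of another class
])
labels = kmeans2(features)
# Images drawn from the same class end up with the same cluster id.
```

With well-separated deep features, even this simple clustering recovers the object classes without any labels, which is the paper's central claim.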
FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTERING AND QUERY SCHEME... (Editor IJMTER)
Web mining techniques are used to analyze web page contents and usage details. Human facial images are shared on the internet and tagged with additional information. Auto face annotation techniques are used
to annotate facial images automatically. Annotations are used in online photo search and management.
Classification techniques are used to assign the facial annotation. Supervised or semi-supervised machine learning
techniques are used to train the classification models. Facial images with labels are used in the training process.
Noisy and incomplete labels are referred as weak labels. Search-based face annotation (SBFA) is assigned by
mining weakly labeled facial images available on the World Wide Web (WWW). Unsupervised label refinement
(ULR) approach is used for refining the labels of web facial images with machine learning techniques. ULR
scheme is used to enhance the label quality using graph-based and low-rank learning approach. The training phase
is designed with facial image collection, facial feature extraction, feature indexing and label refinement learning
steps. Similar face retrieval and voting based face annotation tasks are carried out under the testing phase.
Clustering-Based Approximation (CBA) algorithm is applied to improve the scalability. Bisecting K-means
clustering based algorithm (BCBA) and divisive clustering based algorithm (DCBA) are used to group up the
facial images. Multi step Gradient Algorithm is used for label refinement process. The web face annotation scheme
is enhanced to improve the label quality with low refinement overhead. A noise reduction method is integrated with the label refinement process. A duplicate name removal process is integrated with the system. The indexing
scheme is enhanced with weight values for the labels. Social contextual information is used to manage the query
facial image relevancy issues.
International Journal of Engineering Research and Development (IJERD)
Advances in this technological era have produced an enormous pool of information. This information is stored at multiple places globally, in multiple formats. This article presents a methodology for extracting
the video lectures delivered by experts in the domain of Computer Science by using Generalized Gamma
Mixture Model. The feature extraction is based on the DCT transformations. In order to propose the model,
the data set is pooled from the YouTube video lectures in the domain of Computer Science. The outputs
generated are evaluated using Precision and Recall.
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL (IJCSEIT Journal)
The proposed approach avoids the semantic gap in image retrieval by combining automatic relevance
feedback and a modified stochastic algorithm. A visual feature database is constructed from the image
database, using combined feature vector. Very few fast-computable features are included in this step. The
user selects the query image, and based on that, the system ranks the whole dataset. The nearest images are
retrieved and the first automatic relevance feedback is generated. The combined similarity of textual and
visual feature space using Latent Semantic Indexing is evaluated and the images are labelled as relevant or
irrelevant. The feedback drives a feature re-weighting process and is routed to the particle swarm
optimizer. Instead of classical swarm update approach, the swarm is split, for each swarm to perform the
search in parallel, thereby increasing the performance of the system. It provides a powerful optimization
tool and an effective space exploration mechanism. The proposed approach aims to achieve the following
goals without any human interaction - to cluster relevant images using meta-heuristics and to dynamically
modify the feature space by feeding automatic relevance feedback.
Video Data Visualization System: Semantic Classification and Personalization (ijcga)
We present in this paper an intelligent video data visualization tool, based on semantic classification, for
retrieving and exploring a large scale corpus of videos. Our work is based on semantic classification
resulting from semantic analysis of video. The obtained classes will be projected in the visualization space.
The graph is represented by nodes and edges: the nodes are the keyframes of video documents and the edges are the relations between documents and the classes of documents. Finally, we construct the user's profile, based on interaction with the system, to make the system better suited to the user's preferences.
Learning to Rank Image Tags With Limited Training Examples (1crore projects)
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Image Captioning Generator using Deep Machine Learning (ijtsrd)
Technology's scope has evolved into one of the most powerful tools for human development in a variety of fields. AI and machine learning have become some of the most powerful tools for completing tasks quickly and accurately without the need for human intervention. This project demonstrates how deep machine learning can be used to create a caption or a sentence for a given picture. This can be used for visually impaired persons, in automobiles for self-identification, and in various applications for quick and easy verification. A Convolutional Neural Network (CNN) is used to describe the image, and a Long Short-Term Memory (LSTM) network is used to compose meaningful sentences in this model. The Flickr8k and Flickr30k datasets were used for training. Sreejith S P | Vijayakumar A, "Image Captioning Generator using Deep Machine Learning", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd42344.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/42344/image-captioning-generator-using-deep-machine-learning/sreejith-s-p
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif... (CSCJournals)
In today's era of digitization and fast internet, many videos are uploaded to websites, so a mechanism is required to access these videos accurately and efficiently. Semantic concept detection achieves this task accurately and is used in many applications like multimedia annotation, video summarization, indexing and retrieval. Video retrieval based on semantic concepts is an efficient and challenging research area. Semantic concept detection bridges the semantic gap between the low-level features extracted from key-frames or shots of a video and their high-level interpretation as semantics. Semantic concept detection automatically assigns labels to video from a predefined vocabulary. This task is considered a supervised machine learning problem. The support vector machine (SVM) emerged as the default classifier choice for this task, but recently deep convolutional neural networks (CNNs) have shown exceptional performance in this area. CNNs require large datasets for training. In this paper, we present a framework for semantic concept detection using a hybrid model of SVM and CNN. Global features like color moments, HSV histogram, wavelet transform, grey-level co-occurrence matrix and edge orientation histogram are selected as the low-level features extracted from the annotated ground-truth video dataset of TRECVID. In a second pipeline, deep features are extracted using a pretrained CNN. The dataset is partitioned into three segments to deal with the data imbalance issue. The two classifiers are trained separately on all segments, and a fusion of scores is performed to detect the concepts in the test dataset. System performance is evaluated using Mean Average Precision on the multi-label dataset. The performance of the proposed framework using the hybrid model of SVM and CNN is comparable to existing approaches.
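The score-fusion step in the abstract can be illustrated with a small sketch. The abstract does not specify the fusion operator, so a weighted average is assumed here, and the per-concept confidence values are invented for illustration:

```python
# Hypothetical per-concept confidences from the two pipelines
# (SVM on global features, pretrained CNN on deep features).
svm_scores = {"indoor": 0.62, "vegetation": 0.10, "vehicle": 0.33}
cnn_scores = {"indoor": 0.80, "vegetation": 0.25, "vehicle": 0.05}

def fuse(a, b, w=0.5):
    """Late fusion: weighted average of the two classifiers' scores."""
    return {c: w * a[c] + (1 - w) * b[c] for c in a}

fused = fuse(svm_scores, cnn_scores)
# Keep the concepts whose fused confidence clears a detection threshold.
detected = sorted(c for c, s in fused.items() if s >= 0.5)
```

Here only "indoor" survives the 0.5 threshold (fused score 0.71); a concept that only one pipeline is confident about is damped by the average, which is the usual motivation for late fusion.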
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTI... (IJDKP)
Content-based retrieval has the advantage of higher prediction accuracy compared to tagging-based approaches. However, the complexity of its representation and classification approach results in lower processing accuracy and computational overhead. The correlative nature of the feature data is unexplored in conventional modeling, where all the data features are taken as a set of feature values to reach a decision. The recurrent feature class attribute is observed for feature regrouping in action model prediction. In this paper, a correlative-information bounding grouping approach is suggested for action model prediction in CBMR applications. The correlative recurrent feature mapping results in a faster retrieval process compared to the conventional retrieval system.
Content Based Image Retrieval (CBIR) aims at retrieving images from a database based on a user query that is in visual form rather than the traditional text form. The applications of CBIR extend from surveillance to remote sensing, medical imaging to weather forecasting, and security systems to historical research. Though extensive research has been done on content-based image retrieval in the spatial domain, most images on the internet are JPEG compressed, which pushes the need for image retrieval in the compressed domain itself rather than decoding to raw format before comparison and retrieval. This research addresses the need to retrieve images from the database based on features extracted from the compressed domain, along with the application of a genetic algorithm to improve the retrieval results. The research focuses on various features and their levels of impact on improving the precision and recall of the CBIR system. Our experimental results also indicate that CBIR features in the compressed domain, together with the genetic algorithm, improve the results considerably compared with techniques from the literature.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be calculated easily. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
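A minimal sketch of the per-vertex convergence-skipping idea (not the full STICD algorithm) might look like the following, assuming a graph with no dangling nodes:

```python
def pagerank_skip(graph, damping=0.85, tol=1e-10, eps=1e-12):
    """Power-iteration PageRank that stops updating vertices whose rank
    has already stabilised (the per-vertex skipping idea above).
    graph: dict mapping vertex -> list of out-neighbours (no dangling nodes)."""
    verts = list(graph)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    # in-links, needed to recompute a vertex's rank
    incoming = {v: [] for v in verts}
    for u, outs in graph.items():
        for v in outs:
            incoming[v].append(u)
    converged = set()
    while True:
        new = {}
        for v in verts:
            if v in converged:
                new[v] = rank[v]            # skip: rank already stable
                continue
            s = sum(rank[u] / len(graph[u]) for u in incoming[v])
            new[v] = (1 - damping) / n + damping * s
            if abs(new[v] - rank[v]) < eps:
                converged.add(v)
        delta = sum(abs(new[v] - rank[v]) for v in verts)
        rank = new
        if delta < tol:
            return rank

ranks = pagerank_skip({"a": ["b"], "b": ["c"], "c": ["a"]})
```

Note that skipping a vertex while its in-neighbours are still changing is an approximation; tightening `eps` trades the saved work back for accuracy.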
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes below compare several vector primitives:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
A Multimedia Modular Approach to Lifelog Moment Retrieval
Maxime Tournadre, Guillaume Dupont, Vincent Pauwels, Bezeid Cheikh
Mohamed Lmami, and Alexandru Lucian Ginsca
Atos BDS, Bezons, France
maxime@tournadre.org, guillaume.dupont.rm@gmail.com,
vincentpauwels8@gmail.com, alexandru-lucian.ginsca@atos.net,
bezeid.cheikhmohamedlmami@atos.net
Abstract. Lifelogging offers a means for people to record their day-to-day life. However, without proper tools for retrieving specific moments, the large volume of collected data is rendered less useful. We present in this
paper our contribution to the ImageCLEF Lifelog evaluation campaign.
We describe a modular approach that covers both textual analysis and
semantic enrichment of queries and targeted concept augmentation of
visual data. The proposed retrieval system relies on an inverted index
of visual concepts and metadata and accommodates different versions
of clustering and filtering modules. We tested over 10 fully automatic
combinations of modules and a human guided approach. We observed
that even with minimal human intervention, we can obtain a significant increase in the F1 score, ranking fourth on the competition leaderboard.
Keywords: Deep Learning · Natural Language Processing · Life logging
· Computer Vision · Multimedia
1 Introduction
Recently, interest in lifelogging has grown with the arrival of affordable wearable cameras and smartphone apps. Lifelogging consists of recording the daily life of an individual using various data (i.e. images, biometrics, location, etc.). These devices produce a huge amount of personal data, in various forms, that could be used to build valuable applications to improve users' lives (such as assistance for the elderly or a semantic video search engine). To advance this field, several datasets were created to allow researchers to compare methods through workshops, competitions and tasks.
Since 2017, ImageCLEF [1] has hosted the Lifelog Task [2] [3] every year in order to compare different approaches in the same environment. This year, the Lifelog Moment Retrieval Task consists of returning a selection of 10 images for specific
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.
topics. ImageCLEF Lifelog provides extracted visual concepts and rich metadata for each image, such as GPS coordinates, UTC time, the number of steps, the heart rate, etc.
New approaches with beyond state-of-the-art performance are expected to emerge from this challenge. This would allow, for instance, building a powerful semantic search engine and retrieving specific moments more easily. Beyond this, there is a wide range of possible application domains, which explains the popularity of related workshops. In this paper we detail multiple automatic approaches as well as a human-guided one, which gives users the possibility to choose between predefined filters.
2 Related work
The origins of image retrieval can be traced back to 1979, when a conference on Database Techniques for Pictorial Applications was held in Florence [4]. In the beginning, images were first annotated with text and then searched using a text-based approach. Unfortunately, generating automatic descriptive texts for a wide spectrum of images is not feasible without manual help. Annotating images manually is obviously very expensive, which limited the text-based methods. In 1992, the National Science Foundation of the United States organized a workshop to identify new directions in image database management [5]. It was then recognized that indexing visual information on its inherent properties (shape, color, etc.) was more efficient and intuitive. Since then, the application potential of image database management techniques has attracted the attention of researchers. In the past decade, many content-based image retrieval systems have been developed.
For the Lifelog Moment Retrieval (LMRT) task of the LifeLog challenge, the two major criteria to achieve are relevance and diversity. To improve relevance, most approaches relied on the extraction of concepts or features by existing pre-trained models, which is an important time-saver in these challenges. Some systems [6] chose to augment the data using other models [7], such as ResNet50 [8] trained on ImageNet for visual concepts, or VGG16 trained on Places365 to retrieve places.
Other approaches used one binary classifier per topic, or classifiers with a minimum confidence level for each topic. Many teams also used a blur detector to filter the images before selection. To ensure diversity, one team [9] decided to cluster the hyper-features (such as oriented-gradient or color-histogram features obtained through various models) of the images. Last but not least, all participants used various NLP methods to interpret the topics. In this field, the most popular tools were WordNet [10] and Word2Vec [11].
In the multimedia research field, ImageCLEF is not alone. Many other related workshops and tasks occur on a regular basis, such as NTCIR [12] or ACM Multimedia [13]. Launched in late 1997, NTCIR holds its conference each year in Tokyo and puts an emphasis on East-Asian languages such as Japanese, Korean and Chinese. Like the ImageCLEF competition, a fair number of tasks are proposed, and among them is a LifeLog action retrieval task. ACM Multimedia is an international conference held each year in a different country. Different workshops revolving around images, text, video, music and sensor data are proposed, and a special part of the conference is the art program, which explores the boundaries of computer science and art.
3 Main components of a modular LMRT system
3.1 Data
Among the data delivered by the organizers, we chose to keep some and to add others. We kept the qualitative and persistent data and, therefore, did not take into account the following: latitude, longitude, song, steps, calories, glucose, heart rate and distance. The UTC-time metadata was especially useful, and we detail its contribution in the clustering part.
In the following, we provide more details on the extraction of visual concepts. The user's images were delivered with visual concepts extracted from several neural networks:
– The top 10 attributes predicted by using the PlaceCNN [14], trained on SUN
attribute dataset [15].
– The top 5 categories and their scores predicted by using the PlaceCNN,
trained on Place365 dataset [16].
– Class name, bounding box and score on up to the best 25 objects for each
image. They are predicted by using Faster RCNN [17], trained on the COCO
dataset [18].
In order to diversify the learning databases we added the following:
– The top 3 labels with their scores predicted by using VGG19. This architec-
ture was created by VGG (Visual Geometry Group) from the University of
Oxford [19] and trained on ImageNet [20].
– The top 3 labels with their scores predicted by using ResNet50 trained on
ImageNet. We took an implementation of Microsoft which was the winner
of ILSVRC 2015 [8].
– The top 3 labels with their scores predicted by using InceptionV3 imple-
mented by Google [21] also trained on ImageNet.
– The top 10 labels with their scores predicted by using RetinaNet [22] (object detection), implemented by Facebook.
As another approach, we also built one or two SVMs (Support Vector Machines) [23] per topic, based on its keywords. They were trained on the FC2 layer of the VGG16 implementation in Keras. We have two versions: the first is fully automatic, with automatic scraping of the web, and the second is built with guided scraping in order to reduce noise and to better specify the target.
3.2 Queries
The very first step of the online system is the interpretation of a topic. How can we decide whether an image corresponds to a loosely defined topic? We tried several techniques, all based on the idea of obtaining a semantic representation of each topic. The way the query is interpreted has a direct impact on the ranking, as it changes the input of the ranking system. We chose to keep a set of keywords for each topic. In our toolbox, Python-RAKE¹ is a package used to extract basic keywords from a topic, and WordNet allows us to generate pre-defined synonyms. Finally, Word2Vec models provide another way to generate synonyms. The way we combined these tools is detailed in the next section.
3.3 Ranking
For a given query, our system returns a ranking (full or partial) of the images. "How should we rank them?" is the general problem of this part. We explored several methods, from a basic count of labels selected by a match to the average of the semantic similarities between labels and a topic. Each ranking is specific to a well-defined method and may be more efficient than another depending on the subsequent processing (e.g. clustering, SVM weights).
3.4 Filters
Another way to change the image selection is to adjust some requirements. For instance, if we tried to retrieve the images corresponding to "restaurant" in our index, an exact-match criterion would miss "fast food restaurant". It may look trivial, but changing the way an image is selected opens a wide range of possibilities, especially within the NLP field. We can easily recover images based on their visual concepts and metadata, but we can also apply a similarity threshold with a Word2Vec model, a Levenshtein distance threshold, and much more. In our guided method, we also used conditional filters. For instance, the topic "Driving home", where the user must be driving home from work, was treated with two conditions: the user must have been at work in the last hour and must arrive home within the next fifteen minutes. Combined with a primary ranking, the filters narrow down the selection and suppress noise. Both precision and recall were usually improved by these approaches.
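As an illustration, the two conditions for "Driving home" can be sketched as a conditional filter over time-stamped location labels. The record layout and the label strings used here are hypothetical, not taken from the actual dataset:

```python
from datetime import datetime, timedelta

# Hypothetical records: (image_id, capture time, location label)
lifelog = [
    ("img_001", datetime(2019, 5, 2, 17, 10), "work"),
    ("img_002", datetime(2019, 5, 2, 17, 50), "inside of vehicle"),
    ("img_003", datetime(2019, 5, 2, 18, 2), "home"),
    ("img_004", datetime(2019, 5, 2, 21, 0), "inside of vehicle"),
]

def driving_home_candidates(records):
    """Keep vehicle images taken at most 1 h after being at work
    and at most 15 min before arriving home."""
    by_time = sorted(records, key=lambda r: r[1])
    selected = []
    for i, (img, t, loc) in enumerate(by_time):
        if loc != "inside of vehicle":
            continue
        was_at_work = any(
            l == "work" and timedelta(0) <= t - t2 <= timedelta(hours=1)
            for _, t2, l in by_time[:i])
        arrives_home = any(
            l == "home" and timedelta(0) <= t2 - t <= timedelta(minutes=15)
            for _, t2, l in by_time[i + 1:])
        if was_at_work and arrives_home:
            selected.append(img)
    return selected

print(driving_home_candidates(lifelog))  # ['img_002']
```

Only img_002 satisfies both conditions: img_004 is in a vehicle but more than an hour after leaving work.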
3.5 Clustering
Since the evaluation method is F@10 - harmonic mean of precision and recall - it
is important to augment the diversity of the selected images. That’s the reason
why it matters to focus on clustering in order to improve diversity without
changing relevance. Consequently, we chose to explore two different approaches,
one using temporal similarity and one using visual similarity.
¹ Python-RAKE: https://github.com/fabianvf/python-rake
4 Proposed Methods
We submitted 12 runs (11 graded), each combining different methods. This section explains these methods. Every run followed the global pipeline described below; the process is detailed further in this section.
Fig. 1. General pipeline for the treatment of a topic
4.1 Interpretation
Keywords - For each topic, we extract the associated keywords with the Python-RAKE package. Python-RAKE splits the sentence into words, filters them against a stop-word list and returns the remaining keywords.
Synonyms - This method diversifies the topic representation by adding synonyms to the basic extracted keywords. We use WordNet to generate pre-defined synonyms based on the WordNet hierarchy. Then, we compute the similarity between these synonyms and the basic keywords; this similarity score enables us to select the relevant synonyms. In our runs, the similarity was computed with GoogleNews-vectors-negative300, a Word2Vec model: comparing the vector representations of two words is useful to filter out irrelevant synonyms. Our final method selects up to 5 synonyms from WordNet with a threshold (0.5) on the Word2Vec similarity. Furthermore, we keep only synonyms that do not partially match existing keywords.
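The selection step above can be sketched as follows. The WordNet synonym lists and Word2Vec similarities are stubbed with placeholder values here; the real system queries WordNet and the GoogleNews-vectors-negative300 model:

```python
# Placeholder lexicon standing in for a WordNet synset lookup.
WORDNET_SYNONYMS = {
    "food": ["nutrient", "solid_food", "intellectual_nourishment"],
}
# Placeholder cosine similarities standing in for the Word2Vec model.
W2V_SIMILARITY = {
    ("food", "nutrient"): 0.62,
    ("food", "solid_food"): 0.81,
    ("food", "intellectual_nourishment"): 0.12,
}

def select_synonyms(keywords, max_per_kw=5, threshold=0.5):
    selected = []
    for kw in keywords:
        candidates = WORDNET_SYNONYMS.get(kw, [])
        scored = [(s, W2V_SIMILARITY.get((kw, s), 0.0)) for s in candidates]
        # Keep synonyms above the similarity threshold ...
        kept = [s for s, sim in sorted(scored, key=lambda x: -x[1])
                if sim >= threshold]
        # ... that do not partially match an existing keyword.
        kept = [s for s in kept if not any(k in s or s in k for k in keywords)]
        selected.extend(kept[:max_per_kw])
    return selected

print(select_synonyms(["food"]))  # ['nutrient']
```

"solid_food" passes the 0.5 threshold but is discarded because it partially matches the keyword "food".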
4.2 Choice of subset
We tested different solutions to choose a subset to work on. We usually used one of the following approaches to reduce the working set, but in a few runs we kept the whole dataset.
Match - We first create an inverted index, which in this case takes a label or a metadata value as input and returns all the images carrying it. Using an inverted index saves time and is useful to interact with the dataset. Then, we recover through the index the images for which at least one label or metadata value is part of the keywords.
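A minimal sketch of the inverted index and the exact-match selection, with toy annotations in place of the real visual concepts:

```python
from collections import defaultdict

# Hypothetical per-image annotations: visual concepts and metadata values.
annotations = {
    "img_001": ["restaurant", "table", "person"],
    "img_002": ["fast food restaurant", "car"],
    "img_003": ["beach", "sky"],
}

def build_inverted_index(ann):
    """Map each label/metadata value to the set of images carrying it."""
    index = defaultdict(set)
    for image, labels in ann.items():
        for label in labels:
            index[label].add(image)
    return index

index = build_inverted_index(annotations)

def match(index, keywords):
    """Images whose labels exactly match at least one keyword."""
    hits = set()
    for kw in keywords:
        hits |= index.get(kw, set())
    return hits

print(match(index, ["restaurant"]))  # {'img_001'}
```

Note that the exact match misses "fast food restaurant", which motivates the partial-match variant below.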
Partial match - Similarly to match, we select labels that partially match at least one of our keywords. Two words are a partial match if and only if one is a substring of the other (e.g. "food" and "fast-food"). The check was done with Python's `in` operator. This increases recall but introduces some noise as well.
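The partial-match variant only changes the selection predicate; a sketch with a toy index:

```python
# Toy inverted index: label -> set of image ids.
index = {
    "restaurant": {"img_001"},
    "fast food restaurant": {"img_002"},
    "beach": {"img_003"},
}

def partial_match(index, keywords):
    """Select images whose label contains (or is contained in) a keyword,
    using Python's `in` operator on strings."""
    hits = set()
    for label, images in index.items():
        if any(kw in label or label in kw for kw in keywords):
            hits |= images
    return hits

print(partial_match(index, ["restaurant"]))  # {'img_001', 'img_002'}
```

Unlike the exact match, this now also retrieves "fast food restaurant".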
Similarity - Another approach consists of computing the semantic similarity between each key of the index and the keywords. Only images that have a label similar enough to a keyword are selected. We did not use this approach in our submitted runs; however, a similarity threshold of 0.6 should be efficient.
4.3 Scoring
Label counting - In this basic method, the score is the number of times the image was picked from the inverted index detailed in section 4.2.
Topic Similarity - In this alternative method, we compute the similarity score between each label and each keyword. We then take the maximum similarity over the labels as the score for each keyword. The final score for an image is the mean of its keyword scores.
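A sketch of this scoring rule, with a stubbed similarity table standing in for the Word2Vec model:

```python
# Placeholder label/keyword similarities (Word2Vec in the real system).
SIM = {
    ("dining_table", "restaurant"): 0.55,
    ("food", "restaurant"): 0.70,
    ("sky", "restaurant"): 0.05,
}

def similarity(label, keyword):
    return SIM.get((label, keyword), 0.0)

def topic_score(labels, keywords):
    """For each keyword, take the max similarity over the image's labels,
    then average these maxima over all keywords."""
    per_keyword = [max(similarity(l, kw) for l in labels) for kw in keywords]
    return sum(per_keyword) / len(per_keyword)

print(topic_score(["dining_table", "food", "sky"], ["restaurant"]))  # 0.7
```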
SVM - For a visual approach, we used transfer learning to train a dedicated binary SVM for each topic. In the end, this provided us with a probability score for each image of belonging to the topic, based on visual features.
The first step in creating our SVMs was to establish a training set, so we used the Google Search API to get images from Google Images². The negative examples are shared between the SVMs; they are images of scenes and objects from daily life. These are the queries we used for the negative images: bag, car, cat, daily life photography, dog, fish, footballer, horse, newyork, python, smartphone and sunset. The positives were scraped by querying the title of the topic and a combination of keywords and synonyms. After scraping, we manually filtered the positives to reduce noise.
² https://developers.google.com/custom-search/v1/overview
Every SVM had around 2000 negative and 200 positive examples. Each image was then processed through VGG16 to extract the visual features, which are the input vectors of our SVMs. As we intended to use the probability given by the SVM for thresholding in a few runs, we tried isotonic calibration. However, as our training set was not large enough, we kept the raw output probability.
SVM performance could be improved with a larger training set, calibration and data augmentation. Unfortunately, we did not have the time or the resources to pursue this.
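A hedged sketch of this pipeline using scikit-learn's `SVC` (the text does not name the SVM implementation, and the random vectors below merely stand in for VGG16 FC2 features):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder for VGG16 FC2 features (4096-d in the real system; 32-d here).
X_pos = rng.normal(loc=1.0, size=(200, 32))    # ~200 scraped positives
X_neg = rng.normal(loc=-1.0, size=(2000, 32))  # ~2000 shared negatives
X = np.vstack([X_pos, X_neg])
y = np.array([1] * len(X_pos) + [0] * len(X_neg))

# One binary SVM per topic; probability=True enables Platt scaling so the
# classifier outputs a probability usable for weighting or thresholding.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)

query = rng.normal(loc=1.0, size=(1, 32))      # a "positive-looking" image
proba = clf.predict_proba(query)[0, 1]
print(round(proba, 3))
```

In the real system the per-topic probability would then be attached to each candidate image for the refinement step.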
4.4 Refinement
Weighting - Refinement by weighting combines different scoring methods: we define a new ranking based on the predictions of multiple approaches.
Thresholding - In a similar way, thresholding requires a minimum score from another method as a prerequisite for an image to stay in the ranking.
Visual Clustering - To maximize the diversity of returned images, one approach was to cluster the 200 highest-scoring images on the features extracted by our neural networks. We extracted these features from the FC2 layer of VGG16, as explained earlier, and clustered the images with the K-means algorithm into 10 clusters. It greatly improved the diversity of images (i.e. the recall score) but inevitably reduced the precision score, since sometimes only one of many correct images is returned. It did not improve the overall F@10 score. The second clustering approach, presented next, proved to be more reliable.
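A sketch of this refinement with scikit-learn's `KMeans`, using random vectors in place of the VGG16 features. Keeping each cluster's highest-scoring member is an assumption about the per-cluster selection rule, which the text does not specify:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Placeholder feature vectors for the 200 best-scoring images
# (FC2 of VGG16 in the real system) and their ranking scores.
features = rng.normal(size=(200, 16))
scores = rng.random(200)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(features)

# Return one image per cluster (its highest-scoring member) to diversify.
selected = []
for c in range(10):
    members = np.flatnonzero(kmeans.labels_ == c)
    selected.append(int(members[np.argmax(scores[members])]))
print(sorted(selected))
```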
Temporal Clustering - To find the maximum number of moments for a given topic, the temporal approach seemed the most logical. We wanted to form clusters spaced at least one hour apart and to detect noise. Thus, we chose to use DBSCAN³ (Density-based spatial clustering of applications with noise) on the 250 highest-scoring images.
This algorithm, created in 1996, needs just two arguments: a distance and a minimum number of points to form a cluster. Consequently, we converted the UTC times into minutes and then applied the algorithm implemented in Scikit-Learn. On average over the topics, the best parameters were one hour for the maximal distance between two points belonging to the same moment and a minimum of 5 images to form a moment.
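The temporal clustering can be sketched directly with scikit-learn's `DBSCAN`; the timestamps below are toy values in minutes, not actual lifelog data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Capture times converted to minutes: two bursts of images
# plus two isolated shots that should be flagged as noise.
minutes = np.array(
    [480, 482, 485, 490, 495,        # a morning moment
     800, 803, 805, 810, 815, 820,   # an afternoon moment
     300, 1300]                      # isolated images
).reshape(-1, 1)

# eps=60: images within one hour can belong to the same moment;
# min_samples=5: at least five images are needed to form a moment.
labels = DBSCAN(eps=60, min_samples=5).fit_predict(minutes)
print(labels)  # bursts get cluster ids 0 and 1; isolated shots get -1
```

DBSCAN's noise label (-1) is what lets the method discard stray images instead of forcing them into a moment.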
In addition, we used a blur filter from the OpenCV library to reject blurry images. Once the blur filter and the clustering were completed, we had to select ten images: we chose to show the best-scoring images from different clusters. However, many other methods could be used to select images across clusters.
³ https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
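The text does not specify which OpenCV blur measure was used; a common recipe is the variance of the Laplacian, sketched here in pure NumPy as a stand-in for `cv2.Laplacian(gray, cv2.CV_64F).var()`:

```python
import numpy as np

def variance_of_laplacian(gray):
    """Variance of the Laplacian response; low values indicate blur."""
    g = gray.astype(np.float64)
    # 4-neighbour discrete Laplacian on the interior pixels.
    lap = (-4 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return lap.var()

rng = np.random.default_rng(2)
sharp = rng.integers(0, 256, size=(64, 64))         # high-frequency content
blurry = np.tile(np.linspace(0, 255, 64), (64, 1))  # smooth gradient

print(variance_of_laplacian(sharp) > variance_of_laplacian(blurry))  # True
```

An image would be rejected when its score falls below a hand-tuned threshold.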
4.5 Formatting
The last step before sending a run to the evaluation system is to normalize our scores into confidence scores in [0, 1] and to combine the per-topic results.
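Assuming a simple min-max scheme (the text does not specify the normalization formula), this step can be sketched as:

```python
def normalize(scores):
    """Min-max normalize raw scores into [0, 1] confidence scores."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                 # degenerate case: all scores equal
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

print(normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```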
4.6 Detailed runs
Automatic runs With these runs we wanted to establish comparisons between our methods; they are analyzed in section 5. Table 1 shows the automatic runs.
Table 1. Correspondence between automatic runs and methods used
Run Interpretation Subset Scoring Refinement
1 Keywords Entire dataset Topic Similarity Time Clustering
2 Keywords Entire dataset Topic Similarity Visual Clustering
3 Keywords Entire dataset Topic Similarity —
4 Keywords Partial match Topic Similarity —
5 Keywords + Synonyms Entire dataset Topic Similarity —
6 Keywords + Synonyms Partial match Topic Similarity —
7 Keywords + Synonyms Partial match Topic Similarity Weighting by SVM
8 Keywords + Synonyms Partial match Topic Similarity Thresholding by SVM
9 Keywords + Synonyms Entire dataset SVM —
10 Keywords + Synonyms Entire dataset SVM Visual Clustering
11 Keywords + Synonyms Entire dataset SVM Time Clustering
Interactive runs During our interactive runs, the user interpreted each topic through filters, which perform the subset selection. However, keywords and synonyms are still extracted, as they are needed to compute the scoring of some methods. Table 2 shows the interactive runs.
Table 2. Correspondence between interactive runs and methods used. * The method is picked from the automatic runs and varies for each topic; the choice is made by a human.
Run Interpretation Subset Scoring Refinement
12 Automatic* Filters by Human Automatic* Automatic*
5 Results and analysis
Table 3 shows the results obtained by our different runs on the test set. The metric is F@10, the harmonic mean of two measurements computed on each topic. The first is precision: the proportion of the first ten images that are relevant to the given topic. The second is the proportion of distinct moments found for the given topic. Thus, good precision is not enough to achieve a high score; both recall and precision must be high.
For runs 1 to 6, "Mixed run" means that additional visual concepts were extracted and that the processing was done on the labels (textual). The "Mixed" aspect of runs 7 and 8 is the same, but we add the refinement by the SVM, which relies on the visual approach. Runs 9 to 11 do not use the textual approach during retrieval; however, the SVMs were trained beforehand.
Table 3. Results obtained on test set
Run Category F@10 on test
1 Automatic — Mixed 0.077
2 Automatic — Mixed 0.036
3 Automatic — Mixed 0.036
4 Automatic — Mixed 0.078
5 Automatic — Mixed 0.053
6 Automatic — Mixed 0.083
7 Automatic — Mixed 0.101
8 Automatic — Mixed 0.068
9 Automatic — Visual 0.099
10 Automatic — Visual Not evaluated
11 Automatic — Visual 0.116
12 Human-Guided 0.255
Table 4 presents the comparisons that our runs allow. From them we can determine which method worked best in this specific context; an attempt to generalize does not guarantee identical results. Despite the risk that Time Clustering reduces precision when too few moments are found in the K first images, its use consistently increased the F@10 score. Similarly, weighting a ranking with the SVM's predictions increased F@10. Combining these two refinement methods may therefore be fruitful.
However, the winner of these runs is clearly the human-guided method. Combining human understanding with the mixed automatic runs, it reaches 0.255 at F@10, whereas our automatic runs did not surpass 0.116. It would therefore be interesting, in future work, to establish a hybrid framework in which the operator represents topics through filters and the ranking is then produced by the automatic approaches, using the SVM to weight the confidence and Time Clustering to ensure cluster diversity. Figure 2 gives an example of such a framework.
Table 4. Comparison of the methods used — Bold shows the cases where one method was significantly better than the other
Runs Compared methods Best result
1 — 2 Time vs Visual clustering Time Clustering
3 — 4 , 5 — 6 Topic Similarity vs Selection Selection
3 — 5 , 4 — 6 Keywords vs Keywords + Synonyms Keywords + Synonyms
6 — 9 Classic vs SVM SVM
7 — 8 Weighted vs Threshold use of SVM result Weighted
9 — 11 SVM vs SVM + Time Clustering SVM + Time Clustering
11 — 12 Automatic vs Human-Guided Human-Guided
Fig. 2. Example of feasible framework for Lifelog Moments Retrieval
6 Conclusion
In this working-note we presented our approach to the LMRT task of the LifeLog
IMAGEClef competition. We chose two frameworks: one fully automatic, and
one with human assistance. We extracted feature vectors, and used meta-data
on location and time for each image. Our approach relies on linking keywords
and their synonyms from topics to our data. We made the human assistance
framework available because some filters work better on some topics than others.
To cluster the remaining images, we found out that the time clustering had better
results than the visual clustering. The sole purpose of clustering was to improve
the diversity of moments retrieved. It seems logical that the more images are
separated in time, the more they can fit in different moment. Moreover, the
DBSCAN algorithm selects automatically the number of clusters and identifies
noise images. Therefore, it is superior to the k-means algorithm used in visual
clustering. DBSCAN did not achieve great results on visual clustering because
the distance between feature vectors is not well defined.
The main difficulty of the LMRT task was to process a great deal of multimodal data and to find the optimal processing. Fine-tuning parameters and thresholds by hand was often effective, but it limits the scalability of the system. As each topic requires a slightly different approach, work remains to achieve a fully automatic and adaptive system for moment retrieval, such as improving topic interpretation and the automatic setting of appropriate filters.
References
1. B. Ionescu, H. Müller, R. Péteri, Y. D. Cid, V. Liauchuk, V. Kovalev, D. Klimuk, A. Tarasau, A. B. Abacha, S. A. Hasan, V. Datla, J. Liu, D. Demner-Fushman, D.-T. Dang-Nguyen, L. Piras, M. Riegler, M.-T. Tran, M. Lux, C. Gurrin, O. Pelka, C. M. Friedrich, A. G. S. de Herrera, N. Garcia, E. Kavallieratou, C. R. del Blanco, C. C. Rodríguez, N. Vasillopoulos, K. Karampidis, J. Chamberlain, A. Clark, and A. Campello, "ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature," in Experimental IR Meets Multilinguality, Multimodality, and Interaction, CLEF 2019 Working Notes, CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org/Vol-2380/, Proceedings of the 10th International Conference of the CLEF Association (CLEF 2019), Lugano, Switzerland, LNCS, Springer, September 9-12, 2019.
2. D.-T. Dang-Nguyen, L. Piras, M. Riegler, M.-T. Tran, L. Zhou, M. Lux, T.-K. Le,
V.-T. Ninh, and C. Gurrin, “Overview of ImageCLEFlifelog 2019: Solve my life puz-
zle and Lifelog Moment Retrieval,” in CLEF2019 Working Notes, CEUR Work-
shop Proceedings, (Lugano, Switzerland), CEUR-WS.org <http://ceur-ws.org>,
September 09-12 2019.
3. D.-T. Dang-Nguyen, L. Piras, M. Riegler, L. Zhou, M. Lux, and C. Gurrin, "Overview of ImageCLEFlifelog 2018: daily living understanding and lifelog moment retrieval," in CLEF2018 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org <http://ceur-ws.org>, Avignon, France, 2018.
4. A. Blaser, Data base techniques for pictorial applications. Springer-Verlag, 1980.
5. R. Jain, “Nsf workshop on visual information management systems,” ACM Sigmod
Record, vol. 22, no. 3, pp. 57–75, 1993.
6. E. Kavallieratou, C. R. del Blanco, C. Cuevas, and N. García, "Retrieving events in life logging."
7. C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep trans-
fer learning,” in International Conference on Artificial Neural Networks, pp. 270–
279, Springer, 2018.
8. K. He, X. Zhang, S. Ren, and J. Sun, ResNet, winner of the ImageNet competition 2015, http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf.
9. M. Dogariu and B. Ionescu, "Multimedia Lab @ ImageCLEF 2018 lifelog moment retrieval task."
10. Princeton University, "About WordNet," https://wordnet.princeton.edu, 2010.
11. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word
representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
12. C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, and R. Albatal, "NTCIR lifelog: The first test collection for lifelog research," in Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 705–708, ACM, 2016.
13. C. Gurrin, X. Giro-i Nieto, P. Radeva, M. Dimiccoli, D.-T. Dang-Nguyen, and
H. Joho, “Lta 2017: The second workshop on lifelogging tools and applications,”
in Proceedings of the 25th ACM international conference on Multimedia, pp. 1967–
1968, ACM, 2017.
14. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, “Learning deep features
for scene recognition using places database,” in Advances in neural information
processing systems, pp. 487–495, 2014.
15. G. Patterson, C. Xu, H. Su, and J. Hays, “The sun attribute database: Beyond
categories for deeper scene understanding,” International Journal of Computer
Vision, vol. 108, no. 1-2, pp. 59–81, 2014.
16. B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million
image database for scene recognition,” IEEE transactions on pattern analysis and
machine intelligence, vol. 40, no. 6, pp. 1452–1464, 2017.
17. S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object
detection with region proposal networks,” in Advances in neural information pro-
cessing systems, pp. 91–99, 2015.
18. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, pp. 740–755, Springer, 2014.
19. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” arXiv preprint arXiv:1409.1556, 2014.
20. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, IEEE, 2009.
21. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Van-
houcke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of
the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
22. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017.
23. A. Tzotsos and D. Argialas, “Support vector machine classification for object-based
image analysis,” in Object-Based Image Analysis, pp. 663–677, Springer, 2008.