The document presents a novel technique for localizing caption text in video frames based on detecting inconsistencies in noise levels. Text is artificially added during video editing, which can introduce a different noise level than the original video. The technique estimates noise variance across blocks of the wavelet-transformed image to detect regions with different noise levels, indicating overlaid text. Edge detection is also used to filter out non-text regions. Experimental results show improved localization of overlaid text compared to existing methods.
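The block-wise noise test sketched in this summary can be made concrete. The snippet below is a minimal illustration, not the paper's implementation: it assumes a one-level Haar transform per block and the common robust median estimator sigma ≈ median(|HH|)/0.6745; the names `haar_hh` and `block_noise_map` and the 32-pixel block size are hypothetical choices.

```python
import numpy as np

def haar_hh(block):
    """One-level orthonormal 2-D Haar transform; return the diagonal-detail
    (HH) subband, which for natural images is dominated by noise."""
    a = block.astype(float)
    hi_cols = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)   # column-pair details
    return (hi_cols[0::2, :] - hi_cols[1::2, :]) / np.sqrt(2)

def block_noise_map(gray, bs=32):
    """Robust per-block noise estimate: sigma ~ median(|HH|) / 0.6745.
    Blocks whose sigma deviates from the frame-wide level are candidate
    overlay (caption) regions."""
    h, w = gray.shape
    sigmas = np.zeros((h // bs, w // bs))
    for i in range(h // bs):
        for j in range(w // bs):
            hh = haar_hh(gray[i*bs:(i+1)*bs, j*bs:(j+1)*bs])
            sigmas[i, j] = np.median(np.abs(hh)) / 0.6745
    return sigmas
```

A frame whose right half carries stronger noise (as an overlay region might) yields a clearly higher per-block sigma there, which is the inconsistency the technique looks for.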
A Survey On Thresholding Operators of Text Extraction In Videos (CSCJournals)
Video indexing is an important problem that has attracted the visual-information and image-processing communities. The detection and extraction of scene and caption text from unconstrained, general-purpose video is an important research problem in the context of content-based retrieval and summarization. This paper presents a technique for detecting text in video frames. Finding textual content in images is a challenging and promising research area in information technology; consequently, text detection and recognition in multimedia has become one of the most important fields in computer vision, owing to its value in a variety of recent technical applications. The work in this paper uses morphological operations to extract text appearing in video frames, together with preprocessing to separate text from background where the two are highly similar. Experimental results show that the resulting image contains only text. The evaluation criteria are applied to the result image and to images obtained with different operators.
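The morphological pipeline described in this abstract can be sketched with plain NumPy. Everything here is an assumed reconstruction rather than the authors' method: the gradient threshold, the 3x3 structuring element, and the closing (dilation then erosion) are illustrative.

```python
import numpy as np

def dilate(b, iters=1):
    """Binary dilation with a 3x3 square structuring element."""
    out = b.astype(bool)
    for _ in range(iters):
        p = np.pad(out, 1)
        grown = np.zeros_like(out)
        for di in range(3):
            for dj in range(3):
                grown |= p[di:di+out.shape[0], dj:dj+out.shape[1]]
        out = grown
    return out

def erode(b, iters=1):
    """Binary erosion via dilation of the complement (duality)."""
    return ~dilate(~b.astype(bool), iters)

def text_mask(gray, thresh=40):
    """Crude caption mask: strong horizontal intensity transitions, then a
    morphological closing to merge character strokes into word blobs."""
    grad = np.abs(np.diff(gray.astype(float), axis=1))
    edges = np.pad(grad, ((0, 0), (0, 1))) > thresh
    return erode(dilate(edges, 2), 2)   # closing
```

On a frame containing a high-contrast striped (text-like) band, the closing fills the gaps between strokes so the band survives as one blob while the flat background stays empty.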
EMPIRICAL STUDY OF ALGORITHMS AND TECHNIQUES IN VIDEO STEGANOGRAPHY (Journal For Research)
Steganography is the art and science of hiding important information under graphics, text, a cover file, etc. These techniques can be applied without fear of image destruction because the hidden data is integrated into the image itself. The information can take the form of text, audio, or video. The purpose of steganography is covert communication: hiding the very existence of a message from a third party or intruder. Steganography is often confused with cryptography because both are used to protect confidential information. Among the many types of steganography, video steganography is particularly attractive due to the high capacity of the carrier, greater data embedment, perceptual redundancy, etc. This paper surveys video steganography techniques and algorithms including spatial-domain methods, pseudorandom permutations, TPVD (tri-way pixel value differencing), the motion vector technique, and video compression. Video compression, which uses modern coding techniques to reduce redundancy in video data, is also studied and analyzed. Video compression operates on square groups of neighboring pixels, often called macroblocks; these blocks are compared from one frame to the next, and the codec transmits only the differences within them. The motion field is generally assumed to be translational, with horizontal and vertical components denoted in vector form over the spatial variables of the underlying image, and is estimated with methods such as the three-step search. The study also discusses the evolution of video steganography techniques and algorithms over the years, together with their applications, merits, and demerits. Further, an advanced video steganography algorithm (the bit exchange method), based on bit shifting and an XOR operation applied to the secret message file, is studied and implemented. The encrypted secret message is embedded in alternate bytes of the cover file, with the bits substituted into the LSB and LSB+3 positions. Finally, the approach is simulated and evaluated using MATLAB tools.
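The LSB & LSB+3 embedding in alternate bytes lends itself to a short sketch. The XOR whitening below is a stand-in for the paper's bit-shift/XOR encryption step, and `embed`/`extract` are hypothetical names; only the bit positions (0 and 3) and the alternate-byte layout follow the abstract.

```python
def embed(cover, msg_bits, key_bits):
    """Hide a bit list in alternate cover bytes: even-indexed payload bits go
    into bit 0 (LSB), odd-indexed bits into bit 3 (LSB+3) of each carrier."""
    stego = bytearray(cover)
    # XOR-whiten the payload with a repeating key (stand-in for encryption).
    enc = [b ^ key_bits[i % len(key_bits)] for i, b in enumerate(msg_bits)]
    for k in range(0, len(enc), 2):
        byte = stego[k]                 # alternate carriers: bytes 0, 2, 4, ...
        byte = (byte & ~1) | enc[k]                        # LSB
        if k + 1 < len(enc):
            byte = (byte & ~(1 << 3)) | (enc[k + 1] << 3)  # LSB+3
        stego[k] = byte
    return bytes(stego)

def extract(stego, nbits, key_bits):
    """Read the payload bits back out and undo the XOR whitening."""
    enc = []
    for k in range(0, nbits, 2):
        enc.append(stego[k] & 1)
        if k + 1 < nbits:
            enc.append((stego[k] >> 3) & 1)
    return [b ^ key_bits[i % len(key_bits)] for i, b in enumerate(enc)]
```

Because only bits 0 and 3 of every other byte change, each carrier byte differs from the cover by at most 9 in value, keeping the distortion perceptually small.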
Inverted File Based Search Technique for Video Copy Retrieval (ijcsa)
A video copy detection system is a content-based search engine built on spatio-temporal features. It aims to determine whether a query video segment is a copy of a video in the database, based on the video's signature. Distinguishing a copied video from a merely similar one is hard, since the content features vary little from one video to the next. The main goal is to detect the query video in the database robustly, regardless of content, and with a fast fingerprint search. The Fingerprint Extraction Algorithm and Fast Search Algorithm are adopted to achieve robust, fast, efficient, and accurate video copy detection. As a first step, the fingerprint extraction algorithm derives a fingerprint from features of the video's image content, with the images represented as Temporally Informative Representative Images (TIRIs). The next step finds whether a copy of the query video is present in the database by searching for a close match of its fingerprint in the corresponding fingerprint database using an inverted-file-based method.
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK (ijscai)
With the rapid growth of multimedia documents and mounting demand for information indexing and retrieval, much effort has gone into extracting text from images and videos. The primary aim of the proposed system is to detect and extract scene text from video. Extracting scene text is demanding due to complex backgrounds, varying font sizes and styles, low resolution, blurring, position, viewing angle, and so on. This paper puts forward a hybrid method that combines the two most popular text extraction approaches: the region-based method and the connected component (CC) based method. Initially the video is split into frames and key frames are obtained. A text region indicator (TRI) is developed to compute text confidence and candidate regions via binarization. An artificial neural network (ANN) serves as the classifier, and optical character recognition (OCR) is used for character verification. Text is grouped by constructing a minimum spanning tree over bounding-box distances.
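The final grouping step, a minimum spanning tree over bounding-box distances, can be illustrated directly. Prim's algorithm and the edge-cut threshold below are a generic reconstruction; the paper's exact distance measure and threshold are not given in the abstract.

```python
def box_gap(a, b):
    """Gap between two axis-aligned boxes (x1, y1, x2, y2); 0 if they touch."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx ** 2 + dy ** 2) ** 0.5

def group_boxes(boxes, cut=15.0):
    """Build an MST over candidate character boxes (Prim's algorithm), then
    cut edges longer than `cut` to split the tree into word/line groups."""
    n = len(boxes)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        u, v = min(((u, v) for u in in_tree
                    for v in range(n) if v not in in_tree),
                   key=lambda e: box_gap(boxes[e[0]], boxes[e[1]]))
        edges.append((u, v))
        in_tree.add(v)
    # Union-find over the short edges only.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        if box_gap(boxes[u], boxes[v]) <= cut:
            parent[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Characters within a word sit a few pixels apart, so their MST edges survive the cut, while the long edge bridging two words is removed and the words fall into separate groups.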
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET... (IJCSEIT Journal)
A video fingerprint is an identifier derived from a piece of video content. Video fingerprinting methods obtain unique features of a video that differentiate one clip from another. The aim is to identify whether a query video segment is a copy of a video in the database, based on the video's signature. It is difficult to tell a copied video from a merely similar one, since the content features differ little from one video to the next. The main focus of this paper is to detect the query video in the database robustly, regardless of content, and with a fast fingerprint search. The Fingerprint Extraction Algorithm and Fast Search Algorithms are adopted to achieve robust, fast, efficient, and accurate video copy detection. As a first step, the fingerprint extraction algorithm derives a fingerprint from features of the video's image content, with the images represented as Temporally Informative Representative Images (TIRIs). The second step finds whether a copy of the query video is present in the database by searching for a close match of its fingerprint in the corresponding fingerprint database using an inverted-file-based method. The proposed system is tested against attacks such as noise, brightness, contrast, rotation, and frame drop; on average it shows a high true positive rate of 98% and a low false positive rate of 1.3% across the different attacks.
Audio Steganography Coding Using the Discrete Wavelet Transform (CSCJournals)
The performance of an audio steganography compression system using the discrete wavelet transform (DWT) is investigated. Audio steganography coding is the technology of transforming stego-speech into an efficiently encoded version that can be decoded at the receiver to produce a close representation of the initial (uncompressed) signal. Experimental results demonstrate the efficiency of the compression technique: the compressed stego-speech is perceptually intelligible and indistinguishable from the corresponding initial signal, while the initial stego-speech can be recovered with only slight degradation in quality.
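A toy version of DWT-based compression keeps only the largest-magnitude coefficients. The snippet assumes an orthonormal Haar filter and a fixed keep fraction; real speech coders quantize and entropy-code the coefficients instead of simply zeroing them.

```python
import numpy as np

def haar_fwd(x):
    """One level of the orthonormal 1-D Haar transform."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

def haar_inv(lo, hi):
    """Exact inverse of haar_fwd."""
    x = np.empty(lo.size * 2)
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

def compress(signal, keep=0.25, levels=3):
    """Zero all but the largest-magnitude `keep` fraction of DWT
    coefficients, then reconstruct. Returns the lossy approximation."""
    coeffs, lo = [], signal.astype(float)
    for _ in range(levels):
        lo, hi = haar_fwd(lo)
        coeffs.append(hi)
    flat = np.concatenate([lo] + coeffs)
    thresh = np.quantile(np.abs(flat), 1 - keep)
    lo = np.where(np.abs(lo) >= thresh, lo, 0)
    coeffs = [np.where(np.abs(h) >= thresh, h, 0) for h in coeffs]
    for hi in reversed(coeffs):
        lo = haar_inv(lo, hi)
    return lo
```

Because speech energy concentrates in the low-pass band, keeping a quarter of the coefficients reconstructs a smooth signal with only slight degradation, which is the behaviour the abstract reports.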
Comparative Study of Spatial Domain Image Steganography Techniques (Eswar Publications)
Steganography is an important area of research in information security. It is the technique of concealing information, such as text, video, or images, within a cover image without causing statistically significant modification to it. Secure communication of data over the internet has become a major issue due to various passive and active attacks. The purpose of steganography is to hide the very existence of the message, so that it becomes difficult for an attacker to detect it. Different steganography techniques have been implemented to hide information effectively, and researchers have contributed various algorithms to each technique to improve its efficiency. This paper gives a brief analysis and comparison of different spatial-domain image steganography techniques. Modern secure image steganography presents the challenging task of transferring the embedded information to the destination without being detected.
Texture features based text extraction from images using DWT and K-means clus... (Divya Gera)
Text extraction from different kinds of images: document, caption, and scene text images. The discrete wavelet transform was used to extract horizontal, vertical, and diagonal features, and k-means clustering was used to separate the features into text and background clusters. For simple images k = 2 sufficed (text and background clusters), while for complex images k = 3 was used (text, complex background, and simple background).
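The DWT-plus-k-means recipe for the simple k = 2 case can be sketched as below; the one-level Haar subbands, 8-pixel blocks, and deterministic farthest-point initialisation are illustrative choices, not the cited work's parameters.

```python
import numpy as np

def haar_details(gray):
    """One-level Haar decomposition: return the three detail subbands."""
    a = gray.astype(float)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    lh = (lo[0::2] - lo[1::2]) / 2     # horizontal-edge details
    hl = (hi[0::2] + hi[1::2]) / 2     # vertical-edge details
    hh = (hi[0::2] - hi[1::2]) / 2     # diagonal details
    return lh, hl, hh

def kmeans(X, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def text_clusters(gray, bs=8, k=2):
    """Cluster image blocks on mean (|LH|, |HL|, |HH|) energies; text blocks
    end up in the high-energy cluster."""
    lh, hl, hh = haar_details(gray)
    H, W = lh.shape
    b = bs // 2   # subbands are half resolution
    feats = []
    for i in range(H // b):
        for j in range(W // b):
            sl = (slice(i*b, (i+1)*b), slice(j*b, (j+1)*b))
            feats.append([np.abs(lh[sl]).mean(),
                          np.abs(hl[sl]).mean(),
                          np.abs(hh[sl]).mean()])
    return kmeans(np.array(feats), k), (H // b, W // b)
```

Text-like (highly textured) blocks have large detail energies in all three subbands, so k = 2 cleanly separates them from a flat background.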
DATA HIDING BY IMAGE STEGANOGRAPHY APPLYING DNA SEQUENCE ARITHMETIC & LSB INSE... (Journal For Research)
With image steganography we can hide secret data in a covert manner, so that the presence of the secret information cannot be noticed by malicious users. In this approach the steganography procedure is divided into two steps. In the first step, a DNA sequence (a combination of the four nucleotides A, C, G, and T) is used to convert the secret information into a key matrix by generating a key. In the second step, the values of the key matrix are embedded by least significant bit (LSB) insertion. The advantage of this procedure is that the secret information is protected both by the secret key of the DNA sequence and by the steganography step itself.
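The nucleotide coding step can be sketched with the common two-bit convention A=00, C=01, G=10, T=11. This is an assumption for illustration; the paper's key-matrix construction is more involved, and the subsequent LSB insertion follows the usual bit-replacement scheme.

```python
# Two bits per base, a widely used (but here assumed) convention.
NUC = {'00': 'A', '01': 'C', '10': 'G', '11': 'T'}
INV = {v: k for k, v in NUC.items()}

def to_dna(data):
    """Encode bytes as a nucleotide string, two bits per base."""
    bits = "".join(f"{b:08b}" for b in data)
    return "".join(NUC[bits[i:i+2]] for i in range(0, len(bits), 2))

def from_dna(seq):
    """Decode a nucleotide string back into bytes."""
    bits = "".join(INV[c] for c in seq)
    return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))
```

Each byte becomes exactly four bases (0x1b = 00 01 10 11 encodes as "ACGT"), so the mapping is reversible and the decoded payload matches the original.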
Detecting Original Image Using Histogram, DFT and SVM (IDES Editor)
Information hiding for covert communication is rapidly gaining momentum. With increasingly sophisticated steganography techniques being developed, steganalysis needs to be universal. In this paper we propose universal steganalysis using the histogram, the discrete Fourier transform, and an SVM (SHDFT). A stego image has irregular statistical characteristics compared to its cover image. Using the histogram and the DFT, statistical features are generated to train a one-class SVM to discriminate cover from stego images. The SHDFT algorithm is found to be efficient and fast, since it uses fewer statistical features than the existing algorithm.
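A histogram-plus-DFT feature vector of the kind described can be sketched in a few lines; the 16-bin histogram and the high-frequency energy fraction are illustrative features of the same flavour, not the exact SHDFT set.

```python
import numpy as np

def shdft_features(img, nbins=16):
    """Feature vector: a coarse intensity histogram plus low-order statistics
    of the DFT magnitude spectrum (mean, std, and the fraction of spectral
    mass outside the central low-frequency window)."""
    hist, _ = np.histogram(img, bins=nbins, range=(0, 256), density=True)
    spec = np.fft.fftshift(np.abs(np.fft.fft2(img.astype(float))))
    h, w = spec.shape
    cy, cx = h // 2, w // 2
    low = spec[cy - h // 4:cy + h // 4, cx - w // 4:cx + w // 4]
    high_frac = 1 - low.sum() / spec.sum()
    return np.concatenate([hist, [spec.mean(), spec.std(), high_frac]])
```

LSB-style embedding behaves like added high-frequency noise, so the high-frequency fraction rises for a stego image relative to its smooth cover, which is exactly the statistical irregularity the one-class SVM is trained to detect.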
Video Compression Algorithm Based on Frame Difference Approaches (ijsc)
The huge use of digital multimedia over communication channels, wireless networks, the Internet, intranets, and cellular mobile systems leads to relentless growth of data flowing through these media. Researchers have therefore developed efficient techniques in fields such as data, image, and video compression. Recently, video compression techniques and their applications in many areas (educational, agricultural, medical, and so on) have made this one of the most active fields. The wavelet transform is an efficient tool on which a compression technique can be built. This work develops an efficient video compression approach based on frame differences, centred on the calculation of the near distance (difference) between frames. The selection of the meaningful frame depends on several factors, such as compression performance, frame detail, frame size, and the near distance between frames. Three different approaches are applied to remove the frames with the lowest difference. Many videos were tested to confirm the efficiency of the technique, and good performance results were obtained.
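The near-distance idea reduces to a per-pair frame difference; the sketch below drops the fixed fraction of frames whose distance to their predecessor is lowest. The function names and the default fraction are assumptions, and the mean absolute difference stands in for whichever distance the paper uses.

```python
import numpy as np

def near_distance(f1, f2):
    """Mean absolute pixel difference between consecutive frames."""
    return np.abs(f1.astype(float) - f2.astype(float)).mean()

def drop_static_frames(frames, fraction=0.25):
    """Remove the `fraction` of frames with the lowest distance to their
    predecessor (they carry the least new information); frame 0 is kept."""
    d = [near_distance(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    n_drop = int(len(d) * fraction)
    drop = set(np.argsort(d)[:n_drop] + 1)   # +1: d[i] describes frame i+1
    return [f for i, f in enumerate(frames) if i not in drop]
```

Exact-repeat frames have distance zero, so they are the first to go, while every visually distinct frame survives.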
Audio Signal Identification and Search Approach for Minimizing the Search Tim... (aciijournal)
Audio or music fingerprints can be used to implement an economical music identification system over a million-song library, but such a system needs a great deal of memory to hold the fingerprints and indexes. For a large-scale audio library, memory therefore limits the speed of music identification. We propose an efficient music identification system that uses space-saving audio fingerprints: the original fingerprint representations are subsampled and only one quarter of the original data is retained. With this approach the memory demand is greatly reduced and the search speed increases substantially, while robustness and reliability are well preserved. Mapping audio information to the time and frequency domains for classification, retrieval, or identification presents four principal challenges: the dimensionality of the input should be considerably reduced; the resulting features should be robust to possible distortions of the input; the features should be informative for the task at hand; and the scheme should remain simple. We propose a distortion-free system that fulfils all four of these requirements. An extensive study comparing our system with existing ones shows that it requires less memory, returns results faster, and achieves comparable accuracy on a large-scale database.
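The quarter-rate subsampling can be sketched directly: keep every fourth fingerprint element and match by Hamming distance. This illustrates the space/robustness trade-off the abstract claims, not the paper's exact scheme; the names and the factor of 4 are assumptions.

```python
import numpy as np

def subsample_fp(fp, factor=4):
    """Keep every `factor`-th fingerprint element (one quarter by default)."""
    return fp[::factor]

def best_match(query, library, factor=4):
    """Match a subsampled query against subsampled library fingerprints by
    Hamming distance; memory per fingerprint shrinks by `factor`."""
    q = subsample_fp(query, factor)
    dists = {name: np.count_nonzero(q != subsample_fp(fp, factor))
             for name, fp in library.items()}
    return min(dists, key=dists.get)
```

Even after discarding three quarters of the bits, a lightly corrupted query still sits far closer (in Hamming distance) to its source than to any unrelated fingerprint, so identification accuracy is largely preserved.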
ERCA: Energy Efficient Routing and Re-clustering (aciijournal)
The pervasive application of wireless sensor networks (WSNs) is challenged by the scarce energy of sensor nodes. En-route filtering schemes, especially commutative cipher based en-route filtering (CCEF), can save energy with good filtering capacity. However, this approach suffers from fixed paths and an inefficient underlying routing designed for ad-hoc networks. Moreover, as the number of remaining sensor nodes decreases, the probability of network partition increases. In this paper, we propose an energy-efficient routing and re-clustering algorithm (ERCA) to address these limitations. In the proposed scheme, once the number of sensor nodes falls below a certain threshold, the cluster size and transmission range are adjusted dynamically to maintain cluster node density. Performance results show that our approach demonstrates strong filtering power, better energy efficiency, and an average gain of over 285% in network lifetime.
SURVEY SURFER: A WEB BASED DATA GATHERING AND ANALYSIS APPLICATION (aciijournal)
The most important needs for a web-based survey technology are speedy performance and accurate results. Although a survey can be taken and assessed manually in a variety of ways, here we introduce a survey-site concept in which the assessment of the responses is produced automatically by applying statistics. In this paper we propose an automated approach to survey analysis in which the results are presented graphically to support sound decisions. In addition to the graphical view, we aim to provide a platform that presents complex, tedious data in a simple, understandable, interactive, and visual form.
Learning strategy with groups on page based students' profiles (aciijournal)
Most students want to know their knowledge level so they can prepare better for their exams. In this learning environment, the fields of study are presented on a page that supports collaboration and cooperation: students can do their exercises either individually or together with their peers. The system provides guidance for students' learning in a field of interest, Java in this case. In particular, the system feeds back information about each exam so that students can learn their grades without involving teachers, and participants who took the exam can discuss it with each other through shared e-mail and a participant list.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve an automatic clustering problem by concurrently optimizing multiple objectives: automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent optimization algorithm Jaya as its underlying optimization stratagem. This evolutionary technique always aims to attain the global best solution rather than a local best solution in larger datasets. The exploration and exploitation imposed on the proposed work detect the number of clusters automatically, find an appropriate partitioning of the data sets, and yield near-optimal values of the CVIs. Twelve datasets of different intricacy are used to endorse the performance of the proposed algorithm. The experiments show that the conjectural advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance paybacks.
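The Jaya update rule that this abstract builds on moves every candidate toward the current best solution and away from the worst one. A minimal single-objective sketch (the population size, iteration count, and the sphere objective are illustrative choices, not taken from the paper):

```python
import numpy as np

def jaya_minimize(obj, dim=5, pop=20, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal Jaya optimizer: x' = x + r1*(best - |x|) - r2*(worst - |x|)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(pop, dim))
    for _ in range(iters):
        f = np.array([obj(x) for x in X])
        best, worst = X[np.argmin(f)], X[np.argmax(f)]
        r1 = rng.random((pop, dim))
        r2 = rng.random((pop, dim))
        cand = np.clip(X + r1 * (best - np.abs(X))
                         - r2 * (worst - np.abs(X)), lo, hi)
        # Greedy selection: keep a move only if it improves the objective.
        improved = np.array([obj(c) < fi for c, fi in zip(cand, f)])
        X[improved] = cand[improved]
    f = np.array([obj(x) for x in X])
    return X[np.argmin(f)], float(f.min())

# Sphere function: global minimum 0 at the origin.
x_best, f_best = jaya_minimize(lambda x: float(np.sum(x * x)))
```

Note that, unlike genetic algorithms or PSO, Jaya has no algorithm-specific tuning parameters beyond population size and iteration count, which is part of its appeal as an "underlying optimization stratagem" for clustering.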
TRUST METRICS IN RECOMMENDER SYSTEMS: A SURVEY (aciijournal)
Information overload is a growing challenge for e-commerce sites: information grows so fast that following the flow of it becomes impossible. Recommender systems, the most successful application of information filtering, help users find items of interest in huge datasets. Collaborative filtering, the most successful recommendation technique, uses the social behaviour of users to detect their interests. Traditional challenges of collaborative filtering, such as cold start, sparsity, accuracy and malicious attacks, have driven researchers to use new metadata to improve the accuracy of recommenders and solve these problems. Trust-based recommender systems focus on trust values in the relations among users to make more reliable and accurate recommendations. In this paper we focus on the trust-based approach and discuss the process of making recommendations in these methods. Furthermore, we review different proposed trust metrics, the most important step in this process.
Web spam classification using supervised artificial neural network algorithms (aciijournal)
Due to the rapid growth in the technology employed by spammers, there is a need for classifiers that are more efficient, generic and highly adaptive. Neural-network-based technologies have a high capacity for adaptation as well as generalization. To our knowledge, very little work has been done in this field using neural networks, and we present this paper to fill that gap. This paper evaluates the performance of three supervised learning algorithms for artificial neural networks by building classifiers for the complex problem of current web spam pattern classification: the conjugate gradient algorithm, resilient backpropagation, and the Levenberg-Marquardt algorithm.
An Intelligent Method for Accelerating the Convergence of Different Versions ... (aciijournal)
Solving the linear system of equations Ax = b appears in a wide range of applications. One of the best methods for solving large sparse asymmetric linear systems is the simplified generalized minimal residual method, SGMRES(m). Improved versions of SGMRES(m) also exist: SGMRES-E(m, k) and SGMRES-DR(m, k). In this paper, an intelligent heuristic method for accelerating the convergence of the three methods SGMRES(m), SGMRES-E(m, k), and SGMRES-DR(m, k) is proposed. Numerical results obtained from implementing the proposed approach on several University of Florida standard matrices confirm the efficiency of the proposed method.
Text mining is the process of extracting interesting and non-trivial knowledge or information from unstructured text data. It is a multidisciplinary field that draws on data mining, machine learning, information retrieval, computational linguistics and statistics. Important text mining processes are information extraction, information retrieval, natural language processing, text classification, content analysis and text clustering. All these processes require a preprocessing step before performing their intended task. Preprocessing significantly reduces the size of the input text documents, and the actions involved in this step are sentence boundary determination, language-specific stop-word elimination, tokenization and stemming. Among these, the most essential action is tokenization, which divides the textual information into individual words. Many open source tools are available for tokenization. The main objective of this work is to analyze the performance of seven such tools: Nlpdotnet Tokenizer, Mila Tokenizer, NLTK Word Tokenize, TextBlob Word Tokenize, MBSP Word Tokenize, Pattern Word Tokenize and Word Tokenization with Python NLTK. Based on the results, we observed that the Nlpdotnet Tokenizer performs better than the other tools.
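As a point of comparison with the tools surveyed, the core of word tokenization can be approximated in a few lines with a regular expression. This is a rough sketch of the general idea, not the behaviour of any specific tool named above:

```python
import re

def tokenize(text):
    """Split text into word, number, and punctuation tokens.
    Keeps contractions like "don't" together; a simplification of what
    full word tokenizers such as NLTK's do."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]", text)

print(tokenize("Don't split 3.14, please!"))
# → ["Don't", 'split', '3.14', ',', 'please', '!']
```

Real tokenizers differ precisely in the edge cases this regex glosses over (abbreviations, hyphenated words, URLs), which is what motivates the comparative analysis in the paper.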
Recommender systems have grown into a critical research subject since the emergence of the first paper on collaborative filtering in the 1990s. Although academic research on recommender systems has expanded considerably over the last ten years, the literature lacks a complete review and classification of that research. For this reason, we reviewed articles on recommender systems and classified them with respect to sentiment analysis. The articles are categorized into three recommender-system techniques: collaborative filtering (CF), content based and context based. We sought out research papers related to sentiment-analysis-based recommender systems and, to classify the work done in this field, we present the different sentiment-analysis-based approaches in tables. Our study gives statistics on trends in recommender systems research and provides practitioners and researchers with insight and future directions for recommender systems using sentiment analysis. We hope this paper helps anyone interested in recommender systems research.
Blood groups matcher framework based on three level rules in Myanmar (aciijournal)
Blood donation is a matter of global concern, saving lives when people are in trouble because of natural disasters. The system decides whether a donation can be accepted according to rules for blood donation, without requiring a visit to a physician. There are three main parts to accepting blood from donors who want to donate, based on features such as personal health. The application lets blood donors and patients who urgently need blood negotiate on a page, without going to blood banks and waiting in a queue there.
HYBRID DATA CLUSTERING APPROACH USING K-MEANS AND FLOWER POLLINATION ALGORITHM (aciijournal)
Data clustering is a technique for grouping a set of objects into a known number of groups. Several approaches are widely applied so that objects within a cluster are similar and objects in different clusters are far from each other. K-Means is one of the most familiar center-based clustering algorithms, since it is easy to implement and converges quickly. However, K-Means is sensitive to initialization and hence can get trapped in local optima. The Flower Pollination Algorithm (FPA) is a global optimization technique that avoids trapping in local optima. In this paper, a novel hybrid data clustering approach using the Flower Pollination Algorithm and K-Means (FPAKM) is proposed. The proposed algorithm is compared with K-Means and FPA on eight datasets. The experimental results show that FPAKM is better than both FPA and K-Means.
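The K-Means baseline referred to above is compact enough to sketch, and its sensitivity to the initial centers (the weakness the FPA hybrid is meant to offset) is visible in the random initialization step. A minimal NumPy version, with illustrative parameter choices:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0, init=None):
    """Plain Lloyd's algorithm. With random initialization (the default)
    the result depends on the seed; passing `init` fixes the starting
    centers explicitly."""
    rng = np.random.default_rng(seed)
    centers = (X[rng.choice(len(X), size=k, replace=False)]
               if init is None else init.astype(float).copy())
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

A hybrid like FPAKM essentially replaces the random `init` with centers evolved by a global optimizer, so that Lloyd's refinement starts from a basin near the global optimum.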
Prediction of lung cancer is a most challenging problem due to the structure of cancer cells, most of which overlap each other. Image processing techniques are widely used for predicting lung cancer, as well as for early detection and treatment to prevent it. To predict lung cancer, various features are extracted from the images; pattern recognition based approaches are therefore useful. Here, a comprehensive review of lung cancer prediction by previous researchers using image processing techniques is presented, together with a summary of that work.
An Exploration based on Multifarious Video Copy Detection Strategies (idescitation)
We live in an era in which tonnes of videos are uploaded every day. Video copy detection has become the need of the hour, as most of these are user-generated Internet videos on popular sites such as YouTube. It acts as a means to restrain piracy and to prove whether contents are legitimate. The usual procedure adopted by video copy detection techniques is to discover whether a query video is copied from a database of videos or not. This paper presents different video copy detection techniques that have been adopted to ensure robust and secure videos, along with some applications of video fingerprinting.
The text data present in images and video contains useful information for automatic annotation, indexing, and structuring of images. However, variations in text style, font, size, orientation and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction an extremely difficult and challenging task. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to design algorithms for each phase of extracting text from a video using Java libraries and classes. First, we split the input video into a stream of images using the Java Media Framework (JMF), the input being either real-time video or a video from the database. We then apply preprocessing algorithms to convert each image to grayscale and to remove disturbances such as lines superimposed over the text, discontinuities, and dots. We continue with algorithms for localization, segmentation and recognition, for which we use a neural network pattern matching technique. The performance of our approach is demonstrated by experimental results for a set of static images.
Content Based Video Retrieval Using Integrated Feature Extraction and Persona... (IJERD Editor)
Traditional video retrieval methods fail to meet the technical challenges posed by the large and rapid growth of multimedia data, which demands effective retrieval systems. In the last decade, Content Based Video Retrieval (CBVR) has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly, so a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an implementation of automated video indexing and search over a large video database. First, we apply automatic video segmentation and key-frame detection to extract frames from the video. Next, we extract textual keywords by applying Optical Character Recognition (OCR) to the key-frames and Automatic Speech Recognition (ASR) to the audio tracks of the video. We also extract colour, texture and edge features using different methods. Finally, we integrate all the keywords and features extracted by the above techniques for searching, and a similarity measure is applied to retrieve the best-matching videos from the database as output. Additionally, we provide re-ranking of the results according to the user's interest.
Steganography is the technique of hiding secret information such as passwords, data and images behind a cover file. This paper proposes an audio-video crypto steganography system that combines audio steganography and video steganography, using an advanced chaotic algorithm as the secure encryption method. The aim is to hide secret information behind the image and audio content of a video file. Since a video is composed of many audio samples and video frames, we can select a particular frame for hiding an image and an audio segment for hiding secret data. 4LSB substitution can be used for image steganography, and an LSB substitution algorithm with location selection for audio steganography. The advanced chaotic algorithm can be used for encryption and decryption of the data and images. Security and authentication parameters such as PSNR values and histograms are obtained at both the transmitter and receiver sides and are found to be identical at both ends. Reversible data hiding methods for both video and audio are also mentioned. The security of the data and image is thereby enhanced. This method can be used in fields such as medicine and defence which require real-time processing.
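The 4LSB substitution mentioned above replaces the four least significant bits of each cover byte with four bits of the secret. A minimal sketch of embedding and extraction over a byte array (the frame-selection and chaotic-encryption stages of the proposed system are omitted; function names are illustrative):

```python
import numpy as np

def embed_4lsb(cover, secret):
    """Hide each secret byte in the low nibbles of two cover bytes."""
    flat = cover.flatten().copy()
    assert 2 * len(secret) <= len(flat), "cover too small for payload"
    for i, byte in enumerate(secret):
        flat[2*i]   = (flat[2*i]   & 0xF0) | (byte >> 4)    # high nibble
        flat[2*i+1] = (flat[2*i+1] & 0xF0) | (byte & 0x0F)  # low nibble
    return flat.reshape(cover.shape)

def extract_4lsb(stego, n):
    """Recover n secret bytes from the low nibbles of the stego array."""
    flat = stego.flatten()
    return bytes((int(flat[2*i] & 0x0F) << 4) | int(flat[2*i+1] & 0x0F)
                 for i in range(n))
```

Because only the low nibble of each cover byte changes, the per-pixel distortion is bounded by 15 intensity levels, which is what keeps the PSNR of the stego frame high.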
Advanced Computational Intelligence: An International Journal (ACII), Vol.2, No.2, April 2015
LOCALIZATION OF OVERLAID TEXT BASED ON NOISE INCONSISTENCIES
Too Kipyego Boaz and Prabhakar C. J.
Department of Computer Science, Kuvempu University, India
ABSTRACT
In this paper, we present a novel technique for localization of caption text in video frames based on noise inconsistencies. Caption text is artificially added to a video after it has been captured and as such does not form part of the original video content. Typically, the noise level is uniform across an entire captured video frame; artificially embedding or overlaying text therefore introduces a region with a different noise level. The detection of distinct noise levels in a video frame may thus signal the presence of overlaid text. We exploit this property by detecting regions with differing noise levels to localize overlaid text in video frames. Experimental results show a marked improvement in overlaid text localization, evaluated using Recall, Precision and F-measure.
KEYWORDS
Overlaid text detection, Text information extraction, Text localization.
1. INTRODUCTION
Given the fast and widespread penetration of multimedia data into all areas of life, cheap, high-performance digital media is now relied upon as the primary way to present news, sports and entertainment, capturing current events as they occur. Digital videos and images form a large part of archived multimedia files and are rich in text information. Text has a well-defined, unambiguous meaning that reflects the contents of the video and images. It is therefore imperative to build algorithms that ensure the reliability of multimedia data and enable efficient browsing, querying, retrieval and indexing of the desired contents in on-line digital libraries worldwide. Extraction of this text comes with a number of challenges, including low resolution, colour bleeding, low contrast, and varying font sizes and orientations [1]. Text Information Extraction (TIE) in videos is a very important research area because text carries high-level semantic information and therefore has many useful applications, including: 1) document analysis and retrieval, 2) vehicle licence plate detection and extraction, 3) keyword-based image search, 4) identification of parts in industrial automation, and 5) address block location.
Text residing in digital videos and images can be classified into two categories: 1) scene text and 2) overlaid text. Text that appears naturally in the captured video without human input is called scene text, whereas text that is artificially overlaid on an existing video or image is known as caption or graphics text. Scene text is hard to extract because it forms part of the video or image, is characterized by low contrast and complex textured backgrounds, and therefore requires advanced techniques to extract its texture features. Caption text is relatively easier to extract than scene text because it is overlaid during video editing, where the chief consideration is readability, typically achieved with high-contrast colours between the video background and the overlaid text.
In general, TIE from videos involves three major steps: 1) text localization, 2) text segmentation, and 3) text recognition. A text localization algorithm locates the text regions by drawing a rectangular bounding box around each of them. Text segmentation (or extraction) computes the foreground pixels from the localized text region, and text recognition converts the segmented text image into plain text.
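To make the localization step concrete, here is a toy block-scoring sketch in NumPy that exploits the edge-richness of text regions. The block size, the threshold, and the simple difference-based gradient are illustrative choices for this sketch, not the authors' algorithm:

```python
import numpy as np

def edge_density_map(gray, block=16, thresh=30.0):
    """Score each non-overlapping block by the fraction of pixels with a
    strong horizontal or vertical gradient; text regions score high."""
    g = gray.astype(float)
    gx = np.abs(np.diff(g, axis=1))   # horizontal gradients
    gy = np.abs(np.diff(g, axis=0))   # vertical gradients
    edges = np.zeros_like(g)
    edges[:, :-1] += gx
    edges[:-1, :] += gy
    strong = edges > thresh
    rows, cols = g.shape[0] // block, g.shape[1] // block
    density = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            tile = strong[i*block:(i+1)*block, j*block:(j+1)*block]
            density[i, j] = tile.mean()
    return density
```

Thresholding this map and drawing bounding boxes around high-density blocks is the essence of the localization step; segmentation and recognition would then operate inside those boxes.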
Given that caption text is not part of the originally captured video but is overlaid during editing, we assume that this procedure duplicates pixels within the edited region and introduces detectable changes and inconsistencies in video properties such as noise variance, chromatic aberration and other statistics [2], [3]. In multimedia forensics, these inconsistencies in image properties are analyzed to determine the authenticity of an image. According to [2], there have been two approaches to determining authenticity: a) source identification and b) forgery detection. Source identification deals with identifying the source digital device, whereas forgery detection focuses on discovering evidence of tampering by assessing the authenticity of the digital media. Forgery detection methods can be further classified into 1) active and 2) passive (blind) approaches. Digital watermarking, an active approach [4], has been proposed as a means for fragile content authentication, tampering detection, localization of changes, and recovery of original content. The blind approach, by contrast, is regarded as the newer direction, with interest in it growing rapidly over the last few years. Unlike active approaches, blind approaches need no explicit prior information about the image and work in the absence of any digital watermark or signature.
Mahdian et al. [3] proposed a blind approach to detect tampered regions using noise inconsistencies as the key cue. They assumed that an authentic image has a uniform noise level, while tampered regions, where two or more images from different sources are spliced together, introduce varying noise levels into the image. They also considered the case where parts of an image are copied and pasted into a different part of the same image to hide an object or region; this duplicates pixels within the pasted region, and when locally random noise was added to the image the authors observed a mismatch resulting in noise inconsistencies. Kobayashi et al. [5] proposed a method to detect superimposed content not present in the original video sequence; their method identifies noise inconsistencies between the original video and the superimposed regions to detect forgeries.
In this study we propose an approach that localizes overlaid text on the basis of noise characteristics. Pixels within non-overlapping windows of the edited regions are distinguished from the rest by computing their noise variance and determining its relationship with that of neighbouring windows. A merging step then groups neighbouring blocks with similar noise variance together. The detection of multiple homogeneous noise levels is evidence of the presence of overlaid text in the original digital media. An important feature of caption text is its linearity, as text always appears in linear form: text in a line is expected to have similar width, height and spacing between characters and words [6], and is rich in edge information. We therefore incorporate inexpensive existing methods such as edge detection to filter out regions that do not contain caption text.
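A minimal sketch of the block-wise noise estimation described above: the frame is passed through one level of a 2-D Haar wavelet transform, and the noise standard deviation of each non-overlapping block is estimated from the diagonal (HH) detail coefficients using Donoho's robust median estimator. The function name and block size are illustrative assumptions; the merging and edge-filtering stages of the proposed approach are omitted:

```python
import numpy as np

def block_noise_levels(gray, block=32):
    """Estimate a noise standard deviation per non-overlapping block.
    One level of a 2-D Haar transform yields the diagonal (HH) detail
    subband; within each block the noise sigma is estimated with
    Donoho's robust formula sigma = median(|HH|) / 0.6745."""
    g = gray.astype(float)
    a, b = g[0::2, 0::2], g[0::2, 1::2]
    c, d = g[1::2, 0::2], g[1::2, 1::2]
    hh = (a - b - c + d) / 2.0          # diagonal detail coefficients
    bs = block // 2                     # HH is half the image resolution
    rows, cols = hh.shape[0] // bs, hh.shape[1] // bs
    sigma = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            tile = hh[i*bs:(i+1)*bs, j*bs:(j+1)*bs]
            sigma[i, j] = np.median(np.abs(tile)) / 0.6745
    return sigma
```

Blocks whose estimates deviate markedly from their neighbours become candidates for the merging step; clusters of such deviating blocks are the suspected overlaid-text regions.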
The rest of this paper proceeds as follows: related work on caption text detection and localization is reviewed in Section 2, Section 3 gives a detailed description of the proposed approach, Section 4 provides the experimental results, and Section 5 concludes the paper.
2. RELATED WORK
Caption text contained in videos is a rich source of information, and caption-based content retrieval has become a popular focus for researchers working on video content retrieval. Localization of text in video frames has likewise been an active area within TIE, particularly for news media, as users want to retrieve the information published in digital archival systems. Browsing and retrieving the intended content is challenging because videos are stored in compressed form, which current tools cannot read and recognise efficiently. Prior to localization, text detection is the initial step, aimed at determining whether a video frame contains text at all, and has been considered in many recent studies [7], [8]. The authors employed a scene change algorithm to select target frames from the video sequence at a fixed time interval, and a colour histogram is used to segment the frames into colour planes in which text lines are detected and stored.
Once a text detection algorithm has confirmed the existence of text within a frame, text localization follows. It is the most important part of text information extraction and involves finding the position of each text region in the frame [9], [10], [11]; a good localization result is a set of tight bounding boxes around each text region [1].
Current video text localization approaches can be classified into two categories based on the
number of frames involved: 1) single frame and 2) multiple frames. Multi-frame integration
(MFI) utilizes the temporality of video frame sequences [12], [13], [14], [15], where frame
contents that appear in all the selected frames are considered text features, because the same text
existing in multiple video frames generally tends to occur at the same location for at least 2
seconds [13]. While these methods can register good results in determining text regions, their
accuracy drops due to high false-positive rates when the frames contain other characteristics that
resemble text features. The other category detects text regions in individual frames independently
[16]. A great deal of work has been done to detect and localize text in video frames, and these
methods can again be categorized into two classes: 1) texture-based and 2) region-based text
extraction [17]. This classification is independent of the first one, which was based on the number
of frames involved.
The texture-based methods [18], [19], [20] extract textural properties by scanning the video
frames at a number of scales and analyzing neighbouring pixels for classification based on text
properties, which include edge densities, gradient magnitudes and intensity variances. In general,
texture features are used to train classifiers that distinguish between pixels belonging to text and
those belonging to non-text. Techniques such as the wavelet transform, Gabor filters, the Fourier
transform and machine learning techniques belong to this category. Texture-based feature
extraction methods [18] and [19] presume that a video has discriminative textural properties
between the text and non-text objects, that is, the area presumed to form the image background,
which may contain scene text. However, texture-based methods face limitations such as
computational complexity, difficulty in integrating information from different scales, and
inability to sufficiently detect slanted text [6].
To separate caption text from scene text, Shivakumara et al. [21], [20] proposed a technique that
separates the two types of text based on the nature of the backgrounds they reside on. It is noted
that caption text is overlaid on a relatively less complex background, as image editors take
into consideration text readability, unlike scene text, which is always part of a complex scene.
In [20] they proposed another method that combines wavelet-Laplacian and colour features to
detect text. They perform wavelet decomposition separately on each of the three colour channels
of the RGB image, select a set of the first three high-frequency sub-bands of each channel, and
use the average of the results to enhance the text pixels.
The region-based methods [22], [23], [24] work through a bottom-up approach: they divide the
input video frame into smaller regions, and pixels within each region are processed and analyzed
for certain properties; neighbouring pixels exhibiting similar properties are then grouped together,
forming connected components. Unlike texture-based methods, this approach has some
advantages, such as the ability to detect text of various scales and multi-oriented text. Individual
regions containing various connected components merge back again to form large homogeneous
regions with favourable and desired properties. Methods that work with connected components,
colour and edge features comprise the region-based techniques.
In [22] the authors proposed an approach that exploits the approximately constant colour
exhibited by foreground pixels; pixels with certain shared properties are grouped together to form
regions. Liu et al. [23] proposed a three-step tracking method to locate caption text, perform
region binarization and detect changes in caption text. The approach uses the spatial edge
information within an individual frame, and the binarization technique is used to classify text and
non-text pixels. The result of this initial procedure is then used to detect caption changes across
successive frames.
A temporal-feature method is proposed in [25], which uses spatial and temporal locations, caption
segmentation and post-processing to identify stroke-direction changes and to segment caption
pixels based on the consistency and dominance of the caption colour distribution, later refined by
post-processing techniques. Akhtar et al. [26] proposed an edge-based segmentation method that
finds vertical gradients and their average gradient magnitude in a pixel neighbourhood for
detecting horizontally aligned artificial Urdu text in videos; the resulting binarized image is
smoothed with a horizontal run-length smoothing algorithm to merge text regions, and an edge
filter removes noisy non-text regions.
In [24] the authors proposed an approach for overlaid text localization using a Colour Filter Array
(CFA) to detect various regions, under the assumption that the text portion has distinct intensity
values with respect to the non-text region or background. This difference in intensity values was
captured in a gradient image map. While this method produces good results, its main drawbacks
are that 1) it requires explicit prior information and 2) it performs poorly on commercially
compressed JPEG images, which form a large part of online data.
The caption text localization methods reviewed in this section follow all the steps of TIE in order
to achieve their objectives. The ultimate goal of these methods is to give users tools capable of
robustly using a text-image browser for direct access to the temporal position of an interesting
text image and the correct text lines in a video document. Though these methods have achieved
high accuracy rates, they fall short of being reliable across all forms of video.
3. PROPOSED APPROACH
In our proposed method we have developed a robust technique that provides a reliable solution to
the localization of overlaid text on the basis of noise characteristics. The process of video editing
introduces a different noise level on top of the noise properties already present in the original
video. Hence our approach is framed as the problem of identifying regions with homogeneous noise
levels through the computation and comparison of the relationship between pixel values in the
image in order to detect various noise inconsistencies.
Differing noise levels do not by themselves prove the existence of overlaid text, so we have
incorporated an edge map to positively identify text regions. The diagrammatic flow of our
approach is shown in figure 1. It is discussed in detail through the following steps: 1) estimation
of noise variance, 2) block merging, 3) edge estimation and 4) morphological operation and
mapping.
Figure 1: Flow-graph of our proposed approach
3.1. Estimation of Noise Variance
Given a noisy video frame or image that is assumed to contain overlaid text, we can formulate the
image signal as shown in the equation below;
f_n(x, y) = f(x, y) + n(x, y),    (1)

where f(x, y) is the uncorrupted signal and n(x, y) is Gaussian white noise.
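As an illustration of this noise model, the following sketch corrupts a grayscale frame with zero-mean additive Gaussian white noise; the function name and sigma value are ours, for illustration only:

```python
import numpy as np

def add_gaussian_noise(frame, sigma, seed=0):
    """Model f_n(x, y) = f(x, y) + n(x, y): corrupt a grayscale frame
    with zero-mean additive Gaussian white noise of std-dev sigma."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=frame.shape)
    return frame.astype(np.float64) + noise
```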
We take the original image and perform a single-level pyramid decomposition using the wavelet
transform to obtain the sub-bands LL1, LH1, HL1 and HH1, shown in figure 2(b). Wavelets have
been widely used in noise estimation as they cut image data up into different frequency
components. The HH1 sub-band (figure 2(c)) contains the diagonal details of the image and
represents the image's highest resolution. For the sake of better visualization, HH1 is highlighted
as shown in figure 2(d). The wavelet coefficients in HH1 correspond not only to noise but can
also be affected by image structures. If there is a textured object in the scene, it is not possible to
obtain the noise component independently, because the spatial variation is mixed into the signal.
Therefore, this sub-band is further processed, under the assumption that wavelet coefficients with
absolute value smaller than a threshold, computed iteratively, belong to noise [27].
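A minimal sketch of this step is given below, using a single-level Haar transform as a simple stand-in for the db8 wavelet the paper uses; the iterative threshold update is a heuristic of ours, not the exact procedure of [27]:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar decomposition (a stand-in for db8)
    returning the LL1, LH1, HL1 and HH1 sub-bands."""
    a = img[0::2, 0::2].astype(np.float64)  # even rows, even cols
    b = img[0::2, 1::2].astype(np.float64)
    c = img[1::2, 0::2].astype(np.float64)
    d = img[1::2, 1::2].astype(np.float64)
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0  # diagonal detail sub-band
    return ll, lh, hl, hh

def refine_hh(hh, n_iter=5):
    """Keep only HH1 coefficients whose absolute value falls below an
    iteratively updated threshold, assuming such coefficients belong to
    noise; the 3x-mean update rule is a heuristic, not from [27]."""
    t = np.abs(hh).mean()
    for _ in range(n_iter):
        noise = hh[np.abs(hh) < t]
        if noise.size == 0:
            break
        t = 3.0 * np.abs(noise).mean()
    return np.where(np.abs(hh) < t, hh, 0.0)
```

For a constant image the HH1 band is exactly zero, which makes the decomposition easy to sanity-check.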
The next step is to apply a non-overlapping tiling window T_i of size N×N on the refined HH1
sub-band in order to obtain definite blocks that we assume to have comparable noise levels. To
determine N, we perform preliminary work to find the width and height of every connected
component through morphological operations, where we estimate the maximum and minimum
width and height of the various characters; the minimum of their respective averages is taken as
the value of N, as shown in equation (2) and equation (3) below:
Figure 2: (a) Original gray-scale image, (b) the wavelet decomposition sub-bands, (c) HH1 sub-band and
(d) HH1 sub-band (highlighted).
(h_(max,min), w_(max,min)) = (max & min)(h_i, w_i) over all characters,    (2)

N = min(avg(h_(max,min)), avg(w_(max,min))),    (3)

where i indexes the characters in the image.
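Equations (2) and (3) can be sketched as follows; the character heights and widths are assumed to come from the prior connected-component pass, and the function name is ours:

```python
def block_size_from_chars(heights, widths):
    """Equations (2)-(3): take the max and min character height and
    width, average each (max, min) pair, and use the smaller average
    as the tiling block size N."""
    h_avg = (max(heights) + min(heights)) / 2.0
    w_avg = (max(widths) + min(widths)) / 2.0
    return int(min(h_avg, w_avg))
```

For example, characters with heights {10, 14, 12} and widths {8, 12, 10} give averages 12 and 10, so N = 10.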
After obtaining definite blocks, we employ a gradient-based noise variance estimator on each
T_i, i = 1, 2, ..., (imagesize × imagesize) / (N × N), in order to obtain gradient amplitudes. The
gradient estimator is a widely used technique for estimating the standard deviation σ̂ of the noise
from high amplitudes [28]. The standard deviation can be robustly estimated using the following
median measurement:

σ̂ = median(|HH1|) / 0.6745.    (4)
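Equation (4), applied per tile, can be sketched as below; the function names are ours:

```python
import numpy as np

def estimate_sigma(hh_block):
    """Equation (4): robust noise std-dev estimate from HH1
    coefficients, sigma_hat = median(|HH1|) / 0.6745."""
    return np.median(np.abs(hh_block)) / 0.6745

def block_sigmas(hh, n):
    """Tile the refined HH1 sub-band into non-overlapping n x n blocks
    T_i and estimate sigma_hat for each block."""
    rows, cols = hh.shape[0] // n, hh.shape[1] // n
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = estimate_sigma(hh[i * n:(i + 1) * n,
                                          j * n:(j + 1) * n])
    return out
```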
3.2 Merging of Blocks
Once the noise standard deviation of each block is estimated, we have several blocks with various
noise variances (figure 3(a)), where every block is assigned a median standard deviation.
We use this new information as the homogeneity condition to segment the investigated image
into several homogeneous sub-regions [3]. The procedure applies a simple region-merging
technique, where neighbouring blocks that have similar or almost similar standard deviations are
merged together based on a selected similarity threshold value, computed as below.
Let Th be the set threshold value; then neighbouring blocks k and m are merged if
|σ̂_k − σ̂_m| < Th, where k = 1, 2, 3, ..., (imagesize × imagesize) / (N × N) and m = k + 1.
Neighbouring blocks are merged only if the absolute difference between their standard deviations
is within the threshold value; figure 3(b) shows the merged blocks, whereas figure 3(c) is a
labelled image assigned with unique labels.
Figure 3: Intermediate results (a) Image blocking with various noise variances, (b) merged blocks and (c)
Blocks assigned with labels.
The threshold value is chosen so as to group neighbouring blocks together: a stepwise set of
threshold values Th_i is defined, essentially by first analyzing the noise variances to determine
the number of classes to allocate. The whole procedure is integrated into a simple merging
algorithm capable of grouping the neighbouring blocks of an investigated image into partitions
with homogeneous noise levels. Labels are assigned based on the homogeneity of the noise
levels.
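The merging step can be sketched as a flood fill over 4-connected block neighbours; the threshold value and traversal order below are illustrative choices, not fixed by the paper:

```python
import numpy as np

def merge_blocks(sigmas, th):
    """Assign the same region label to neighbouring blocks whose
    estimated noise std-devs satisfy |sigma_k - sigma_m| < th,
    via flood fill over 4-connected block neighbours."""
    labels = np.zeros(sigmas.shape, dtype=int)
    next_label = 0
    for idx in np.ndindex(sigmas.shape):
        if labels[idx]:
            continue
        next_label += 1
        labels[idx] = next_label
        stack = [idx]
        while stack:
            i, j = stack.pop()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < sigmas.shape[0]
                        and 0 <= nj < sigmas.shape[1]
                        and not labels[ni, nj]
                        and abs(sigmas[ni, nj] - sigmas[i, j]) < th):
                    labels[ni, nj] = next_label
                    stack.append((ni, nj))
    return labels
```

Blocks over overlaid text, having a different injected noise level, end up in a different labelled region from the background blocks.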
3.3 Localization of Text Region
Following the labelling of homogeneous noise regions, our next step is to identify which of the
labelled regions represent areas containing caption text. We have therefore integrated the result of
block labelling with a low-cost edge detection method. It is widely believed that edges are a rich
and reliable feature of text regardless of colour, intensity or layout [29]. Edge strength and
density are two distinguishing characteristics of text overlaid on images.
We extract edges from the gray-scale version of the original image using the Canny edge detector
(figure 4(a)), as it gives strong edges. Based on the previously computed widths and heights of
the image characters (equations 2 and 3), the resultant edge map is further processed to eliminate
both vertical and horizontal edges much longer than h_max and w_max; these are edges
characterized by long straight lines that do not belong to text.

To get the text regions, we integrate the merged labelled blocks (figure 3(c)) with the refined
edge map (figure 4(b)). An image analysis is then performed to eliminate labelled blocks that do
not contain enough edge information, by determining the ratio between edge information and the
area of the block.
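This edge-density filter can be sketched as follows; the `min_ratio` cut-off is a hypothetical value of ours, as the paper does not state one:

```python
import numpy as np

def filter_blocks_by_edges(labels, edge_map, min_ratio=0.05):
    """Keep only labelled regions whose ratio of edge pixels to region
    area exceeds min_ratio; regions with too little edge information
    are unlikely to contain caption text."""
    keep = np.zeros_like(labels)
    for lab in np.unique(labels):
        if lab == 0:
            continue
        mask = labels == lab
        if edge_map[mask].mean() >= min_ratio:  # edge density of region
            keep[mask] = lab
    return keep
```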
Figure 4: Intermediate results (a) Edge map, (b) refined edge map. Text localization results by (c) Bounding
box and (d) extracted text regions.
Normally, text overlaid on an image appears in clusters; therefore, to eliminate short and isolated
edges, a morphological operation is used to connect edges together. The size of the structuring
element is taken as N (equation 3).

We retain only those regions that contain text by eliminating small connected components with
few pixels. This procedure classifies regions into text and non-text labelled blocks. The final
stage of text localization is to label all connected components within each text-rich block by
drawing a bounding box around them (figure 4(c)). This is followed by mapping and extracting
from the original RGB frame only the regions within the bounding boxes, as shown in figure
4(d), which represents our text localization result.
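The morphological connection of edges and the bounding-box step can be sketched in pure NumPy (the paper uses MATLAB's morphology routines; this dilation and box extraction are our illustrative equivalents):

```python
import numpy as np

def dilate(binary, n):
    """Binary dilation with an n x n structuring element; the paper
    takes the element size as N from equation (3)."""
    binary = binary.astype(bool)
    pad = n // 2
    padded = np.pad(binary, pad)
    out = np.zeros_like(binary)
    h, w = binary.shape
    for di in range(n):
        for dj in range(n):
            out |= padded[di:di + h, dj:dj + w]
    return out

def bounding_box(mask):
    """Tight bounding box (top, left, bottom, right) of the ON pixels."""
    ys, xs = np.nonzero(mask)
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```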
4. EXPERIMENTAL RESULTS
To the best of our knowledge, there is no benchmark dataset available in the literature for caption
text detection on the basis of noise characteristics. We have therefore created two datasets
obtained from the following sources: 1) Internet downloads and 2) public databases. These
datasets comprise both videos and images containing graphics text, and most of them are
compressed. Since our approach requires information about noise, compressed videos and images
limit its performance, due to the evolution of sophisticated codecs which reduce noise.
Compressed videos and images pose even more challenges than uncompressed ones, since the
compression codec can act like a retouching tool that covers the traces of editing; saving in JPEG,
in particular, cannot reproduce the exact edit but only an approximation. The selection of text
frames for the created dataset attempts to label frames according to the noise levels they contain;
ideally the selection process should be simple, fast and cost-effective in order to reduce the
computational time required to process them.
Noise levels do not give text information directly but only act as a tool to identify homogeneous
regions. The overall experimental implementation of our approach in MATLAB, with a frame
size of 512×512 and a block-size parameter of N=10, on a PC with a 2.93 GHz Core 2 Duo
processor and 256 MB memory, has an average run time of 17 seconds. All experimental results
were obtained on gray-scale video frames or images, using the Daubechies wavelet db8, with
additive Gaussian noise of zero mean.
4.1 Frame Selection
The noise level is a very important parameter for many image processing applications, such as
image de-noising, and accurately estimating it improves their performance. The purpose of this
section is to analyze the noise characteristics of each input image independently, so as to build a
database with the desired noise characteristics. This is done by estimating the noise level in the
input video frame based on the method proposed by
Liu et al. [30]. The graph (figure 5) shows the estimated noise level for a gray-scale image. The
input to this process is a single video frame positively identified to contain overlaid text, whose
noise information we seek to estimate. The estimation process requires neither a reference frame
nor prior information. Because our approach uses noise information, it is imperative that the
selected video frames contain some noise. If a frame does not contain noise but does contain
overlaid text, some local noise is added to the frame to test whether it will then exhibit the desired
characteristics.
Every frame in our created dataset must satisfy one of two conditions:
1) the video frame or image has a noise level above the set threshold, or
2) the video frame or image has no noise, or a noise level below the threshold value, in which
case additive noise is added to the video frame or image.
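These two conditions can be sketched as below; the threshold and the injected sigma are illustrative values of ours, not taken from the paper:

```python
import numpy as np

def prepare_frame(frame, sigma_est, th=2.0, sigma_add=3.0, seed=0):
    """Frame-selection rule: if the estimated noise level is already
    above the threshold, keep the frame as-is; otherwise inject
    additive Gaussian noise so the frame exhibits the noise
    characteristics the approach relies on."""
    if sigma_est >= th:
        return frame.astype(np.float64)
    rng = np.random.default_rng(seed)
    return frame + rng.normal(0.0, sigma_add, size=frame.shape)
```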
In this section we have built a pre-processing step guided by the requirements of our approach; it
provides a cheap way to create a dataset of video frames and images with the desired noise
characteristics, accurately selected from different Internet sources. This step is not counted in the
computational time of our approach.
Figure 5: Noise level estimation (ground truth vs. estimated) for the gray-scale image shown in figure 2(a)
4.2 Evaluation Metrics
Evaluating text localization has generated a number of test metrics, each adapted to the simulated
localization results. We have adopted a widely used and accurate metric, as it produces a good
averaging result on our dataset. The adopted metric tests for Recall (R), Precision (P) and
F-measure, and uses the principle of supervised evaluation [31]. Two synthetic image maps are
maintained for each evaluation: (1) a ground-truth image map and (2) the simulated localization
result. The ground-truth map is generated by manually marking the bounding box that surrounds
each entire text block. The most important factor to consider while marking the ground truth is to
make sure that specific properties, such as foreground pixels, are not left out, as they are our main
interest. The metric provides a score measuring the performance of our technique by determining
the correspondence between these two image maps. The computation of precision (equation 5)
and recall (equation 6) based on pixel accuracy is given by the following definitions.
1. A pixel is classified as true positive (TP) if it is ON in both the ground-truth image map and
the output of the simulated text-localized frame image.
2. A pixel is classified as false positive (FP) if it is ON only in the output of the simulated
text-localized image.
3. A pixel is classified as false negative (FN) if it is ON only in the ground-truth image map.
Precision (P) = Number of TP / (Number of FP + Number of TP),    (5)

Recall (R) = Number of TP / (Number of FN + Number of TP).    (6)
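The three pixel classes and equations (5)-(6), together with the F-measure, can be computed from two binary maps as follows (a sketch; there is no guard against empty maps):

```python
import numpy as np

def pixel_metrics(ground_truth, result):
    """Pixel-accurate precision, recall and F-measure from two binary
    maps: the ground-truth text mask and the localization result."""
    gt = ground_truth.astype(bool)
    res = result.astype(bool)
    tp = np.sum(gt & res)    # ON in both maps
    fp = np.sum(~gt & res)   # ON only in the result
    fn = np.sum(gt & ~res)   # ON only in the ground truth
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```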
We use pixel-accurate ground truth for a total of 339 video frames and images containing caption
text in various languages, including a combination of Kannada (a regional language) and English
script. The video frames and images contain captions of various text layouts and font sizes.
4.2.1. Experiments on own Datasets
We collected a number of video clips for our dataset from commercial TV channels. In order to
evaluate the performance of our proposed approach in localizing caption text, we conducted
experiments on individual frames extracted from video clips downloaded from the TV9 Kannada
channel in various situations, such as TV9 (news), TV9 (hosting), TV9 (celebrity talk) and TV9
(advertisement), and from the English CNN channel, as these video frames generally feature
complex backgrounds, low contrast and blurred text. Since our proposed approach does not
require any prior information about noise, it was important that only random, non-sequential
video frames were selected, under the assumption that they contain overlaid text. All of these
video frames have a resolution of 512×512 and a frame rate of 29.97 frames per second. A
sample result from our dataset is shown in figure 6, where multi-size fonts exist within the video
frame; this condition poses a great challenge for text localization, as each font size requires a
specific structuring-element size. To address this challenge, we performed the experiments
repeatedly, noting in every trial the localized results along with the corresponding structuring
element, and selected only the results that gave a fair localization across all font sizes. The
evaluation performance of our approach on our dataset is shown in Table 1.
Figure 6: Own dataset sample results: (a) gray-scale video frame, (b) image blocking map, (c) text
localization results and (d) extracted text
Table 1. Pixel-level true positive, false positive and false negative rates (fractions of pixels)

Video source          True positive   False positive   False negative
TV9 (news)            0.89            0.12             0.07
TV9 (hosting)         0.99            0.03             0.00
TV9 (celebrity talk)  0.99            0.09             0.04
TV9 (advertisement)   0.98            0.04             0.03
Average               0.96            0.07             0.04
4.2.2. Experiments on Benchmark Datasets
To test the performance of our technique, we created a dataset comprising videos and images
obtained from various publicly available benchmark databases, such as the Image Processing
Centre (IPC) Artificial Text dataset and the ICDAR-13 dataset. These are standard datasets
containing labelled ground truth for accurate comparison and evaluation. Figure 7 shows the
results of localization and extraction on the IPC dataset.

The experimental results of the proposed algorithm show improved performance even on
compressed digital video frames and images, and are shown in Table 2.
Figure 7: IPC dataset sample results: (a) gray-scale video frame, (b) image blocking map, (c) extracted text
regions and (d) extracted text.
Table 2. Pixel-level true positive, false positive and false negative rates (fractions of pixels)

Video source        True positive   False positive   False negative
ICDAR-2013 [32]     0.97            0.03             0.10
IPC dataset [33]    0.98            0.02             0.04
Still images [34]   0.99            0.01             0.07
Average             0.98            0.02             0.07
4.2.3. Comparison with other Techniques
The performance of our proposed technique has been evaluated and a comparison with other
recent text localization methods has been done. The recent methods which include edge based
method [25] and Feature based method [18] has been implemented and experiments done using
both our dataset and publicly available datasets [33], [34]. From the experimental results
obtained, our method performs better in both our dataset and the benchmark dataset than the
existing methods in areas with complex background and compressed video frames and images.
The visual differences between the proposed and existing methods can be clearly observed in the
output results shown in Figure 6, with the comparison results given in Table 3. The proposed
result (figure 6(a)) is visually better than the existing methods even before we incorporate the
edge algorithm. The existing methods [25] and [18] are capable of localizing the text area, but
fail to provide a tight bounding box around it, hence they enclose more non-text pixels than text
pixels.
Table 3. Results on Recall, Precision and F-measure

Method               Precision   Recall   F-measure
Edge based [25]      0.90        0.93     0.92
Feature based [18]   0.87        0.85     0.86
Proposed             0.95        0.94     0.95
Figure 6: Comparison results: (a) candidate text region based on block label (proposed), (b) final text
localization (proposed), (c) region-based method [25] and (d) feature-based method [18].
5. CONCLUSION
We have performed caption text localization based on image noise inconsistencies. Typically,
captured video frames or images contain uniform noise levels. In TIE, it is assumed that the noise
characteristics change when text is overlaid on a video or image during the editing process: within
the overlaid region there is a duplication of pixels, and when additive Gaussian noise is applied,
the duplicated regions no longer match exactly. Accurate identification and labelling of regions
with homogeneous noise variances provides insight into the location of overlaid text. The text
localization results of our approach show that it is possible, in a simple and blind way, to divide
an investigated image into segments with homogeneous noise levels without depending on any
prior information; this is the main strength of our approach. The primary contribution of this
work is the use of noise characteristics to discover image inconsistencies in edited video frames
and images. The main drawback of the method is that some images may contain isolated regions
with totally different variances caused by image structure. The method may mark these regions as
inconsistent with the rest of the image, and human interpretation of the output results is therefore
necessary.
REFERENCES
[1] Crandall, David, Sameer Antani, and Rangachar Kasturi. "Extraction of special effects caption text
events from digital video." International journal on document analysis and recognition 5.2-3 (2003):
138-157.
[2] G. R. Talmale and R. W. Jasutkar. "Analysis of Different Techniques of Image Forgery
Detection." IJCA Proceedings on National Conference on Recent Trends in Computing,
NCRTC(3):13-18, May 2012.
[3] Mahdian, Babak, and Stanislav Saic. "Using noise inconsistencies for blind image forensics." Image
and Vision Computing 27.10 (2009): 1497-1503.
[4] J. Fridrich, "Methods for Tamper Detection in Digital Images", Proc. ACM Workshop on Multimedia
and Security, Orlando, FL, October 30−31, 1999, pp. 19−23.
[5] Kobayashi, Michihiro, Takahiro Okabe, and Yoichi Sato. "Detecting forgery from static-scene video
based on inconsistency in noise level functions." Information Forensics and Security, IEEE
Transactions on 5.4 (2010): 883-892.
[6] Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. "Detecting text in natural scenes with stroke width
transform." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE,
2010.
[7] Kim, Hae-Kwang. "Efficient automatic text location method and content-based indexing and
structuring of video database." Journal of Visual Communication and Image Representation 7.4
(1996): 336-344.
[8] M.A. Smith, T. Kanade, Video skimming for quick browsing based on audio and image
characterization, Technical Report CMU-CS-95-186, Carnegie Mellon University, July 1995.
[9] X. Chen, A. Yuille, "Detecting and Reading Text in Natural Scenes", Computer Vision and Pattern
Recognition (CVPR), pp. 366-373, 2004
[10] Wang, R., Jin, W., Wu, L., A novel video caption detection approach using multi-frame integration,
Pattern Recognition, 17th International Conference on ICPR2004, Vol. 1, 2004, pp. 449-452
[11] Jung, K., Kim, K.I., Jain, A.K., Text information extraction in images and video: A survey, pattern
Recognition, Vol. 37, No. 5, 2004, pp. 977-997.
[12] X. Hua, P. Yin and H.J. Zhang, “Efficient video text recognition using Multiple Frame Integration”,
IEEE Int. Conf. on Image Processing (ICIP), Sept 2002.
[13] Yi, Jian, Yuxin Peng, and Jianguo Xiao. "Using multiple frame integration for the text recognition of
video." Document Analysis and Recognition, 2009. ICDAR'09. 10th International Conference on.
IEEE, 2009.
[14] Li, Huiping, David Doermann, and Omid Kia. "Automatic text detection and tracking in digital
video." Image Processing, IEEE Transactions on 9.1 (2000): 147-156.
[15] Li, Huiping, and David Doermann. "Text enhancement in digital video using multiple frame
integration." Proceedings of the seventh ACM international conference on Multimedia (Part 1). ACM,
1999.
[16] J. Hu, J. Xi, L. Wu, “Automatic Detection and Verification of Text Regions in News Video Frames”,
Int. Journal of Pattern Recognition and Artificial Intelligence, Vol.16, No.2, pp.257-271, 2002.
[17] Sharma, Nabin, Umapada Pal, and Michael Blumenstein. "Recent advances in video based document
processing: a review." Document Analysis Systems (DAS), 2012 10th IAPR International Workshop
on. IEEE, 2012.
[18] Sun, H., Zhao, N., Xu, X., Extraction of text under complex background using wavelet transform and
support vector machine, IEEE International Conference on Mechatronics and Automation(ICMNA
2006), Vol. 2006, No. 4026310, 2006, pp. 1493-1497.
[19] H-K Kim, "Efficient automatic text location method and content-based indexing and structuring of
video database". J Vis Commun Image Represent 7(4):336–344 (1996).
[20] Shivakumara, Palaiahnakote, Trung Quy Phan, and Chew Lim Tan. "New wavelet and color features
for text detection in video." Pattern Recognition (ICPR), 2010 20th International Conference on.
IEEE, 2010.
[21] P. Shivakumara, N. Kumar, D. Guru, and C. Tan. Separation of graphics (superimposed) and scene
text in video frames. In Document Analysis Systems (DAS), 2014 11th IAPR International Workshop
on, pages 344-348, April 2014.
[22] A. Jain, B. Yu, “Automatic Text Location in Images and Video Frames”, Pattern Recognition 31(12):
2055-2076 (1998).
[23] Liu, Xiaoqian, and Weiqiang Wang. "Extracting captions from videos using temporal feature." In
Proceedings of the international conference on Multimedia, pp. 843-846. ACM, 2010.
[24] Amit Hooda, Madhur Kathuria & Vinod Pankajakshan “Application of Forgery Localization in
Overlay Text Detection” ICVGIP ’14, December 14 - 18 2014, Bangalore, India (to appear).
[25] Jamil, A., Siddiqi, I., Arif, F., & Raza, A. (2011, September). Edge-based Features for Localization of
Artificial Urdu Text in Video Images. In Document Analysis and Recognition (ICDAR), 2011
International Conference on (pp. 1120-1124). IEEE.
[26] Wang, R., Jin, W., Wu, L., A novel video caption detection approach using multi-frame integration,
Pattern Recognition, 17th International Conference on ICPR2004, Vol. 1, 2004, pp. 449-452.
[27] D. Pastor, “A theoretical result for processing signals that have unknown distributions and priors in
white Gaussian noise,” Comput. Stat. Data Anal., vol. 52, no. 6, pp. 3167–3186, 2008.
[28] D. Donoho and I. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika 81 (1994)
425-455.
[29] Samarabandu, Jagath, and Xiaoqing Liu. "An edge-based text region extraction algorithm for
indoor mobile robot navigation." International Journal of Signal Processing 3.4 (2006): 273-280.
[30] Liu, Xinhao, Masayuki Tanaka, and Masatoshi Okutomi. "Single-Image Noise Level Estimation
for Blind Denoising." IEEE Transactions on Image Processing (2013).
[31] Hemery, Baptiste, et al. "Evaluation protocol for localization metrics." Image and Signal Processing.
Springer Berlin Heidelberg, 2008. 273-280.
[32] ICDAR-2013 dataset: http://dag.cvc.uab.es/icdar2013competition/?com=news&action=data&id=15.
[33] IPC-Artificial text dataset: https://sites.google.com/site/artificialtextdataset/.
[34] Still images: http://www-i6.informatik.rwth-aachen.de/imageclef/resources/iaprtc12.tgz.