This document proposes a method for detecting, localizing, and extracting text from videos with complex backgrounds. It involves three main steps:
1. Text detection applies a corner metric and Laplacian filtering independently to find candidate text regions. The corner metric identifies regions with high curvature, while Laplacian filtering highlights intensity discontinuities. The two responses are multiplied element-wise to suppress noise.
2. Text localization then determines the accurate boundaries of detected text strings.
3. Text binarization filters background pixels to extract text pixels for recognition. Thresholding techniques are used to convert localized text regions to binary images.
The method exploits different text properties by detecting text with both the corner metric and Laplacian filtering. Combining the two results improves robustness, since background clutter that triggers only one of the detectors is suppressed by the multiplication.
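The detection step can be sketched as follows. The document does not specify the exact corner metric, so this sketch assumes a Harris-style response; the smoothing sigma and the percentile threshold are likewise illustrative parameters, not values from the text:

```python
import numpy as np
from scipy import ndimage

def text_region_map(gray, sigma=1.0, percentile=90):
    """Combine a Harris-style corner metric with a Laplacian response
    by element-wise multiplication, then threshold the product."""
    gray = gray.astype(float)

    # Image gradients for the structure-tensor (corner) metric.
    Ix = ndimage.sobel(gray, axis=1)
    Iy = ndimage.sobel(gray, axis=0)
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)

    # Harris corner response: det(M) - k * trace(M)^2.
    k = 0.04
    corner = (Ixx * Iyy - Ixy ** 2) - k * (Ixx + Iyy) ** 2

    # Laplacian magnitude highlights intensity discontinuities.
    lap = np.abs(ndimage.laplace(gray))

    # Element-wise product: only pixels strong in BOTH maps survive,
    # which suppresses noise that triggers a single detector.
    combined = np.clip(corner, 0, None) * lap
    return combined > np.percentile(combined, percentile)
```

Because text strokes produce both corners and sharp edges, the product stays high on characters while smooth background and isolated edge noise fall below the threshold.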
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK (IJSCAI)
With the rapid growth of multimedia documents and the mounting demand for information indexing and retrieval, much effort has been devoted to extracting text from images and videos. The prime intention of the proposed system is to detect and extract scene text from video. Extracting scene text from video is demanding due to complex backgrounds, varying font sizes, different styles, low resolution and blurring, position, viewing angle, and so on. In this paper we put forward a hybrid method in which the two most popular text extraction techniques, the region-based method and the connected component (CC) based method, come together. Initially the video is split into frames and key frames are obtained. A text region indicator (TRI) is developed to compute the text-prevailing confidence, and candidate regions are obtained by performing binarization. An artificial neural network (ANN) is used as the classifier and optical character recognition (OCR) is used for character verification. Text is grouped by constructing a minimum spanning tree using the bounding-box distance.
Inpainting Scheme for Text in Video: A Survey (eSAT Journals)
Abstract
Text data present in video sequences provides useful information of paramount importance. However, not all text present in video is necessary, because it may hide important portions of the video, so there must be a way to erase this type of unwanted text. This is done in two phases: first, text components are detected in each frame of the video; the detected text components are then removed from the video sequences, and the occluded part of the video is restored using an inpainting method. Each phase is a broad topic in image processing, and video text detection and inpainting are the two most important phases in this scheme. The text detection phase consists of text localization, text segmentation, and recognition; the inpainting method is used to restore the occluded regions produced by the removal of text.
Keywords: Optical Character Recognition (OCR), Stroke Width Transform, Text Detection, Connected Component
EMPIRICAL STUDY OF ALGORITHMS AND TECHNIQUES IN VIDEO STEGANOGRAPHY (Journal For Research)
Steganography is the art and science of hiding important information inside graphics, text, cover files, etc. These techniques can be applied without fear of image destruction because the hidden data is integrated into the image itself. The information can be in the form of text, audio, or video. The purpose of steganography is covert communication: hiding a message from a third party or intruder. Steganography is often confused with cryptography because both are used to protect confidential information. Though there are many types of steganography, video steganography is more reliable due to its high capacity, greater data embedment, perceptual redundancy, etc. This research paper deals with various video steganography techniques and algorithms including spatial domain methods, pseudorandom permutations, TPVD (tri-way pixel value differencing), the motion vector technique, and video compression. Video compression, which uses modern coding techniques to reduce redundancy in video data, is also studied and analyzed. Video compression operates on square groups of neighboring pixels, often called macroblocks; these blocks are compared from one frame to the next and the video compression code sends only the differences within them. Generally, the motion field in video compression is assumed to be translational, with horizontal and vertical components denoted in vector form over the spatial variables of the underlying image, and estimated with methods such as the three-step search. The study also discusses the evolution of video steganography techniques and algorithms over the years, along with their applications and subsequent merits and demerits. Further, an advanced video steganography algorithm (bit exchange method) based on bit shifting and an XOR operation on the secret message file has been studied and implemented.
The encrypted secret message is embedded in alternate bytes of the cover file, with the bits substituted into the LSB and LSB+3 positions. Finally, the simulation and evaluation of the above-mentioned approach is performed using MATLAB tools.
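The LSB and LSB+3 substitution described above can be sketched as follows. The abstract does not give the exact bit layout or the XOR step's key, so the one-byte `key`, the two-bits-per-used-byte layout, and the alternate-byte stride here are assumptions for illustration only:

```python
def embed(cover: bytearray, message: bytes, key: int = 0x5A) -> bytearray:
    """Embed message bits into alternate bytes of the cover,
    two bits per used byte: one in the LSB, one in bit 3 (LSB+3).
    The message is first XOR'd with a one-byte key (a stand-in
    for the paper's XOR/bit-shift encryption step)."""
    bits = []
    for byte in message:
        enc = byte ^ key                       # XOR "encryption" step
        bits.extend((enc >> i) & 1 for i in range(8))
    stego = bytearray(cover)
    if (len(bits) + 1) // 2 > len(stego) // 2:
        raise ValueError("cover too small for message")
    for i in range(0, len(bits), 2):
        b = stego[i]                           # alternate bytes: 0, 2, 4, ...
        b = (b & ~1) | bits[i]                 # LSB
        if i + 1 < len(bits):
            b = (b & ~8) | (bits[i + 1] << 3)  # LSB+3
        stego[i] = b
    return stego

def extract(stego: bytes, msg_len: int, key: int = 0x5A) -> bytes:
    """Reverse of embed(): read the two bits back from each used byte,
    reassemble the encrypted bytes, and undo the XOR."""
    bits = []
    for i in range(0, msg_len * 8, 2):
        bits.append(stego[i] & 1)
        bits.append((stego[i] >> 3) & 1)
    out = bytearray()
    for j in range(msg_len):
        enc = sum(bits[8 * j + k] << k for k in range(8))
        out.append(enc ^ key)
    return bytes(out)
```

Because only two bits of every other byte change, the distortion to the cover stays small, which is the usual rationale for LSB-family schemes.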
A Survey of Different Data Hiding Techniques in Digital Images (ijsrd.com)
Steganography is the art and science of invisible communication, which hides the existence of the communicated message inside media such as text, audio, image, and video without arousing suspicion. Steganography differs from cryptography and watermarking in its objectives, which include undetectability, robustness (resistance to various image processing methods and compression), and the capacity of the hidden data. Image steganography uses a digital image as its cover media. This paper analyzes and discusses various techniques available today for image steganography, along with their strengths and weaknesses.
Hiding Text in Speech Signal Using K-means, LSB Techniques and Chaotic Maps (IJECE, IAES)
In this paper, a new technique that hides a secret text inside a speech signal without any apparent noise is presented. The secret text is encoded by first scrambling it using a chaotic map, then encoding the scrambled text using the Zaslavsky map, and finally hiding the text by breaking the speech signal into blocks and using only half of each block with the LSB and K-means algorithms. The measures (SNR, PSNR, correlation, SSIM, and MSE) are applied to various speech files (".WAV") and various secret texts. We observed that the suggested technique offers high security (SNR, PSNR, correlation, and SSIM) for an encrypted text with low error (MSE). This indicates that the noise level in the speech signal is very low and the speech purity is high, so the suggested method is effective for embedding encrypted text into speech files.
Secured Data Transmission Using Video Steganographic Scheme (IJERA Editor)
Steganography is the art of hiding information in ways that prevent the detection of hidden messages. Video steganography focuses on the spatial and transform domains. Spatial domain algorithms directly embed information in the cover image with no visual changes; they have the advantage of high steganographic capacity but the disadvantage of weak robustness. Transform domain algorithms embed the secret information in the transform space; they have the advantage of good stability but the disadvantage of small capacity. Both kinds of algorithms are vulnerable to steganalysis. This paper proposes a new compressed video steganographic scheme in which the data is hidden in the horizontal and vertical components of the motion vectors. The PSNR value is calculated so that the quality of the video after data hiding can be evaluated.
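The motion-vector hiding idea can be sketched as follows, treating motion vectors as plain integer pairs. A real codec stores quarter-pel, entropy-coded vectors, which this simplified sketch ignores, and LSB substitution is only one plausible reading of how the components carry the data, not necessarily the paper's exact rule:

```python
def hide_bits_in_motion_vectors(mvs, bits):
    """Embed one bit in the horizontal and one in the vertical
    component of each motion vector by LSB substitution."""
    it = iter(bits)
    out = []
    for mvx, mvy in mvs:
        bx = next(it, None)
        by = next(it, None)
        if bx is not None:
            mvx = (mvx & ~1) | bx   # replace LSB of horizontal component
        if by is not None:
            mvy = (mvy & ~1) | by   # replace LSB of vertical component
        out.append((mvx, mvy))
    return out

def recover_bits(mvs, n_bits):
    """Read the LSBs back in the same horizontal-then-vertical order."""
    bits = []
    for mvx, mvy in mvs:
        bits.append(mvx & 1)
        bits.append(mvy & 1)
        if len(bits) >= n_bits:
            break
    return bits[:n_bits]
```

Changing only the least significant bit perturbs each vector by at most one unit, so the reconstructed motion (and hence PSNR) degrades very little, which matches the scheme's stated evaluation criterion.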
IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES (IJNSA Journal)
Spammers are constantly evolving new spam technologies, the latest of which is image spam. Until now, research in spam image identification has addressed properties such as colour, size, compressibility, entropy, and content. However, the methods so evolved have certain limitations due to embedded obfuscation such as complex backgrounds, compression artifacts, and a wide variety of fonts and formats. To overcome these limitations, we have proposed two methodologies (though there can be more), each with four stages. The two methodologies are almost identical except in the second stage, where methodology I extracts low-level features while methodology II extracts metadata features; a comparison between the two is also shown. The method handles images with and without noise separately. Colour properties of the images are altered so that OCR (optical character recognition) can easily read the text embedded in the image. The proposed methods are tested on a dataset of 1984 spam images and are found to be effective in identifying all types of spam images containing (1) only text, (2) only images, or (3) both text and images. The encouraging experimental results show that methodology I achieves an accuracy of 92% while methodology II achieves 93.3%.
Notes for Advanced Image Processing subject. This subject comes under Computer Science for B.E./B.Tech and M.E./M.Tech. students. Hope this will help you.
Natural Language Description of Images Using Hybrid Recurrent Neural Network (IJECE, IAES)
We present a learning model that generates natural language descriptions of images. The model exploits the connections between natural language and visual data by producing text-line-based content from a given image. Our hybrid recurrent neural network model builds on Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bi-directional Recurrent Neural Network (BRNN) models. We conducted experiments on three benchmark datasets: Flickr8K, Flickr30K, and MS COCO. Our hybrid model uses the LSTM to encode text lines or sentences independent of object location and the BRNN for word representation; this reduces the computational complexity without compromising the accuracy of the descriptor. The model achieves better accuracy in retrieving natural-language-based descriptions on these datasets.
Comparative Analysis of C99 and TopicTiling Text Segmentation Algorithms (eSAT Journals)
Abstract: In this paper, the work includes the extraction of information from image datasets that contain natural text. The difficulty of segmenting natural text from an image is high, so precision is the most important factor to keep in mind. To minimize error rates, an error filtration technique is provided, since filtration is adopted while segmenting text present in images. Furthermore, a comparative analysis of two text segmentation algorithms, C99 and TopicTiling, on image documents is presented. To assess how well each algorithm works, each was applied to different datasets and the results were compared. The work also demonstrates the efficiency of TopicTiling over C99. Index Terms: Text Segmentation, text extraction, image documents, C99, TopicTiling.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of Engineering and Technology.
Text extraction is a process by which we convert a printed document, scanned page, or image containing text into ASCII characters that a computer can recognize.
A Survey On Thresholding Operators of Text Extraction In Videos (CSCJournals)
Video indexing is an important problem that has interested the visual information community in image processing. The detection and extraction of scene and caption text from unconstrained, general-purpose video is an important research problem in the context of content-based retrieval and summarization. This paper presents a technique for detecting text in video frames. Finding textual content in images is a challenging and promising research area in information technology; consequently, text detection and recognition in multimedia has become one of the most important fields in computer vision due to its valuable uses in a variety of recent technical applications. The work in this paper consists of using morphological operations to extract text appearing in video frames. The proposed scheme also includes preprocessing to separate text from background in cases where there is high similarity between the text and the background information. Experimental results show that the resultant image contains only text. The evaluation criteria are applied to the resulting image and to the ones obtained by different operators.
LOCALIZATION OF OVERLAID TEXT BASED ON NOISE INCONSISTENCIES (aciijournal)
In this paper, we present a novel technique for localizing caption text in video frames based on noise inconsistencies. Caption text is artificially added to the video after it has been captured and as such does not form part of the original video graphics. Typically, the noise level is uniform across the entire captured video frame; artificially embedding or overlaying text on the video therefore introduces a segment with a different noise level, so the detection of varying noise levels in the frame may signify the presence of overlaid text. We exploit this property by detecting regions with differing noise levels to localize overlaid text in video frames. Experimental results show a marked improvement in overlaid text localization, evaluated using recall, precision, and F-measure.
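One way to realize the noise-level cue is sketched below, using Immerkaer's fast Laplacian-based noise estimator computed per block. The estimator choice and the block size are assumptions for illustration, not details taken from the paper:

```python
import numpy as np
from scipy.ndimage import convolve

# Immerkaer's noise-estimation kernel (difference of two Laplacians).
LAP = np.array([[1, -2, 1],
                [-2, 4, -2],
                [1, -2, 1]], dtype=float)

def blockwise_noise(gray, block=32):
    """Estimate the noise standard deviation in each block x block tile.
    Tiles whose estimate deviates strongly from the frame's typical
    level are candidates for artificially overlaid content."""
    gray = gray.astype(float)
    resp = np.abs(convolve(gray, LAP, mode="reflect"))
    h, w = gray.shape
    rows, cols = h // block, w // block
    est = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = resp[r * block:(r + 1) * block,
                        c * block:(c + 1) * block]
            # For Gaussian noise, E|I * LAP| = 6 * sigma * sqrt(2/pi).
            est[r, c] = np.sqrt(np.pi / 2) * tile.mean() / 6.0
    return est
```

A simple follow-up rule, in the spirit of the paper, is to flag tiles whose estimate lies far from the median of `est` as possible overlaid-text regions.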
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and... (IJECE, IAES)
Segmenting a video sequence by detecting shot changes is essential for video analysis, indexing, and retrieval. In this context, a shot boundary detection algorithm based on the scale invariant feature transform (SIFT) is proposed in this paper. The first step of our method consists of a top-down search scheme that detects the locations of transitions by comparing the ratio of matched features, extracted via SIFT, for every RGB channel of the video frames. This overview step provides the locations of the boundaries. Secondly, a moving-average calculation is performed to determine the type of transition. The proposed method can detect both gradual transitions and abrupt changes without requiring any prior training on the video content. Experiments conducted on a multi-type video database show that the algorithm performs well.
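SIFT matching needs a feature-extraction library, so to keep a dependency-light sketch of the same top-down, per-channel comparison, the stand-in below uses histogram intersection instead of the paper's ratio of matched SIFT features; the `threshold` value is likewise an illustrative assumption:

```python
import numpy as np

def channel_similarity(f1, f2, bins=32):
    """Similarity in [0, 1] between two RGB frames: mean histogram
    intersection over the R, G, B channels (a simple stand-in for
    the paper's per-channel ratio of matched SIFT features)."""
    sims = []
    for ch in range(3):
        h1, _ = np.histogram(f1[..., ch], bins=bins, range=(0, 256))
        h2, _ = np.histogram(f2[..., ch], bins=bins, range=(0, 256))
        n = f1[..., ch].size
        sims.append(np.minimum(h1, h2).sum() / n)
    return float(np.mean(sims))

def shot_boundaries(frames, threshold=0.5):
    """Indices i where the transition frame i -> i+1 looks abrupt:
    similarity between consecutive frames falls below the threshold."""
    return [i for i in range(len(frames) - 1)
            if channel_similarity(frames[i], frames[i + 1]) < threshold]
```

The paper's second stage, a moving average over the similarity signal to separate gradual from abrupt transitions, would operate on the same per-pair similarity values this sketch produces.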
Design and Analysis of Quantization Based Low Bit Rate Encoding System (IJTSRD)
The objective of this paper is to develop a low bit rate encoding for VQ problems such as real-time image coding. The decision tree is generated by an offline process. A new systolic architecture to realize the encoder of full-search vector quantization (VQ) for high-speed applications is presented here. Over the past decades, digital video compression technologies have become an integral part of multimedia systems. A further purpose is to improve image quality in remote cardiac pulse measurement using an adaptive filter. The paper describes the approach used for feature extraction from many images and presents a real-time application of image compression that can be efficiently interfaced with hardware; a Raspberry Pi is therefore used for image compression. We have developed an algorithm for endoscopic images that is based on differential pulse code modulation. The compressors consist of a low-cost YEF colour space converter and a variable-length predictive algorithm for lossless compression. Mr. Nilesh Bodne | Dr. Sunil Kumar "Design and Analysis of Quantization Based Low Bit Rate Encoding System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6, October 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29289.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/29289/design-and-analysis-of-quantization-based-low-bit-rate-encoding-system/mr-nilesh-bodne
Scene Text Detection of Curved Text Using Gradient Vector Flow Method (IJTET Journal)
Abstract: Text detection and recognition is a hot topic for researchers in the field of image processing and multimedia. The content-based image retrieval (CBIR) community fills the semantic gap between low-level and high-level features. Several methods have been developed for text detection and extraction that achieve reasonable accuracy for multi-oriented text and natural scene text (camera images). In general, most of these methods use a classifier and a large number of training samples to improve detection accuracy, and connected components are used to tackle the multi-orientation problem. Connected-component-based features with classifier training work well when the images have high contrast. However, when the same methods are applied directly to text detection in video, they result in disconnections, loss of shape, and similar problems, because of low contrast and complex backgrounds; in such cases, deciding on geometrical features of the components and a classifier is not easy. To overcome this problem, the proposed research uses a gradient vector flow (GVF) and grouping-based method for arbitrarily oriented scene text detection. The GVF of edge pixels in the Sobel edge map of the input frame is explored to identify the dominant edge pixels that represent text components. The method extracts the dominant pixels' edge components corresponding to the Sobel edge map, called the text candidates (TC) of the text lines. Experimental results on different datasets, including arbitrarily oriented, non-horizontal, and horizontal text data, Hua's data, and the ICDAR-03 data (camera images), show that the proposed method outperforms existing methods.
The advances in this technological era have resulted in an enormous pool of information, stored at multiple places globally and in multiple formats. This article highlights a methodology for extracting the video lectures delivered by experts in the domain of Computer Science by using a Generalized Gamma Mixture Model. The feature extraction is based on DCT transformations. In order to propose the model, the data set is pooled from YouTube video lectures in the domain of Computer Science. The outputs generated are evaluated using precision and recall.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Texture Based Methodology for Text Region Extraction from Low Resolution Na...CSCJournals
Automated systems for understanding display boards are finding many applications useful in guiding tourists, assisting visually challenged and also in providing location aware information. Such systems require an automated method to detect and extract text prior to further image analysis. In this paper, a methodology to detect and extract text regions from low resolution natural scene images is presented. The proposed work is texture based and uses DCT based high pass filter to remove constant background. The texture features are then obtained on every 50x50 block of the processed image and potential text blocks are identified using newly defined discriminant functions. Further, the detected text blocks are merged and refined to extract text regions. The proposed method is robust and achieves a detection rate of 96.6% on a variety of 100 low resolution natural scene images each of size 240x320.
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...IJCSEIT Journal
A video fingerprint is a recognizer that is derived from a piece of video content. The video fingerprinting
methods obtain unique features of a video that differentiates one video clip from another. It aims to identify
whether a query video segment is a copy of video from the video database or not based on the signature of
the video. It is difficult to find whether a video is a copied video or a similar video, since the features of the
content are very similar from one video to the other. The main focus of this paper is to detect that the query
video is present in the video database with robustness depending on the content of video and also by fast
search of fingerprints. The Fingerprint Extraction Algorithm and Fast Search Algorithms are adopted in
this paper to achieve robust, fast, efficient and accurate video copy detection. As a first step, the
Fingerprint Extraction algorithm is employed which extracts a fingerprint through the features from the
image content of video. The images are represented as Temporally Informative Representative Images
(TIRI). Then, the second step is to find the presence of copy of a query video in a video database, in which
a close match of its fingerprint in the corresponding fingerprint database is searched using inverted-filebased
method. The proposed system is tested against various attacks like noise, brightness, contrast,
rotation and frame drop. Thus the performance of the proposed system on an average shows high true
positive rate of 98% and low false positive rate of 1.3% for different attacks.
Video Compression Algorithm Based on Frame Difference Approaches ijsc
The huge usage of digital multimedia via communications, wireless communications, Internet, Intranet and cellular mobile leads to incurable growth of data flow through these Media. The researchers go deep in developing efficient techniques in these fields such as compression of data, image and video. Recently, video compression techniques and their applications in many areas (educational, agriculture, medical …) cause this field to be one of the most interested fields. Wavelet transform is an efficient method that can be used to perform an efficient compression technique. This work deals with the developing of an efficient video compression approach based on frames difference approaches that concentrated on the calculation of frame near distance (difference between frames). The
selection of the meaningful frame depends on many factors such as compression performance, frame details, frame size and near distance between frames. Three different approaches are applied for removing the lowest frame difference. In this paper, many videos are tested to insure the efficiency of this technique, in addition a good performance results has been obtained.
A Framework for Curved Videotext Detection and ExtractionIJERA Editor
Proposed approach explores a new framework for curved video text detection and extraction. The algorithm first
utilizes a Gaussian filter based Color Edge Enhancement followed by a Gray level Co-occurrence matrix feature
extraction method for text detection. Secondly, a Connected Component filtering method is utilized to generate
clear localization result and at last, a Round Scan method is performed to extract curved text and generate binary
result for recognition by OCR. Experiments on various curved video data and Hua’s horizontal video text
dataset shows the effectiveness and robustness of the proposed method.
A Framework for Curved Videotext Detection and ExtractionIJERA Editor
Proposed approach explores a new framework for curved video text detection and extraction. The algorithm first utilizes a Gaussian filter based Color Edge Enhancement followed by a Gray level Co-occurrence matrix feature extraction method for text detection. Secondly, a Connected Component filtering method is utilized to generate clear localization result and at last, a Round Scan method is performed to extract curved text and generate binary result for recognition by OCR. Experiments on various curved video data and Hua’s horizontal video text dataset shows the effectiveness and robustness of the proposed method.
Electrically small antennas: The art of miniaturizationEditor IJARCET
We are living in the technological era, were we preferred to have the portable devices rather than unmovable devices. We are isolating our self rom the wires and we are becoming the habitual of wireless world what makes the device portable? I guess physical dimensions (mechanical) of that particular device, but along with this the electrical dimension is of the device is also of great importance. Reducing the physical dimension of the antenna would result in the small antenna but not electrically small antenna. We have different definition for the electrically small antenna but the one which is most appropriate is, where k is the wave number and is equal to and a is the radius of the imaginary sphere circumscribing the maximum dimension of the antenna. As the present day electronic devices progress to diminish in size, technocrats have become increasingly concentrated on electrically small antenna (ESA) designs to reduce the size of the antenna in the overall electronics system. Researchers in many fields, including RF and Microwave, biomedical technology and national intelligence, can benefit from electrically small antennas as long as the performance of the designed ESA meets the system requirement.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, Issue 6, June 2013
www.ijarcet.org
Abstract— Text in videos and images provides significant information about their content; video text extraction therefore plays a major role in semantic video analysis, video indexing, and video retrieval, which are important for video databases. We propose an efficient method for detecting, localizing, and extracting the text appearing in videos with noisy and complex backgrounds. A text region in a video or image has certain features that distinguish it from the rest of the background. We use corner metric and Laplacian filtering techniques to detect the text independently of each other and combine the results for efficient detection and localization. The localized text is then binarized by determining a seed pixel of the text. Experimental results show the efficiency of the proposed system.
Index Terms— corner metric, Laplacian filter, text detection, text localization, text binarization, text extraction, video.
I. INTRODUCTION
Text in videos and images provides significant information about their content; video text extraction therefore plays a major role in semantic video analysis, video indexing, and video retrieval, which are important for video databases [1], [5].

Two types of text can appear in a video or an image: scene text and artificial text. Scene text appears incidentally in the video and does not provide reliable information for video indexing and retrieval. Artificial text is overlaid on the video and provides valuable information for video retrieval and indexing [7], so our focus remains on the extraction of artificial text. Throughout this paper, the word "text" refers to artificial text.

Most video indexing research starts with video text recognition. The process of video text recognition can be divided into four steps: text detection, text localization, text binarization, and text recognition. Text detection roughly differentiates the text and non-text regions of the video; text localization determines the accurate boundaries of text strings; text binarization filters out the background pixels in the text strings, leaving the extracted text pixels for recognition. The first three steps are collectively known as text extraction. The data flow for the text extraction system is shown in figure 1.
Fig. 1 The data flow for the text extraction system: image/video frame → text detection → text localization → text binarization → binarized image for recognition.
II. LITERATURE SURVEY
Video text detection methods can mainly be classified into two categories. 1) Connected component based methods assume that text strings satisfy some unique characteristics such as uniform color, font size, and spatial alignment. These methods usually perform color reduction and segmentation in some color space and then perform connected component analysis to detect the text regions. The main problem with this kind of method is that it is not universal for all kinds of images [2], [3].
Automatic Text Extraction in Video Based on the Combined Corner Metric and Laplacian Filtering Technique
Kaushik K.S.¹, Suresha D.²
¹ Research Scholar, Canara Engineering College, Mangalore
² Assistant Professor, Canara Engineering College, Mangalore

2) Edge- or texture-based approaches hold the assumption that text regions have specific patterns and contain more edge features than the smooth background. The main problem with this kind of
method is to reduce the noise coming from the background [4], [5]. Texture-based methods can be further categorized into top-down and bottom-up approaches. The top-down approach splits image regions in the horizontal and vertical directions based on texture, color, or edges [4]. The bottom-up approach finds homogeneous regions from some seed regions, and a region-growing technique is then applied to merge the pixels belonging to the same clusters [6].
Some related work on text detection and localization follows. [7] proposes a solution for detecting text appearing in images and videos by making use of the corner response, obtained with the corner detection algorithm proposed by [25]; the method does not detect corners accurately and also produces a lot of noise in high-clarity images, which adds the additional burden of removing the noise without removing the actual text present in the video or images. The method proposed by [8] computes an energy
measure of a set of DCT coefficients of intra-coded blocks as a texture measure. This method obtains the seed pixels over multiple iterations, using a different threshold value for each iteration; suitable seed pixels are found by decreasing the threshold value in each successive iteration, and the text is detected and localized in the video based on these pixels. [9] proposes a novel
approach for efficient automated text detection in video data. First, potential text candidates are identified using an edge-based multi-scale text detector; second, an image-entropy-based refinement algorithm and a Stroke Width Transform (SWT) based verification procedure refine the candidate text lines. Both artificial text and recorded scene text can be localized reliably. [10]
proposes an effective coarse-to-fine algorithm to detect text in video. First, in the coarse-detection stage, candidate stroke pixels are detected using a stroke filter and connected into regions with a fast region-growing algorithm; these regions are further separated into candidate text lines by a projection operation. Second, in the fine-detection stage, a support vector machine (SVM) model and stroke features are employed to select the correct text regions from the candidates, and text regions at multiple resolutions are integrated. Finally, the results are refined using temporal correlation information. The proposed solution by [11]
explores new edge features such as straightness to eliminate non-significant edges from the segmented text portion of a video frame and detect the accurate boundary of the text lines in video images. The method selects candidate text blocks from a given image in order to segment the complete text portions; based on a combination of filters and edge analysis, heuristic rules are formed for identifying a candidate text block in the image. This method effectively detects and localizes the text, but it involves many steps, which in turn reduces its performance in terms of detection and localization time. The
solution proposed by [12] is based on invariant features such as edge strength, edge density, and horizontal distribution. First, it applies edge detection and then filters out verified non-text edges using a low threshold value. Then, to both keep low-contrast text and simplify the complex background of high-contrast text, a local threshold value is selected. Next, areas of high edge strength or high edge density are enhanced using two text-area enhancement operators. Finally, coarse-to-fine detection locates text regions efficiently. [13]
proposes a hybrid system for text detection in video frames. The system consists of two main stages: in the first stage, the edge map of the image is used to detect text regions; a subsequent refinement stage uses an SVM classifier trained on features obtained by a new Local Binary Pattern based operator, which diminishes false alarms. [14] proposes an edge-based detection method built on an edge map produced by the Sobel operator followed by smoothing filters, morphological operations, and geometrical constraints. [15] proposes a system for object detection based on Local Binary Patterns (LBP) and cascade histogram matching. The LBP operator consists of a 3x3 kernel in which the center pixel is used as a threshold; the eight binarized neighbors are then multiplied by binomial weights, producing an integer that represents a unique texture pattern. This method is used for video text and car detection.
[16] proposes an improved algorithm for the automatic extraction of artificially overlaid text in sports video. The first step uses a color histogram technique to minimize the number of video frames and identify key frames. The key frames are then converted to grayscale images for efficient text detection. Since superimposed text is generally displayed in the bottom part of the image in sports video, the text regions containing text information are cropped from the gray image, and the Canny edge detection algorithm is then applied for effective text edge detection. The proposed
solution [17] explores the idea of classifying low-contrast and high-contrast video images in order to detect the accurate boundary of text lines, where high contrast refers to sharpness and low contrast refers to dim intensity values. The text regions are classified using heuristic rules based on a combination of filters and edge analysis; the rules derive from the observation that the number of Canny edge components is less than the number of Sobel edge components in low-contrast video images, and vice versa in high-contrast images. The solution proposed by [18] handles the complexity of video text detection by combining Fourier and statistical features in the color space of the image; since it uses a large number of features in the frequency and color domains, it is computationally expensive, and it is limited to horizontal text detection. [19]
uses the homogeneity of intensity of text regions in images: pixels with similar gray levels are merged into groups, and the candidate regions are then verified using size, area, fill factor, and contrast. [20] proposes a method for text detection that combines the edge and gradient features of the image; this method produces efficient text detection with few false positives, but the technique works only on images where the text is overlaid in the horizontal direction. [21] uses a combination of texture features and edge features in the wavelet transform domain with a classifier to detect text in video. [22] locates text in images and video frames using a neural network classifier and image gradient features. Some methods [23], [24] assume that the text strokes have a certain contrast against the background; therefore, areas with dense edges are detected as text regions.
Text binarization methods can be categorized into two main groups. The first group comprises thresholding-based algorithms, including global thresholding methods [26], which use a single threshold for the entire document, image, or video, generally calculated by a global thresholding algorithm, and local thresholding methods [27], [28], [29], which assign a threshold to each small local region of the processed data. The second group comprises region-based, clustering-based [30], and edge-based [31] methods.
III. PROPOSED SOLUTION
The architecture of the proposed solution is shown in figure 2.

Fig. 2 The architecture of the proposed text extraction system.
For efficient video indexing and content-based video retrieval, it is necessary to detect and extract every text appearing in a video without missing any block of text. We consider the worst case in which the text appearing in the video changes (new text appears) every second, but not within a second. Video frames are captured every second and stored in a frame queue; the number of frames captured and stored in the queue therefore depends on the length of the video. For example, if the video is 5 seconds long, 6 frames are stored in the queue: one captured at each of the 5 seconds plus the frame captured at the 0th (zero) second. As shown in figure 2, each of these frames is then processed for text detection, text localization, and text binarization independently of the others.
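The per-second sampling described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the frame rate and the helper name `frames_to_queue` are our assumptions.

```python
# Sketch of per-second frame sampling: given a video's frame rate and length,
# select the frame index captured at each whole second, including second 0.

def frames_to_queue(fps: float, duration_seconds: int) -> list[int]:
    """Return the frame indices sampled once per second (t = 0..duration)."""
    return [int(round(t * fps)) for t in range(duration_seconds + 1)]

# A 5-second video at 30 fps yields 6 queued frames, as in the text.
queue = frames_to_queue(30, 5)
print(queue)  # [0, 30, 60, 90, 120, 150]
```

Each queued frame would then be passed independently through detection, localization, and binarization.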
A. Text Detection
The text detection part comprises three steps: corner metric, Laplacian filtering, and multiplication. The corner metric and Laplacian filtering techniques exploit different properties of artificial text for its detection, and their independent results are combined into a single result by multiplication, which performs very efficient noise reduction.
1) Corner Metric: A corner is a special two-dimensional feature point with high curvature on a region boundary. It can be detected by finding local maxima in the corner metric, a feature that describes the likelihood of corner points in different parts of the image.

The text appearing in videos and images has greater edge strength and edge density than the image background, and hence a greater likelihood of containing corner points. This property allows text to be detected by computing the corner metric on the video frames or images.

There are many ways to obtain the corner metric of an image; we use the corner detection method proposed by Shi and Tomasi [33], [35] (also referred to as Kanade-Tomasi corner detection), since it is found to be the most accurate in detecting corners. Here we briefly explain the calculation of the corner metric; for more technical details, see [33]. For a given image I(x, y), the basic form of the corner metric is shown in the equation below.

S(x, y) = Σ_{u,v} w(u, v) [ I(u + x, v + y) − I(u, v) ]²

Here w(u, v) is a window function. This can be written in matrix form as:

S(x, y) ≈ (x, y) A (x, y)^T

Here A is given by:

A = [ ⟨Ix²⟩  ⟨Ix Iy⟩ ; ⟨Ix Iy⟩  ⟨Iy²⟩ ]

Here Ix and Iy represent the partial derivatives of I, and the angle brackets denote averaging over the window w(u, v). A corner (or, in general, an interest point) is characterized by a large variation of S in all directions of the vector (x, y). Analysing the eigenvalues of A, this characterization can be expressed in the following way: A should have two large eigenvalues for an interest
point. Let λ1 and λ2 be the two eigenvalues of A. The corner detection method marks a corner in the image if the following condition is satisfied:

min(λ1, λ2) > λ

That is, a corner is detected if both λ1 and λ2 are greater than a predefined threshold value λ. A sample input image is shown in figure 3, and the corresponding corner metric of the input image is shown in figure 4.
Fig. 3 Sample input image.
Fig. 4 Corner metric of the sample input image.
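The min-eigenvalue test can be sketched in NumPy as below. This is an illustrative reconstruction of the Shi-Tomasi metric, not the authors' implementation; the uniform 3x3 averaging window and the threshold value are our assumptions.

```python
# Sketch of the min-eigenvalue (Shi-Tomasi) corner metric: build the 2x2
# structure tensor A from image gradients, then flag pixels where the
# smaller eigenvalue exceeds a threshold lambda.
import numpy as np

def box3(a: np.ndarray) -> np.ndarray:
    """Average over a 3x3 window (the w(u, v) weighting, taken as uniform)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def corner_metric(img: np.ndarray, thresh: float) -> np.ndarray:
    """Binary map: 1 where min(lambda1, lambda2) of the matrix A exceeds thresh."""
    f = img.astype(float)
    Iy, Ix = np.gradient(f)  # partial derivatives of I
    Ixx, Iyy, Ixy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    # Smallest eigenvalue of the 2x2 matrix [[Ixx, Ixy], [Ixy, Iyy]].
    lam_min = (Ixx + Iyy) / 2.0 - np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    return (lam_min > thresh).astype(np.uint8)
```

A flat region yields lam_min near zero everywhere, while corners of high-contrast strokes produce large values, which is why text regions light up in the metric.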
The resultant corner metric of the input image is a binary image; the white pixels corresponding to the text region of the input image are the detected text pixels, while the remaining white pixels in the background are considered noise pixels, which must be removed for efficient text localization.
2) Laplacian Filtering: The Laplacian filter [34] is an isotropic filter, i.e. its response is independent of the direction of discontinuities in the image; such filters are rotation invariant.

Another important feature of the text appearing in images and videos is the large intensity discontinuity between the text and the image background. Based on this property, the Laplacian filtering technique is used to detect the text, since it highlights intensity discontinuities in an image and deemphasizes regions with slowly varying intensity levels. The Laplacian filter uses a second-order derivative implementation for image sharpening; the mathematical equations of the Laplacian filter are given below:

∇²f = ∂²f/∂x² + ∂²f/∂y²

∇²f(x, y) = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4 f(x, y)
The above equation can be implemented by using the Laplacian filter mask shown in figure 5.

 0  1  0
 1 -4  1
 0  1  0

Fig. 5 Laplacian filter mask
The noise produced by the Laplacian method can be reduced by binarizing the filtered image using a global thresholding method. Figure 6 shows the result of Laplacian filtering on the sample input image of figure 3.

Fig. 6 Filtered image of the sample input image.

Figure 6 shows the detected text pixels as well as the noise pixels, which are removed in the next step.
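The mask-and-threshold step can be sketched as below. This is an illustrative reconstruction: the fixed global threshold value is our assumption (the paper only says a global thresholding method is used).

```python
# Sketch of Laplacian filtering with the 3x3 mask of Fig. 5, followed by a
# global threshold to produce a binary text map.
import numpy as np

def laplacian_text_map(img: np.ndarray, thresh: float) -> np.ndarray:
    """Apply the mask [0 1 0; 1 -4 1; 0 1 0], then binarize |response| > thresh."""
    f = img.astype(float)
    p = np.pad(f, 1, mode="edge")
    # Up, down, left, right neighbors minus 4x the center pixel.
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * f
    return (np.abs(lap) > thresh).astype(np.uint8)
```

Regions of slowly varying intensity give near-zero responses and are suppressed, while sharp text edges survive the threshold.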
3) Multiplication: Both the corner metric method and the Laplacian filtering method detect the artificial text but also produce noise in the background. For text localization, it is necessary to reduce the noise pixels efficiently.

Since the corner metric and Laplacian filtering make use of different properties of the text for detection, the noise produced by the two techniques will also differ. Based on this fact, we combine (intersect) the results of the corner metric and Laplacian filtering into a single binary image that contains only the common text pixels. Combining the resultant binary images of the two techniques is done by multiplication. Figure 7 shows the multiplication result of the corner metric and Laplacian filtering results.
Fig. 7 Multiplication result of corner metric and Laplacian filtering.
Apart from noise reduction, the multiplication process also determines whether the given image or video frame has artificial text embedded in it: the multiplication produces a blank image with zero white pixels if the data contains no text pixels. The shape of the detected text region is still irregular and needs to be refined into an aligned rectangle; this is carried out in the text localization step.
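The intersection step above is a single elementwise product of the two binary maps; a minimal sketch (helper names are ours):

```python
# Sketch of combining the two detectors' binary maps: a pixel survives only
# if BOTH the corner metric and the Laplacian map flagged it as text.
import numpy as np

def combine_maps(corner_map: np.ndarray, laplacian_map: np.ndarray) -> np.ndarray:
    """Elementwise product: 1 only where both maps are 1 (logical AND)."""
    return corner_map * laplacian_map

def contains_text(combined: np.ndarray) -> bool:
    """A blank product image (no white pixels) means no artificial text."""
    return bool(combined.any())
```

Because the two detectors' noise patterns differ, their spurious white pixels rarely coincide, so the product keeps the text pixels while suppressing most noise.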
B. Text Localization
The binary image that is obtained from the text detection
process is fed as an input to the text localization process, but
the localization of the text is not done on binary image
instead the done for the original input data by making use of
the information present in the binary image obtained from the
detection process.
In the text localization process, the binary image is scanned
for white (1) pixels along both the X and Y axes. The location
of the first white pixel found along the X axis is taken as Xmin
and the location of the last white pixel along the X axis as
Xmax; the same procedure is carried out along the Y axis,
giving Ymin and Ymax for the first and last white pixels
respectively. The pixels Xmin, Xmax, Ymin and Ymax are called
the border pixels.
The border pixels are mapped onto the original input
image/video frame and extended two pixel positions away
from the text region. This forms a rectangular text region in
the original image, which is then cropped/segmented from the
rest of the background. The result of the text localization
process is shown in figure 8.
Fig. 8 Localized text.
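The border-pixel scan and two-pixel extension can be sketched as below; a minimal NumPy version (the `margin` parameter and the toy mask are illustrative, not from the paper):

```python
import numpy as np

def localize_text(binary: np.ndarray, margin: int = 2):
    """Scan the detection mask for white (1) pixels along both axes
    and return the bounding rectangle (Xmin, Xmax, Ymin, Ymax),
    extended by `margin` pixels as the paper extends it by two."""
    ys, xs = np.nonzero(binary)          # rows, cols of white pixels
    if xs.size == 0:
        return None                      # blank mask: no text detected
    h, w = binary.shape
    x_min = max(int(xs.min()) - margin, 0)
    x_max = min(int(xs.max()) + margin, w - 1)
    y_min = max(int(ys.min()) - margin, 0)
    y_max = min(int(ys.max()) + margin, h - 1)
    return x_min, x_max, y_min, y_max

mask = np.zeros((10, 10), dtype=np.uint8)
mask[4:6, 3:7] = 1                       # a small blob of "text" pixels
box = localize_text(mask)
# The box is then mapped onto the ORIGINAL frame, not the mask:
# region = frame[box[2]:box[3] + 1, box[0]:box[1] + 1]
```

The crop is taken from the original frame so the binarization step that follows operates on real intensity values rather than on the detection mask.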
C. Text Binarization
The final process of the text extraction mechanism is text
binarization; the localized text obtained from the previous
process is fed to text binarization process, the result of the
binarization process should be in such a way that the text
should be black (0) with white background (1), this condition
is necessary for standard recognition algorithm to recognize
the text from the binary image [32].
Since the text pixels should be black against a white
background, a binarization technique such as global
thresholding works only when the colour of the text
appearing in the video is known in advance; it does not work
globally for all colours ranging from black to white. Hence
the text colour must be determined for efficient text
binarization.
The process of determining the text colour is called seed
selection. We assume that the colour of the text is unique
within a given image or video frame, so determining a single
seed pixel provides sufficient information for binarization.
We consider that the Xmin obtained in the localization step
gives the location of the beginning of the text. To make sure
the seed location falls inside the body of the text, we add one
position to Xmin, giving the seed pixel location Xseed, i.e.
Xseed = Xmin + 1. We then map Xseed onto the original input
image and read the intensity (grey-scale) value at that
location; this pixel value is taken as the seed pixel SP.
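A small sketch of this seed selection follows. Note that the paper only fixes the column (Xseed = Xmin + 1); the `row` argument sampled here is our own assumption of a row crossing the text body:

```python
import numpy as np

def select_seed(frame_gray: np.ndarray, x_min: int, row: int) -> int:
    """Read the seed pixel SP at Xseed = Xmin + 1, one position inside
    the left border of the text, from the ORIGINAL grey-scale frame.
    `row` is assumed to cross the text body (a hypothetical choice;
    the paper does not specify which row is sampled)."""
    x_seed = x_min + 1
    return int(frame_gray[row, x_seed])

frame = np.full((5, 5), 220, dtype=np.uint8)   # bright background
frame[2, 1:4] = 40                             # one dark "text" stroke
sp = select_seed(frame, x_min=0, row=2)        # Xseed = 1
```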
We take SP to be the pixel value (colour) of the text.
However, due to lossy compression of the image/video, the
pixel value may not remain constant over the entire text;
slight deviations in intensity occur throughout. Hence we
cannot rely on a single seed pixel, and must instead consider
a range of seed values to obtain proper and effective
binarization.
Let the seed pixel values range from SPmin to SPmax, the
minimum and maximum intensity seed values respectively.
Along with SPmin and SPmax, all pixel values falling between
them are considered candidate seed values. SPmin and SPmax
are obtained by subtracting and adding a tolerance T to SP:
SPmin = SP – T
SPmax = SP + T
Based on the minimum and maximum seed values we
perform binarization on the localized text obtained from the
previous step, using a double-thresholding technique with
SPmin and SPmax as the two threshold values. Pixels with
intensity values less than SPmin or greater than SPmax are set
to one (1), while pixel values falling between SPmin and
SPmax are set to zero (0). This yields a binary image with
black text pixels against a white background, irrespective of
the original colour of the text in the input data. Figure 9
shows the result of text binarization performed on the
localized text shown in figure 8.
Fig. 9 Binarized text.
In our experiments we set the value of T to 5.
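The double-thresholding rule can be sketched as follows, a minimal NumPy version using the paper's T = 5 (the toy intensity array is illustrative):

```python
import numpy as np

def binarize_text(region_gray: np.ndarray, sp: int, t: int = 5) -> np.ndarray:
    """Double-threshold binarization: grey values inside [SP - T, SP + T]
    are treated as text and set to 0 (black); everything outside the
    band becomes 1 (white background), as OCR engines expect."""
    sp_min, sp_max = sp - t, sp + t
    is_text = (region_gray >= sp_min) & (region_gray <= sp_max)
    return np.where(is_text, 0, 1).astype(np.uint8)

# Text at intensity ~40 on a bright background, with SP = 40 and T = 5:
region = np.array([[200, 38, 42, 200],
                   [200, 44, 36, 200]], dtype=np.uint8)
binary = binarize_text(region, sp=40)
```

Because the band [SP – T, SP + T] follows the measured seed value, the same rule produces black-on-white output whatever the original text colour was.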