Text present in images and video carries useful information for automatic annotation, indexing, and structuring of images. However, variations in text style, font, size, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction an extremely difficult and challenging problem. A large number of techniques have been proposed to address it, and the purpose of this paper is to design algorithms for each phase of extracting text from a video using Java libraries and classes. First, we split the input video, which may be real-time or drawn from a database, into a stream of frames using the Java Media Framework (JMF). We then apply preprocessing algorithms to convert each frame to grayscale and to remove disturbances such as lines superimposed over the text, discontinuities, and dots. Finally, we apply algorithms for localization, segmentation, and recognition, for which we use a neural-network pattern-matching technique. The performance of our approach is demonstrated by experimental results for a set of static images.
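The grayscale-conversion step this abstract describes is the standard luma weighting applied to each RGB pixel. The paper itself works in Java with JMF; the sketch below is only an illustrative Python port of that one preprocessing step, using the common ITU-R BT.601 weights.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to a grey level using ITU-R BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def frame_to_gray(frame):
    """Apply the conversion to every pixel of a frame (a 2-D list of RGB tuples)."""
    return [[to_gray(p) for p in row] for row in frame]

frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
print(frame_to_gray(frame))  # [[76, 150], [29, 255]]
```

Pure red, green, and blue map to 76, 150, and 29 respectively, reflecting the eye's greater sensitivity to green; white maps to 255 because the three weights sum to 1.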
Recognition and tracking moving objects using moving camera in complex scenes (IJCSEA Journal)
In this paper, we propose a method for effectively tracking moving objects in videos captured with a moving camera in complex scenes. The video sequences may contain highly dynamic backgrounds and illumination changes. The proposed method involves four main steps. First, the video is stabilized using an affine transformation. Second, frames are intelligently selected so that only those with a considerable change in content are extracted; this step reduces complexity and computational time. Third, the moving object is tracked using a Kalman filter and a Gaussian mixture model. Finally, object recognition using a bag-of-features model is performed to recognize the moving objects.
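The Kalman-filter step in the tracking pipeline above can be illustrated with a minimal scalar filter. This is a generic constant-position sketch, not the paper's model; the process-noise and measurement-noise values q and r are made-up tuning constants.

```python
def kalman_1d(measurements, q=0.01, r=1.0):
    """Minimal scalar Kalman filter: constant-position model with
    process noise q and measurement noise r (illustrative values)."""
    x, p = measurements[0], 1.0        # initial state estimate and its variance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                      # predict: uncertainty grows by process noise
        k = p / (p + r)                # Kalman gain
        x = x + k * (z - x)            # update the estimate with the innovation
        p = (1 - k) * p                # shrink uncertainty after the update
        estimates.append(x)
    return estimates

est = kalman_1d([10.0, 10.4, 9.8, 10.2, 10.1])
print(est)  # the filtered track hugs 10 while damping the measurement noise
```

In a real tracker the state would be a vector (position and velocity per axis) and the Gaussian mixture model would supply the foreground detections that feed the filter.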
LOCALIZATION OF OVERLAID TEXT BASED ON NOISE INCONSISTENCIES (aciijournal)
In this paper, we present a novel technique for localizing caption text in video frames based on noise inconsistencies. Caption text is added artificially after the video has been captured and so does not form part of the original video content. The noise level is typically uniform across a captured video frame, so embedding or overlaying text introduces a region with a different noise level. Detecting regions with inconsistent noise levels may therefore signal the presence of overlaid text, and we exploit this property to localize overlaid text in video frames. Experimental results, evaluated in terms of recall, precision, and F-measure, show a clear improvement in overlaid-text localization.
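The recall, precision, and F-measure used in this evaluation reduce to simple counting once detections are matched to ground truth. The sketch below matches regions by identifier for clarity; a real evaluation would match detected boxes to ground-truth boxes by spatial overlap.

```python
def prf(true_regions, detected_regions):
    """Precision, recall and F-measure over sets of text-region identifiers.
    Set intersection stands in for the overlap-based matching a real
    localization benchmark would use."""
    tp = len(true_regions & detected_regions)              # correctly found regions
    precision = tp / len(detected_regions) if detected_regions else 0.0
    recall = tp / len(true_regions) if true_regions else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

# hypothetical frame: 4 true text regions, 3 detections, 2 of them correct
p, r, f = prf({"t1", "t2", "t3", "t4"}, {"t1", "t2", "t5"})
print(p, r, f)  # precision 2/3, recall 1/2, F-measure 4/7
```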
Digital video is a sequence of images, called frames, displayed at a certain frame rate (so many frames per second, or fps) to create the illusion of animation.
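The frame-rate definition above fixes a simple arithmetic relationship between time and frame index, which is what frame-extraction tools rely on. A small sketch, assuming a 25 fps rate:

```python
def frame_index(t_seconds, fps=25):
    """Index of the frame being shown at time t, for a fixed frame rate."""
    return int(t_seconds * fps)

def duration(n_frames, fps=25):
    """Playback time of n frames, in seconds."""
    return n_frames / fps

print(frame_index(2.0))  # frame 50 is on screen 2 s into a 25 fps video
print(duration(750))     # 750 frames last 30.0 s at 25 fps
```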
Dynamic Threshold in Clip Analysis and Retrieval (CSCJournals)
Key frame extraction can be helpful in video summarization, analysis, indexing, browsing, and retrieval, and clip analysis of key frame sequences is an open research issue. This paper deals with the identification and extraction of key frames using a dynamic threshold, followed by video retrieval. The number of key frames extracted for each shot depends on the activity within the shot. The system uses statistics from comparisons of successive frames, made on the basis of color histograms and a dynamic threshold. Two program interfaces are linked: one for clip analysis and one for entropy-based video indexing and retrieval. The proposed system is tested on several video sequences, and the extracted key frames and retrieval results are shown.
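The histogram-plus-dynamic-threshold idea can be sketched concisely: compute successive-frame histogram differences, set the threshold from their mean and standard deviation, and keep frames whose difference exceeds it. The bin count and the k multiplier below are illustrative choices, not the paper's parameters.

```python
def histogram(frame, bins=4, levels=256):
    """Grey-level histogram of a frame given as a flat list of pixel values."""
    h = [0] * bins
    for v in frame:
        h[v * bins // levels] += 1
    return h

def hist_diff(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def key_frames(frames, k=1.0):
    """Keep frames whose difference from the previous frame exceeds a
    dynamic threshold: mean + k * std of all successive differences."""
    diffs = [hist_diff(histogram(a), histogram(b))
             for a, b in zip(frames, frames[1:])]
    mean = sum(diffs) / len(diffs)
    std = (sum((d - mean) ** 2 for d in diffs) / len(diffs)) ** 0.5
    thresh = mean + k * std
    return [i + 1 for i, d in enumerate(diffs) if d > thresh]

# four dark frames, then a jump to bright ones: the cut is at frame 3
frames = [[10] * 8, [12] * 8, [11] * 8, [200] * 8, [201] * 8]
print(key_frames(frames))  # [3]
```

Because the threshold is derived from the clip's own difference statistics, a high-activity shot yields more key frames than a static one, which is the behaviour the abstract describes.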
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and... (IJECEIAES)
Segmenting a video sequence by detecting shot changes is essential for video analysis, indexing, and retrieval. In this context, this paper proposes a shot boundary detection algorithm based on the scale-invariant feature transform (SIFT). The first step of our method is a top-down search scheme that detects the locations of transitions by comparing the ratio of features matched via SIFT for every RGB channel of the video frames; this overview step provides the boundary locations. Second, a moving average is calculated to determine the type of transition. The proposed method can detect both gradual transitions and abrupt changes without requiring any prior training on the video content. Experiments conducted on a multi-type video database show that the algorithm performs well.
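Once SIFT matching has produced a per-frame match ratio, boundary detection is a thresholding pass and transition typing is a smoothing pass. The sketch below assumes the ratios are already computed (the values are invented); extracting real SIFT features would need an image library.

```python
def boundaries(match_ratios, cut_thresh=0.2):
    """Frames whose match ratio with the next frame drops below the
    threshold are shot-boundary candidates."""
    return [i for i, r in enumerate(match_ratios) if r < cut_thresh]

def moving_average(xs, w=3):
    """Moving average used to classify a detected transition: an abrupt cut
    dips sharply for a single frame, a gradual transition stays low
    across the whole window."""
    return [sum(xs[i:i + w]) / w for i in range(len(xs) - w + 1)]

ratios = [0.9, 0.85, 0.1, 0.88, 0.9]  # hypothetical SIFT match ratios
print(boundaries(ratios))             # [2]: a single sharp dip, i.e. a hard cut
print(moving_average(ratios))
```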
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION (cscpconf)
Recent developments in digital video and the drastic increase in internet use have raised the number of people searching for and watching videos online. To make searching easier, a summary may be provided along with each video. The summary should be effective enough that a user can grasp the content of the video without watching it fully, and it should consist of the key frames that best express the content and context of the video. This work proposes a method to extract key frames that carry most of the information in the video. This is achieved by quantifying the visual attention each frame commands, using a descriptor called the attention quantifier. The quantification is based on the human attention mechanism, in which color conspicuousness and motion attract attention. Each frame is therefore assigned an attention parameter based on its color conspicuousness and motion, and key frames are extracted and summarized adaptively according to this value. The framework thus produces a meaningful video summary.
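The attention quantifier described above combines a colour-conspicuousness score and a motion score per frame. A minimal sketch, assuming both scores are pre-computed in [0, 1] and combined with equal weights (the actual descriptor and weighting are not specified in the abstract):

```python
def attention(color_consp, motion, w_color=0.5, w_motion=0.5):
    """Attention parameter per frame: weighted sum of a colour-conspicuousness
    score and a motion score (both assumed pre-computed, in [0, 1])."""
    return [w_color * c + w_motion * m for c, m in zip(color_consp, motion)]

def pick_key_frames(scores, n=2):
    """Indices of the n highest-attention frames, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n])

scores = attention([0.2, 0.9, 0.4, 0.8], [0.1, 0.7, 0.3, 0.9])
print(pick_key_frames(scores))  # [1, 3]: the colourful, high-motion frames
```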
AN EMERGING TREND OF FEATURE EXTRACTION METHOD IN VIDEO PROCESSING (cscpconf)
Recent progress in technology and flourishing applications open up new prospects and challenges for the image and video processing community. Compared to still images, video sequences provide more information about how objects and scenes change over time. Video quality is very important before any kind of processing is applied. This paper deals with two major problems in video processing: noise reduction and object segmentation in video frames. Foreground-based segmentation and fuzzy c-means clustering segmentation are compared with the proposed method, improvised fuzzy c-means segmentation based on color, which is applied to video frames to segment the various objects in the current frame. The proposed technique is a powerful method for image segmentation and works for both single- and multiple-feature data with spatial information. Experiments were conducted with various noise types and filtering methods to determine which is best suited, and the proposed segmentation approach generates good-quality segmented frames.
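Standard fuzzy c-means, the baseline this paper improves on, alternates two updates: soft memberships from distances to the centres, then centres from membership-weighted means. A scalar sketch (the paper clusters colour values; the data and initialisation here are toy choices):

```python
def fcm_1d(xs, c=2, m=2.0, iters=20):
    """Fuzzy c-means on scalar data with fuzzifier m: alternate membership
    and centre updates until the centres settle."""
    centers = [min(xs), max(xs)]  # crude initialisation for c = 2
    u = []
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - ck) + 1e-9 for ck in centers]   # avoid divide-by-zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for i in range(c)])
        centers = [sum(u[k][i] ** m * xs[k] for k in range(len(xs))) /
                   sum(u[k][i] ** m for k in range(len(xs)))
                   for i in range(c)]
    return centers, u

# two well-separated grey-value clusters
centers, memberships = fcm_1d([1.0, 1.2, 0.9, 8.0, 8.3, 7.9])
print(sorted(round(ck, 2) for ck in centers))
```

With well-separated data the memberships saturate near 0 and 1 and the centres converge close to the crisp cluster means, about 1.03 and 8.07 here.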
A binarization technique for extraction of devanagari text from camera based ... (sipij)
This paper presents a binarization method for camera-based natural scene (NS) images based on edge analysis and morphological dilation. The image is converted to grayscale, and edge detection is carried out using the Canny edge detector. The edge image is dilated using morphological dilation and analyzed to remove edges corresponding to non-text regions. The image is then binarized using the mean and standard deviation of the edge pixels, and the resulting images are post-processed to fill gaps and smooth text strokes. The algorithm is tested on a variety of NS images captured with a digital camera under varying resolutions and lighting conditions, with text of different fonts, styles, and backgrounds, and the results are compared with other standard techniques. The method is fast and works well for camera-based natural scene images.
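The binarization step, thresholding from the mean and standard deviation of the edge pixels, is easy to sketch. The exact formula and the constant k are assumptions here; the abstract only says the statistics of the edge pixels drive the threshold.

```python
def binarize(gray, edge_pixels, k=0.5):
    """Threshold a grey image using statistics of the pixels lying on detected
    edges: T = mean - k * std. The form of T and the value of k are
    illustrative, not taken from the paper."""
    mean = sum(edge_pixels) / len(edge_pixels)
    std = (sum((v - mean) ** 2 for v in edge_pixels) / len(edge_pixels)) ** 0.5
    t = mean - k * std
    return [[1 if v >= t else 0 for v in row] for row in gray]

gray = [[200, 40], [180, 30]]       # toy 2x2 grey image
edges = [200, 180, 40, 30]          # grey values sampled along Canny edges
print(binarize(gray, edges))        # bright strokes become 1, background 0
```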
Video Compression Algorithm Based on Frame Difference Approaches (ijsc)
The huge volume of digital multimedia carried over communications networks, wireless links, the Internet, intranets, and cellular mobile systems leads to relentless growth in the data flowing through these media, and researchers continue to develop efficient techniques for compressing data, images, and video. Video compression techniques and their applications in many areas (education, agriculture, medicine, and others) have made this one of the most active fields, and the wavelet transform is one efficient basis for a compression technique. This work develops an efficient video compression approach based on frame differences, centered on the calculation of the frame near distance (the difference between frames). The selection of the meaningful frames depends on many factors, such as compression performance, frame detail, frame size, and the near distance between frames. Three different approaches are applied to remove the frames with the lowest difference. Many videos are tested to verify the efficiency of this technique, and good performance results are obtained.
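The core operation, computing a near distance between frames and discarding the frame that adds the least information, can be sketched as follows. Mean absolute pixel difference is one plausible distance; the paper does not fix the metric, so treat it as an assumption.

```python
def frame_distance(f1, f2):
    """Near distance between two frames (flat pixel lists):
    mean absolute pixel difference, one plausible choice of metric."""
    return sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)

def drop_most_redundant(frames):
    """Remove the frame whose distance to its predecessor is smallest,
    i.e. the frame carrying the least new information."""
    dists = [frame_distance(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    drop = dists.index(min(dists)) + 1
    return [f for i, f in enumerate(frames) if i != drop]

frames = [[0, 0, 0], [0, 0, 1], [9, 9, 9]]
print(drop_most_redundant(frames))  # drops frame 1, the near-duplicate
```

Repeating this until a size budget is met gives a simple frame-difference compressor; the three approaches in the paper differ in how the lowest-difference frames are chosen for removal.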
A Survey On Thresholding Operators of Text Extraction In Videos (CSCJournals)
Video indexing is an important problem for the visual-information and image-processing communities. The detection and extraction of scene and caption text from unconstrained, general-purpose video is an important research problem in the context of content-based retrieval and summarization, and finding textual content in images is a challenging and promising research area in information technology. Consequently, text detection and recognition in multimedia has become one of the most important fields in computer vision due to its value in a variety of recent technical applications. The work in this paper uses morphological operations to extract text appearing in video frames, with preprocessing to separate text from background in regions where the two are highly similar. Experimental results show that the resulting image contains only text, and the evaluation criteria are applied to the images obtained with the different operators.
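Morphological operations like the ones this survey covers act on binary images with a small structuring element. A minimal sketch of dilation with a 3x3 square element, the operation typically used to merge nearby character strokes into text blobs:

```python
def dilate(img):
    """Binary dilation with a 3x3 square structuring element: a pixel becomes 1
    if any pixel in its 3x3 neighbourhood is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))))
    return out

img = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
print(dilate(img))  # the single pixel grows into a 3x3 block of ones
```

Erosion is the dual (a pixel survives only if its whole neighbourhood is 1), and composing the two gives the opening and closing operators used to suppress background clutter while keeping text regions connected.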
VIDEO SEGMENTATION & SUMMARIZATION USING MODIFIED GENETIC ALGORITHM (ijcsa)
Video summarization of segmented video is an essential process for video thumbnails, video surveillance, and video downloading. Summarization extracts a few frames from each scene and creates a summary video that conveys the full video's course of action within a short duration. The proposed work discusses the segmentation and summarization of frames. A genetic algorithm (GA) for segmentation and summarization selects the few important frames needed to view the highlights of an event. The GA is modified to select only key frames for summarization, and the modified GA is compared with the standard GA.
Analysis of Impact of Channel Error Rate on Average PSNR in Multimedia Traffic (IOSR Journals)
Abstract: The performance of multimedia traffic in ad-hoc networks is strongly affected by the signal-to-noise ratio. The average PSNR (peak signal-to-noise ratio) is an important parameter for evaluating multimedia traffic in ad-hoc networks. As channel bandwidth increases, it becomes necessary to account for other network parameters such as PSNR and ASNR (average signal-to-noise ratio): greater bandwidth combined with higher channel error rates demands a careful analysis of the signal-to-noise ratio for optimum performance. In this paper, we evaluate the effect of the channel error rate on average PSNR for MPEG-4 traffic in ad-hoc networks. Keywords: MANETs, Evalvid, MPEG-4, Fragmentation, PSNR
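PSNR itself has a standard definition: the ratio of the peak pixel value squared to the mean squared error between the sent and received frames, expressed in decibels. A minimal sketch (the toy pixel values are invented):

```python
import math

def psnr(original, received, max_val=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames,
    given as flat lists of pixel values: 10 * log10(max^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, received)) / len(original)
    if mse == 0:
        return float("inf")        # identical frames: noiseless channel
    return 10 * math.log10(max_val ** 2 / mse)

clean = [52, 55, 61, 59]
noisy = [54, 55, 60, 58]
print(round(psnr(clean, noisy), 2))  # about 46.37 dB
```

Averaging this value over all decoded frames of an MPEG-4 stream gives the average PSNR the paper studies as a function of the channel error rate.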
Hierarchical Digital Twin of a Naval Power System (Kerry Sado)
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Like other state-of-the-art digital twins, this technology creates a digital replica of the physical system, executed in real time or faster, that can modify hardware controls. Its advantage, however, stems from distributing the computational effort through a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a single system digital twin, which produces the system-level response. By extracting information from each level of the hierarchy, the hardware's power system controls were reconfigured autonomously. This hierarchical development offers several advantages over other digital twins, particularly in the field of naval power systems: the hierarchical structure allows greater computational efficiency and scalability, while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and the models used were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Forklift Classes Overview (Intella Parts)
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
Water billing management system project report.pdf (Kamal Acharya)
Our project, entitled "Water Billing Management System", aims to generate water bills including all charges and penalties. The manual system currently employed is extremely laborious and quite inadequate; it only makes the process more difficult.
The aim of our project is to develop a system that partially computerizes the work performed by the Water Board, such as generating monthly water bills, recording the units of water consumed, and storing customer records and previous unpaid amounts.
We used HTML/PHP as the front end and MySQL as the back end. HTML provides the visual structure of the user interface: forms with objects such as buttons and text boxes make up the pages, while the PHP code behind the forms and any required supporting modules handle the application logic.
MySQL is a free, open-source database that facilitates effective database management by connecting the databases to the software. It is a stable, reliable, and powerful solution with advanced features and advantages, including data security.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IOT. Let's dig in!!!!
For more such content visit: https://nttftrg.com/
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Fundamentals of Electric Drives and its applications.pptx
Extracting Text From Video
Signal & Image Processing : An International Journal (SIPIJ) Vol.2, No.2, June 2011
DOI : 10.5121/sipij.2011.2209
EXTRACTING TEXT FROM VIDEO
Jayshree Ghorpade1, Raviraj Palvankar2, Ajinkya Patankar3 and Snehal Rathi4
1,2,3,4 Department of Computer Engineering, MIT COE, Pune, India
jayshree.aj@gmail.com, ravirajp7@gmail.com, ajinkya1412@gmail.com, snehalrathi_done@gmail.com
ABSTRACT
The text data present in images and video contains useful information for automatic annotation, indexing, and structuring of images. However, variations in text style, font, size, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction an extremely difficult and challenging task. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to design algorithms for each phase of extracting text from a video using Java libraries and classes. First, we frame the input video into a stream of images using the Java Media Framework (JMF), with the input being either a real-time video or a video from the database. We then apply pre-processing algorithms to convert each image to grayscale and remove disturbances such as lines superimposed over the text, discontinuities, and dots. We then continue with the algorithms for localization, segmentation and recognition, for which we use a neural-network pattern-matching technique. The performance of our approach is demonstrated by presenting experimental results for a set of static images.
KEYWORDS
JMF, Image Processing, Text Extraction, Text Localisation, Segmentation, Text in Video
1. INTRODUCTION
Text appearing in images and videos is usually categorized into two main groups according to its source: artificial text (also referred to as caption text or superimposed text) and scene text (also referred to as graphics text). Artificial text is laid over the image at a later stage (e.g. the name of someone during an interview, the name of the speaker, etc.). In contrast to artificial text, scene text is part of the scene itself (e.g. the name of a product during a commercial break) and is usually more difficult to extract due to differences in alignment, size, font type, lighting conditions, complex backgrounds and distortion. Moreover, it is often affected by variations in scene and camera parameters, such as illumination, focus and motion. Video content is available from TV broadcasting, the internet and surveillance cameras. These videos contain text, including scrolling text or caption text overlaid artificially after recording, and scene text embedded in the background. Text embedded in images contains large quantities of useful information. Since words have well-defined and unambiguous meanings, text extracted from video clips can provide meaningful keywords which reflect the rough content of the video. These keywords can be used for indexing and summarizing the content of a video clip [1].
Figure 1: Our proposed model
The process of extracting text from video consists of various stages, as seen in Figure 1. The system can take a real-time video or a video from a stored database as input. Each stage is explained by presenting experimental results for a set of static images [1], [2].
1.1. Framing of Video
We have used the Java Media Framework (JMF) to capture the media content and frame the
video. JMF is a framework for handling streaming media in Java programs. JMF is an optional
package of Java 2 standard platform. JMF provides a unified architecture and messaging
protocol for managing the acquisition, processing and delivery of time-based media. JMF
enables Java programs to:
• Present (play back) multimedia content.
• Capture audio through a microphone and video through a camera.
• Perform real-time streaming of media over the Internet.
• Process media (such as changing the media format or adding special effects).
• Store media in a file.
Overall, JMF provides a platform-neutral framework for handling multimedia.
The input to this stage was a video containing text. The video was then framed into images
using JMF at the rate of 1 frame per second. This rate could be increased or decreased
depending on the speed of the video i.e. on the basis of fps (frames per second). The images
were scaled to a resolution of 280x90 and were saved at the specified location on the hard disk
drive.
For example, if the video was 30 seconds long, then 30 frames (images) of the video would be scaled to a size of 280x90 and saved, which were then given as input to the next stage [7].
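The scaling step above can be sketched as follows. The JMF frame grab itself is omitted here; using Graphics2D resampling for the resize is our assumption, since the paper does not state how the 280x90 scaling was performed.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class FrameScaler {

    // Resize one captured video frame to the fixed 280x90 resolution used
    // by the pre-processing stage.
    public static BufferedImage scale(BufferedImage frame, int w, int h) {
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        // drawImage with a target rectangle performs the resampling
        g.drawImage(frame, 0, 0, w, h, null);
        g.dispose();
        return out;
    }

    public static void main(String[] args) {
        BufferedImage frame = new BufferedImage(640, 480, BufferedImage.TYPE_INT_RGB);
        BufferedImage scaled = scale(frame, 280, 90);
        System.out.println(scaled.getWidth() + "x" + scaled.getHeight()); // 280x90
    }
}
```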
1.2. Pre Processing
A scaled image was the input, which was then converted into a grayscale image. This formed the first stage of the pre-processing part. It was carried out by considering the RGB color contents (R: 11%, G: 56%, B: 33%) of each pixel of the image and converting them to grayscale. The conversion of a colored image to a grayscale image was done for easier recognition of the text appearing in the images: after grayscaling, the image was converted to a black-and-white image containing black text with higher contrast on a white background. As the input video was framed into images at the rate of 1 frame per second in the first stage, the input to this grayscaling stage was a scaled (280x90) colored image, i.e. a frame of the video [3], [6].
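A minimal sketch of this weighted conversion, using the channel weights stated above (note they differ from the common 0.299/0.587/0.114 luma weights):

```java
public class GrayScaler {

    // Weighted grayscale conversion for one packed RGB pixel, using the
    // R: 11%, G: 56%, B: 33% weights given in the text.
    public static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (int) Math.round(0.11 * r + 0.56 * g + 0.33 * b);
    }

    public static void main(String[] args) {
        System.out.println(toGray(0xFFFFFF)); // 255 (white stays white)
        System.out.println(toGray(0x000000)); // 0   (black stays black)
        System.out.println(toGray(0xFF0000)); // 28  (pure red is dark)
    }
}
```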
The second stage of pre-processing is lines removal. A video can contain noise which is either a
horizontal fluctuation (a horizontal line throughout the screen) or a vertical fluctuation (a
vertical line throughout the screen). Thus, for successful recognition of the text appearing in the
frames, there is a necessity that these horizontal and vertical fluctuations should be removed.
This was carried out by clearing all pixels (changing pixel color from black to white) located on
all lines which appeared horizontally and vertically across the screen because of the fluctuations
which may have occurred in the video. This stage did not make any changes to the image if the
video frame did not contain any horizontal and vertical fluctuations [6].
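The lines-removal pass above can be sketched as follows. The all-black criterion for a fluctuation line and the boolean representation (true = black pixel) are our assumptions; the paper does not state a detection threshold.

```java
public class LineRemover {

    // Clear any row or column that is black from edge to edge, treating it
    // as a horizontal or vertical fluctuation line.
    public static void removeLines(boolean[][] black) {
        int h = black.length, w = black[0].length;
        boolean[] fullRow = new boolean[h], fullCol = new boolean[w];
        // Detect first, then clear, so crossing horizontal and vertical
        // lines are both recognised before either is erased.
        for (int y = 0; y < h; y++) {
            fullRow[y] = true;
            for (int x = 0; x < w; x++) if (!black[y][x]) { fullRow[y] = false; break; }
        }
        for (int x = 0; x < w; x++) {
            fullCol[x] = true;
            for (int y = 0; y < h; y++) if (!black[y][x]) { fullCol[x] = false; break; }
        }
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (fullRow[y] || fullCol[x]) black[y][x] = false;
    }

    public static void main(String[] args) {
        boolean[][] img = new boolean[3][3];
        img[1][0] = img[1][1] = img[1][2] = true; // horizontal fluctuation line
        img[0][0] = true;                          // a genuine text pixel
        removeLines(img);
        System.out.println(img[1][1] + " " + img[0][0]); // false true
    }
}
```

As promised in the text, an image with no full-width or full-height line is left unchanged.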
The third stage of pre-processing is the removal of the discontinuities created in the second stage. As explained above, if the video contained any fluctuations, they were removed in the lines-removal stage. If the horizontal and vertical fluctuations occurred exactly where the text was present, they created discontinuities in the text appearing in the video frame, which made recognition of the text very difficult. This stage was carried out by scanning each pixel from top left to bottom right, taking into consideration each pixel and all its neighbouring pixels. If a pixel under consideration was white and all the neighbouring pixels were black, then that pixel was set to black, because the black neighbouring pixels indicated that the pixel under consideration had been cleared at the lines-removal stage because of the fluctuations [3].
The final stage of pre-processing is where the remaining disturbances, such as noise, are eliminated. This was again carried out by scanning each pixel from top left to bottom right, taking into consideration each pixel and all its neighbouring pixels. If a pixel under consideration was black and all the neighbouring pixels were white, then that pixel was set to white, because the white neighbouring pixels indicated that the pixel under consideration was an unwanted dot (noise) [2].
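The two neighbourhood passes above (discontinuity fill and dot removal) can be sketched together. An 8-neighbourhood, an in-place top-left to bottom-right scan, and skipping border pixels are our assumptions.

```java
public class NoiseCleaner {

    // True if every 8-neighbour of (x, y) has the value v.
    static boolean allNeighbours(boolean[][] img, int x, int y, boolean v) {
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue;
                if (img[y + dy][x + dx] != v) return false;
            }
        return true;
    }

    // One combined pass: fill gaps left by line removal, clear isolated dots.
    public static void clean(boolean[][] black) {
        for (int y = 1; y < black.length - 1; y++)
            for (int x = 1; x < black[y].length - 1; x++) {
                if (!black[y][x] && allNeighbours(black, x, y, true))
                    black[y][x] = true;   // discontinuity: restore the stroke
                else if (black[y][x] && allNeighbours(black, x, y, false))
                    black[y][x] = false;  // isolated dot: treat as noise
            }
    }

    public static void main(String[] args) {
        boolean[][] img = new boolean[3][3];
        for (boolean[] row : img) java.util.Arrays.fill(row, true);
        img[1][1] = false;   // gap left in a stroke by line removal
        clean(img);
        System.out.println(img[1][1]); // true
    }
}
```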
1.3. Detection and Localization
In the text detection stage, since there was no prior information on whether or not the input image contained any text, the existence or non-existence of text in the image had to be determined. In the case of video, however, the number of frames containing text is much smaller than the number of frames without text. The text detection stage seeks to detect the presence of text in a given image. To select a frame containing text from the shots produced by video framing, very low threshold values were needed for scene-change detection, because the portion occupied by a text region relative to the whole image was usually small. This approach is very sensitive to scene-change detection. It can be a simple and efficient solution for video indexing applications that only need key words from video clips, rather than the entire text. The localization stage involved localizing the text in the image after detection. In other words, the text present in the frame was tracked by identifying boxes or regions of similar pixel-intensity values and returning them to the next stage for further processing. This stage used region-based methods for text localization. Region-based methods use the properties of the color or gray scale in a text region, or their differences from the corresponding properties of the background [2], [5].
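The simplest form of such region-based localization can be sketched as the bounding box of the dark (text-candidate) pixels. The paper does not specify its exact region-growing rules, so this single-box version is purely illustrative.

```java
public class TextLocalizer {

    // Return {minX, minY, maxX, maxY} over all black pixels, or null if the
    // frame contains no black pixel (i.e. no text candidate).
    public static int[] boundingBox(boolean[][] black) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < black.length; y++)
            for (int x = 0; x < black[y].length; x++)
                if (black[y][x]) {
                    minX = Math.min(minX, x); minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x); maxY = Math.max(maxY, y);
                }
        return maxX < 0 ? null : new int[]{minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        boolean[][] img = new boolean[5][6];
        img[1][2] = img[3][4] = true;
        int[] box = boundingBox(img);
        System.out.println(box[0] + "," + box[1] + "," + box[2] + "," + box[3]); // 2,1,4,3
    }
}
```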
1.4. Segmentation
After the text was localized, the text segmentation step dealt with separating the text pixels from the background pixels. The output of this step is a binary image in which black text characters appear on a white background. This stage included extraction of the actual text regions by dividing pixels with similar properties into contours or segments and discarding the redundant portions of the frame [2].
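A minimal sketch of the separation step: grayscale pixels darker than a threshold become text pixels, everything else background. The global threshold is our assumption; the paper states only that a binary image of black text on a white background is produced.

```java
public class TextSegmenter {

    // Binarize a grayscale region: true = black (text) pixel.
    public static boolean[][] binarize(int[][] gray, int threshold) {
        boolean[][] black = new boolean[gray.length][gray[0].length];
        for (int y = 0; y < gray.length; y++)
            for (int x = 0; x < gray[y].length; x++)
                black[y][x] = gray[y][x] < threshold;
        return black;
    }

    public static void main(String[] args) {
        int[][] gray = {{10, 200}, {200, 10}};
        boolean[][] b = binarize(gray, 128);
        System.out.println(b[0][0] + " " + b[0][1]); // true false
    }
}
```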
1.5. Recognition
This stage included the actual recognition of the extracted characters by combining the various features extracted in previous stages to give the actual text, with the help of a supervised neural network. In this stage, the output of the segmentation stage was considered, and the characters contained in the image were compared with the pre-defined neural-network training set; depending on the value of the character appearing in the image, the character representing the closest training-set value was displayed as the recognised character [2], [4].
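The "closest training-set value" comparison can be illustrated with simple nearest-template matching; this is a stand-in for the paper's neural-network matcher, whose architecture is not specified.

```java
public class CharRecognizer {

    // Compare a candidate character bitmap against every training bitmap
    // and return the label with the fewest differing pixels.
    public static char recognize(boolean[][] glyph, boolean[][][] templates, char[] labels) {
        int best = Integer.MAX_VALUE;
        char bestLabel = '?';
        for (int i = 0; i < templates.length; i++) {
            int diff = 0;
            for (int y = 0; y < glyph.length; y++)
                for (int x = 0; x < glyph[y].length; x++)
                    if (glyph[y][x] != templates[i][y][x]) diff++;
            if (diff < best) { best = diff; bestLabel = labels[i]; }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Hypothetical 2x2 training set: a vertical stroke and a filled block.
        boolean[][][] templates = {
            {{true, false}, {true, false}},
            {{true, true},  {true, true}}
        };
        char[] labels = {'l', 'o'};
        boolean[][] glyph = {{true, false}, {true, false}};
        System.out.println(recognize(glyph, templates, labels)); // l
    }
}
```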
2. LITERATURE SURVEY
The previous work which has been done on this concept has been basically done on images of
documents that have been scanned which are typically binary images or they can be easily
converted to binary images using simple binarization techniques i.e. converting the color image
into a grayscale image and then thresholding the grayscale image [1], [6].
These text detection techniques were developed for binary images only, and thus cannot be directly applied to color images or video. The main problem is that colored images and videos have rich color content and complex colored backgrounds, are high in resolution, and contain a lot of noise, which makes it difficult to extract the text from them using the techniques that were used earlier for scanned document images in binary form. Binarization is a conventional method used for extracting text from an image. The color of the text in an image is lighter or darker than its background, and this is exploited for text detection in binarization, so the text region in the image can be segmented by thresholding. These methods are based on the assumption that text pixels have a different color than background pixels, and thus a threshold is used to separate the text from the background. But the noise and disturbances contained in an image can share the same color as the text, and it then becomes difficult to detect the region containing the text [3], [4], [7].
Also, different language-dependent characteristics, such as stroke density, font size, aspect ratio, and stroke statistics, affect the results of text segmentation and binarization methods.
The previous text extraction algorithm for extracting text from a scanned image document
involved 3 basic steps as follows:
• Step1: Text Detection- The procedure of detection determines the boundary of
candidate text regions in a frame.
• Step 2: Text Extraction- Text extraction segments these regions and generates binary
images for recognition.
• Step 3: Text Recognition- Recognition was conducted with commercial OCR engines.
However, traditional OCR techniques can only handle binary text images with simple backgrounds, while colored images with complex backgrounds contain a lot of noise and disturbances. These disturbances therefore influence recognition accuracy heavily.
The analysis of the above mentioned technique is as follows:
2.1. Text Detection:
It was unpredictable for the text color to be darker or lighter than its background. Therefore,
simple binarization was not adaptive enough in text extraction.
2.2. Text Extraction:
There often exist many disturbances from background in a text region. They share similar
intensity with the text and consequently the binary image of the text region is unfit for
recognition directly.
2.3. Text Recognition:
The result of recognition was the ratio between the number of correctly extracted characters and the total number of characters, evaluating what percentage of the characters were extracted correctly from the background. Each extracted character was taken as correct if it did not miss the main strokes. The extraction results were then sent directly to a commercial OCR engine for recognition.
Another method was proposed for text extraction from a colored image with complex
background in which the main idea was to first identify potential text line segments from
horizontal scan lines. Text line segments were then expanded or merged with text line segments
from adjacent scan lines to form text blocks. False text blocks were filtered based on their
geometrical properties. The boundaries of the text blocks were then adjusted so that text pixels
lying outside the initial text region were included. Text pixels within text blocks were then
detected by using bi-color clustering and connected components analysis. Finally, non-text
artifacts within a text region were filtered based on heuristics [3], [5].
This method consisted of basically 6 steps:
• Step1: Identify Potential Text Line Segments
The grayscale luminance value of each pixel on the scanline was first computed from the pixel's R, G, and B color values. Each horizontal scanline was divided into several segments based on the value of the maximum gradient difference computed at each pixel location. The maximum gradient difference was computed as follows: first the horizontal luminance gradient Gx of the scanline was computed using the mask [-1, 1]; then the difference between the maximum and minimum gradient values was computed within a local window of size 1 x 21; we call this the maximum gradient difference. Usually, text areas have a larger maximum gradient difference than background areas. For each scanline segment with large maximum gradient difference values, the following statistics were computed from the horizontal gradient waveform: first the number of crossings at a horizontal gradient level equal to a negative threshold value was counted (negativeEdgeCount); then the number of crossings at a horizontal gradient level equal to a positive threshold was counted (positiveEdgeCount). Several consecutive negative crossings (without positive crossings in between) were counted as one crossing, and similarly for positive crossings. This was because each text stroke requires a pair of positive and negative crossings (in either order) as end points. If negativeEdgeCount or positiveEdgeCount was less than some threshold, the scanline segment was discarded. The mean and variance of the horizontal distance between the crossings were then computed. For scanline segments containing text, the mean and variance were expected to lie within a certain range; if not, the scanline segment was discarded. All line segments remaining after the filtering process were potential text line segments [3].
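The maximum gradient difference computation above can be sketched for a single scanline; clipping the window at the ends of the scanline is our choice, as the paper does not say how boundaries are handled.

```java
public class GradientScan {

    // Maximum gradient difference (MGD) per pixel: horizontal gradient via
    // the [-1, 1] mask, then max minus min gradient inside a centred window
    // (size 21 in the text, passed as win here).
    public static int[] maxGradientDifference(int[] lum, int win) {
        int n = lum.length;
        int[] grad = new int[n];
        for (int i = 1; i < n; i++) grad[i] = lum[i] - lum[i - 1];
        int[] mgd = new int[n];
        int half = win / 2;
        for (int i = 0; i < n; i++) {
            int lo = Math.max(0, i - half), hi = Math.min(n - 1, i + half);
            int mn = grad[lo], mx = grad[lo];
            for (int j = lo + 1; j <= hi; j++) {
                mn = Math.min(mn, grad[j]);
                mx = Math.max(mx, grad[j]);
            }
            mgd[i] = mx - mn;
        }
        return mgd;
    }

    public static void main(String[] args) {
        int[] edge = {0, 0, 0, 255, 255, 255}; // one sharp text-like edge
        System.out.println(maxGradientDifference(edge, 21)[3]); // 255
        int[] flat = {128, 128, 128, 128};     // uniform background
        System.out.println(maxGradientDifference(flat, 21)[0]); // 0
    }
}
```

This matches the observation in the text: the text-like edge produces a large MGD, while the uniform background produces zero.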
• Step 2: Potential Text Blocks Detection
In the second step, every potential text line segment was expanded or merged with potential text line segments from adjacent scanlines to form a potential text block. For each potential text line segment, the mean and variance of its grayscale values were computed from the luminance image. This step of the algorithm was run in two passes. In the first pass, the line segment (with the same endpoints) immediately below a potential text line segment detected in Step 1 was considered. If the mean and variance of the luminance values of the current line segment were close to those of the potential text line segment above it, the two were merged to form a potential text block. This process was repeated for the line segment immediately below the newly formed text block. In the second pass, the process was repeated in a bottom-up fashion. The reason for having both top-down and bottom-up passes was to ensure that no text blocks were missed [3].
• Step 3: Text Blocks Filtering
The potential text blocks were then subjected to a filtering process based on their area and their height-to-width ratio. If the computed values fell outside a pre-specified range, the blocks were discarded [3].
• Step 4: Boundary Adjustment
For each text block, it was necessary to adjust the boundaries to include text pixels lying outside them. To do so, first the average maximum gradient difference of the text block was computed. Then the boundaries of the text block were adjusted to include outside adjacent pixels whose maximum gradient difference was close to that computed average [3].
• Step 5: Bi-Color Clustering
The bi-color clustering was then applied to each detected text block. The color
histogram of the pixels within the text block was used to guide the selection of initial
foreground and background colors. Two peak values were picked that were of a certain
minimum distance apart in the color space as initial foreground and background colors
[3].
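The two-peak clustering in Step 5 can be illustrated on grayscale values (a simplification of the color-space clustering the text describes): the two most frequent values at least a minimum distance apart seed the clusters, and each pixel joins the nearer seed.

```java
public class BiColorCluster {

    // Bi-color clustering sketch: pick two histogram peaks at least minDist
    // apart as foreground/background seeds, then assign each pixel to the
    // nearer seed. Returned true = nearer the darker seed (candidate text).
    public static boolean[][] cluster(int[][] gray, int minDist) {
        int[] hist = new int[256];
        for (int[] row : gray) for (int v : row) hist[v]++;
        int p1 = 0;
        for (int v = 1; v < 256; v++) if (hist[v] > hist[p1]) p1 = v;
        int p2 = -1;
        for (int v = 0; v < 256; v++)
            if (Math.abs(v - p1) >= minDist && (p2 < 0 || hist[v] > hist[p2])) p2 = v;
        if (p2 < 0) p2 = p1; // degenerate case: effectively unimodal image
        int dark = Math.min(p1, p2), light = Math.max(p1, p2);
        boolean[][] fg = new boolean[gray.length][gray[0].length];
        for (int y = 0; y < gray.length; y++)
            for (int x = 0; x < gray[y].length; x++)
                fg[y][x] = Math.abs(gray[y][x] - dark) <= Math.abs(gray[y][x] - light);
        return fg;
    }

    public static void main(String[] args) {
        int[][] gray = {{10, 10, 200}, {200, 10, 200}};
        boolean[][] fg = cluster(gray, 100);
        System.out.println(fg[0][0] + " " + fg[0][2]); // true false
    }
}
```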
• Step 6: Filtering
In the filtering step, false text blocks and noise artifacts within a text block were
eliminated. First, the connected components of text pixels within a text block were
determined. Then the following filtering procedures were performed:
If the text-region height was below some threshold T1, and the area of any connected
component was below text-region-area/2, the entire text block was discarded.
For every connected component, if its area was below text-region-height/2, the
connected component was regarded as noise and discarded.
If a connected component touched one of the boundaries of the text block, and
if its size was larger than a certain threshold, it was discarded.
The output from this approach consisted of connected components of text character
pixels [3].
3. PLATFORM/TECHNOLOGY
3.1. Operating System
Windows XP/Vista/7
3.2. Language
Java J2SE:
Java Platform, Standard Edition or Java SE is a widely used platform for programming in the
Java language. It is the Java Platform used to deploy portable applications for general use. In
practical terms, Java SE consists of a virtual machine, which must be used to run Java programs,
together with a set of libraries (or packages) needed to allow the use of file systems, networks,
graphical interfaces, and so on, from within those programs.
4. APPLICATIONS AND FUTURE SCOPE
4.1. Applications
4.1.1. In modern TV programs, there are more and more scrolling video texts which can provide important information (e.g. the latest news). This text can thus be extracted.
4.1.2. Content based image retrieval involves searching for the required keywords in the images.
Based on the results the images are retrieved.
4.1.3. Extracting the number of a vehicle in textual form:
• It can be used for tracking of a vehicle.
• It can extract and recognize a number which is written in any format or font.
4.1.4. Extracting text from a presentation video:
• The time consumed in reading the text from a video can be reduced, as the text contained in the video can be retrieved without going through the complete video again and again.
• The memory space required will also be reduced.
4.2. Future Scope
We have implemented the project for the English language; it can be further extended to other languages. Extending it in future implementations would largely improve the usefulness of the algorithm.
5. EXPERIMENTS AND ANALYSIS
5.1 Input Image
Figure 2 shows the original image, which is given as the input to the system. The system’s job is
to extract the text from this image.
Figure 2: Original Image
5.2 Output Images
Figure 3 shows the processing of the original image at every phase and the final extraction of the text. For the input image shown in Figure 2, the extracted text is “canvas”. The different phases of operation include:
• Gray Scaled Image (Figure 3).
• Lines Removal (Figure 4).
• Dot Removal (Figure 5).
• Discontinuity Removal (Figure 6).
• Recognized word (Final extracted text) (Figure 7).
Figure 3: Gray Scaled Image
Figure 4: Lines Removal
Figure 5: Dot Removal
Figure 6: Discontinuity Removal
Figure 7: Recognized Word
ACKNOWLEDGEMENT
No one can complete a project alone; directly or indirectly, help is always needed. Before we get into the thick of things, we would like to add a few heartfelt words for all the teachers of the Computer Department who took pains to make our project successful.
We would like to express our sincere thanks to Prof. Jayshree Ghorpade for her guidance and cooperation, which helped us to complete our project successfully. We are also heartily thankful to our friends for their kind cooperation; they have directly or indirectly helped us to complete the project successfully.
CONCLUSIONS
Extraction of text information involves detection, localization, tracking, extraction,
enhancement, and recognition of the text from a given image. A variety of approaches to text
information extraction from images and video have been proposed for specific applications
including page segmentation, address block location, license plate location, and content based
image/video indexing. In spite of such extensive studies, it is still not easy to design a general
purpose text information extraction system. This is because there are so many possible sources
of variation when extracting text from a shaded or textured background, from low contrast or
complex images, or from images having variations in font size, style, color, orientation, and
alignment. These variations make the problem of automatic text information extraction
extremely difficult. The problem of text information extraction needs to be defined more
precisely before proceeding further. A text information extraction system receives an input in
the form of a still image or a sequence of images. The images can be in gray scale or color,
compressed or uncompressed, and the text in the images may or may not move. The text
information extraction problem can be divided into the following sub problems:
• Detection
• Localization
• Tracking
• Extraction and Enhancement
• Recognition (NN)
Text detection, localization, and extraction are often used interchangeably in the literature. We can conclude that the text appearing in a video can be extracted using this software, with the only restriction that the video should not be very distorted. As the characters are recognised at run time, there may be a few cases in which one or two characters are misrecognised, i.e. a character may be recognised as some other character. But in the experiments performed with the software, most of the text was recognised successfully.
REFERENCES
[1] J. Gllavata, Extracting Textual Information from Images and Videos for Automatic Content-Based Annotation and Retrieval.
[2] An Edge-based Approach for Video Text Extraction, in TENCON 2004, 2004 IEEE Region 10 Conference.
[3] A Robust Algorithm for Text Extraction in Color Video, in Multimedia and Expo, 2000 (ICME 2000), IEEE International Conference, 2000.
[4] K. Jung, K.I. Kim, and A.K. Jain, Text Information Extraction in Images and Videos: A Survey, Pattern Recognition Letters, 37:977–997, 2004.
[5] R. Lienhart and A. Wernicke, Localizing and Segmenting Text in Images and Videos, IEEE Transactions on Circuits and Systems for Video Technology, 12(4):256–268, 2002.
[6] N. Efford, Digital Image Processing: A Practical Introduction Using Java, Addison Wesley, 2000.
[7] http://java.sun.com/dev/evangcentral/totallytech/jmf.html
Authors
Prof. Jayshree Ghorpade, M.E. [Computer], Assistant Professor, Computer Dept., MITCOE, Pune, India. 8+ years of experience in IT and academic research. Her areas of interest are Image Processing, Data Mining and BioInformatics.
Raviraj P. Palvankar
Presently pursuing his Bachelor's degree in Computer Science at MIT College of Engineering, Pune, India. He will complete his engineering in May 2011 and is hopeful of a career in the software industry.
Ajinkya Patankar
A student of MIT College of Engineering, Pune, India. Presently he is a final-year student in the Department of Computer Engineering and will complete his Bachelor's degree in May 2011.
Snehal Rathi
Presently pursuing a Bachelor's degree in Computer Science at MIT College of Engineering, Pune, India, completing the degree in May 2011 and hopeful of a career in the software industry.