50120130404055


International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 556-565, © IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

TEXTUAL QUERY BASED SPORTS VIDEO RETRIEVAL BY EMBEDDED TEXT RECOGNITION

Vilas Naik1, Sagar Savalagi2
1 Department of CSE, Basaveshwar Engineering College, Bagalkot, India
2 Department of CSE, Basaveshwar Engineering College, Bagalkot, India

ABSTRACT

With the growing popularity of sites like YouTube, video recording and sharing have become widespread over the last several years. Unlike text documents, multimedia content is difficult to search and index. Hence content-based video retrieval systems are the need of the hour. Content-Based Video Retrieval (CBVR) is an active research discipline focused on computational strategies to search for relevant videos based on multimodal content analysis, using visual, audio and text features to represent and index video. Recent research on CBVR has presented many solutions based on these features. The textual content of a video, in the form of embedded text and scene text, is quite helpful for indexing. The proposed work is a content-based video retrieval system based on textual cues. Text-based video retrieval enables search based on the textual information present in the video. Regions of textual information are identified within the frames of the video, and the video is then annotated with the textual content present in those frames. Conventionally, OCR is used to extract the text within the video. This enables applications such as keyword-based search in multimedia databases, and video indexing and retrieval are carried out on this basis. Results show that the system is quite efficient, with an accuracy of around 90%. A textual query returns higher accuracy than visual queries, which supports the concept.

1. INTRODUCTION

With the development of various multimedia compression standards and significant increases in desktop computer performance and storage, the widespread exchange of multimedia information is becoming a reality. Video is arguably the most popular means of communication and entertainment. With this popularity comes an increase in the volume of video and an increased need for the ability to automatically sift through large video databases in search of relevant material. Even with the increase in hardware capabilities that makes video distribution possible, factors such as algorithm speed and storage cost are concerns that must still be addressed. Considering this, a first step should be an attempt to increase speed when using existing compression standards. Performing analysis in the compressed domain reduces the effort involved in decompression, and providing a means of abstracting the data keeps the storage cost of the resulting feature set low. Both of these problems are active areas of research. The aim of the proposed work is to develop a new detection algorithm that boosts the speed of search and in turn reduces the cost of storage.

Every day, both military and civilian equipment generates gigabytes of images. A huge amount of information is out there. However, it is impossible to access or make use of this information unless it is organized so as to allow efficient browsing, searching and retrieval. Image retrieval has been a very active research area since the 1970s, with the thrust coming from two major research communities, database management and computer vision. These two communities study image retrieval from different angles, one being text-based and the other visual-based. Many advances, such as data modelling, multidimensional indexing and query evaluation, have been made along this research direction. Two major difficulties exist, especially when the image collection is large (tens or hundreds of thousands of images). One is the vast amount of labour required for manual image annotation. The other, which is more essential, results from the rich content of the images and the subjectivity of human perception: different people may perceive the same image content differently. Perception subjectivity and annotation imprecision may cause unrecoverable mismatches in later retrieval processes.

The proposed mechanism is a unique scheme aimed at alleviating these hurdles, using a new detection algorithm with boosting to offer a text-based retrieval system. The work proceeds in the following steps. Initially, frames are collected from the video clip. From these frames the text part is segmented. Character segmentation then identifies the characters, which are recognized by Optical Character Recognition (OCR). To increase the accuracy of identification, color features are additionally extracted from the video clip. These color features are combined with the text features and stored in the database. When the user enters a text query, it is matched against the stored characters and the matching videos are displayed.

2. RELATED WORK

Video retrieval is important in multimedia search engine applications, and recognizing text is a crucial task in such applications. In the last decade researchers have proposed many different methods for video retrieval; some of the related work is summarized below.

An approach that enables search based on the textual information present in the video is introduced in [1]. In this method, regions of textual information are identified within the frames of the video, and the video is then annotated with the textual content present in the images. An approach that enables matching at the image level, thereby avoiding OCR, is also addressed. Videos containing the query string are retrieved from a video database and sorted by relevance. Results are shown for video collections in English, Hindi and Telugu. In [2] a method to automatically localize captions in JPEG compressed images and the I-frames of MPEG compressed videos is proposed. In this method, caption text regions are segmented from background images using their distinguishing texture characteristics.
Unlike previously published methods, which fully decompress the video sequence before extracting the text regions, this method locates candidate caption text regions directly in the DCT compressed domain using the intensity variation information encoded there. Therefore, only a very small amount of decoding is required. The method in [3] is a news video retrieval solution that targets specific news videos based on their content as described by overlay text. The approach uses overlay text, which conveys the direct meaning of the video, as a source of complementary information. The whole process is divided into two steps. First, "metadata labels" are built by detecting and extracting the overlay text. Second, these labels are used to index the news videos. The experiments are carried out on news videos from NDTV News and on a large data set of video images containing artificial text developed at the Image Processing Centre (IPC), a research facility at the National University of Sciences and Technology (NUST), Pakistan. The FFMPEG library is used to extract the frames from the news videos. Overlay scenes are inserted into the video scene in the same way as overlay text, so a transition region is also observed there.

In [4] the authors combine three main components: 1. The integration of image and audio analysis results to identify news segments. 2. Video OCR technology to detect text in frames, which provides a good source of textual information for story classification when transcripts and closed captions are not available. 3. Natural language processing (NLP) technologies, which are used to perform automated categorization of news stories based on the text obtained from closed captions or the video OCR process. Based on these video structure and content analysis technologies, two advanced video browsers are developed for home users: an intelligent highlight player and an HTML-based video browser.

An annotation-based indexing method that allows the user to retrieve video using textual annotations is proposed in [5]. It takes a text-based query and compares it with the tags used for indexing, and the event-based video is retrieved from a cricket video database. Experiments show that annotation-based event retrieval can potentially improve retrieval accuracy using different searching techniques, such as binary search or indexing, when the database is very large, so that video retrieval can be carried out efficiently with this type of system. A technique has also been proposed to address the problems of extracting text from a video and to design algorithms for each phase of text extraction using Java libraries and classes.
In this technique, the input video, either real-time or taken from a database, is first split into a stream of images using the Java Media Framework (JMF). Pre-processing algorithms are then applied to convert each image to gray scale and remove disturbances (superimposed lines over the text, discontinuities and dots), followed by algorithms for localization, segmentation and recognition, the last of which uses a neural network pattern matching technique. The performance of the approach is demonstrated by presenting experimental results for a set of static images. Work on improving multimedia retrieval with a video OCR presents a set of experiments with a video OCR system (VOCR) tailored for video information retrieval and establishes its importance in multimedia search in general and for some specific queries in particular. In the method of [7], the analysis of video frames that produces candidate text regions is detailed. The text regions are then binarized and sent to a commercial OCR, resulting in ASCII text that is finally used to create search indexes. The system is evaluated using the TRECVID data. The effectiveness of various textual sources for multimedia retrieval is evaluated by combining the VOCR outputs with automatic speech recognition (ASR) transcripts. For general search queries, the VOCR system coupled with ASR sources outperforms the other system by a very large margin. For search queries that involve named entities, especially people's names, the VOCR system even outperforms speech transcripts, demonstrating that source selection for particular query types is essential.

Another important consideration is the quality and complexity of the pictures containing text that are used for evaluation. Some methods consider large fonts in images, advertisements and video clips. The methods also have limitations: the method in [8] does not detect low-contrast text and small fonts. The techniques in [9] use text with different complex motions.
The methods in [10] and [11] detect only caption text in news video clips. The proposed work extracts text from video frames by separating the text region from the background and employs conventional OCR for text recognition.

3. PROPOSED ALGORITHM FOR VIDEO RETRIEVAL

This section gives an overview and a detailed description of all the blocks of the proposed system.

3.1 Overview of the Approach

The proposed mechanism is a unique scheme that offers a video retrieval system based on embedded text. The method uses the information conveyed by embedded text to recognize the video to be retrieved from a collection in response to a text query. The mechanism matches the query against the text present in the video frames, based on the features explained below. First, frames are extracted from the video. The text part is segmented. Character segmentation extracts the characters. Character recognition recognizes the characters. Color features are extracted from the video scene. The color features, combined with the text features, are stored in the database. The user enters a text query, which is matched against the stored characters, and the matching videos are displayed. The overall flow is shown in Figure 1.

Fig. 1 Proposed algorithm for video retrieval by a text query

3.2 The Text Query Based Video Retrieval Algorithm

The proposed algorithm is summarized in the following steps.
Step 1. Input a video and convert it into frames.
Step 2. Apply a median filter to each frame and perform Sobel edge detection to detect text region edges in the frame. Then calculate the sum graph, i.e. add up the rows and columns of the binary image.
Step 3. Perform text region segmentation by applying the threshold
Threshold = (sum(sum(B'))/prod(size(sum(B')))*50 + max(max(sum(B')))*30)/100
where B' is the input image.
Step 4. Apply OCR to recognize the text characters in the frames; the recognized characters and the color features are stored in the database as text features. Normalize characters to size 32x32.
Step 5. Given a text query, extract its characters. Match them against the character set associated with each video, in one direction. Calculate the total number of character matches for each video.
Step 6. Retrieve the videos with the highest number of matches.

3.3 Text region localization

As a first step, frames are extracted individually from each video in the collection. Because a video frame is stored in compressed format, it is first converted into an image and then into a greyscale image. A median filter is then applied to the image; the output of the median filter is shown in fig 4.2. The median filter considers each pixel in the image in turn and looks at its nearby neighbours to decide whether or not it is representative of its surroundings. Instead of simply replacing the pixel value with the mean of the neighbouring pixel values, it replaces it with the median of those values. The median is calculated by first sorting all the pixel values from the surrounding neighbourhood into numerical order and then replacing the pixel being considered with the middle pixel value. Next, the Sobel operator is used. It is an edge detection technique that is applied to the greyscale image to detect the edges of text regions.

3.3 Text detection and Segmentation

After the text region is localized, the text area is segmented for subsequent recognition. The output of this step is a binary image in which black text characters appear on a white background. This stage extracts the actual text regions as follows. A median filter is applied again, this time to the edge-detected image, to give a smooth image, and the vertical and horizontal histograms are then taken. The horizontal and vertical histograms represent the column-wise and row-wise histograms respectively. They represent the sum of differences of gray values between neighbouring pixels of the image, column-wise and row-wise. First the horizontal histogram is calculated. To find it, the algorithm traverses each column of the image. In each column, the algorithm starts with the second pixel from the top and calculates the difference between the second and first pixels. If the difference exceeds a certain threshold, it is added to the total sum of differences. The algorithm then moves downwards to calculate the difference between the third and second pixels, and so on until the end of the column, accumulating the total sum of differences between neighbouring pixels. At the end, an array containing the column-wise sums is created. The same process is carried out to find the vertical histogram; in this case, rows are processed instead of columns. A threshold value is then calculated from the normalized sum as follows:
Threshold = (sum(sum(B'))/prod(size(sum(B')))*50 + max(max(sum(B')))*30)/100
where B' is the input image.

The rows and columns that satisfy the threshold are retained. This gives the rows and columns in which text appears. The text block is then extracted, as shown in Figure 2 (d), and the resulting image is stored in a result folder. All regions are extracted separately. The sum graph is computed, its maxima are used to extract the characters, and the characters are normalized to size 32x32.
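
The localization and thresholding steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the 3x3 window, the edge threshold of 100 and the strict "greater than" comparison are all hypothetical choices, and the threshold formula is read here as 50% of the mean plus 30% of the maximum of a sum profile.

```python
from statistics import median


def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood (borders clamped)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out


def sobel_edges(img, edge_thresh=100):
    """Binary edge map from the Sobel operator; edge_thresh is an assumed value."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = 1 if abs(gx) + abs(gy) > edge_thresh else 0
    return out


def profile_threshold(profile):
    """Threshold = (mean(profile) * 50 + max(profile) * 30) / 100."""
    mean = sum(profile) / len(profile)
    return (mean * 50 + max(profile) * 30) / 100


def select(profile):
    """Indices whose sum exceeds the threshold (strict '>' is an assumption)."""
    t = profile_threshold(profile)
    return [i for i, s in enumerate(profile) if s > t]


def locate_text_band(gray):
    """Return (rows, columns) of a greyscale frame that are likely to contain text."""
    edges = sobel_edges(median_filter(gray))
    row_sums = [sum(row) for row in edges]          # sum graph over rows
    col_sums = [sum(col) for col in zip(*edges)]    # sum graph over columns
    return select(row_sums), select(col_sums)
```

Calling `locate_text_band(frame)` on a greyscale frame given as a list of rows returns the candidate row and column indices; cropping the frame to those indices would give the text block passed on to character segmentation.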

Fig. 2 Overview of text detection and segmentation. (a) Original frame. (b) Gray scale image with noise reduction and edge detection. (c) Feature vector graph when text is detected in the frame. (d) Detected text.

3.4 Text Recognition with Optical Character Recognition (OCR)

This stage performs the actual recognition of the extracted characters by combining the various features extracted in the previous stages to give the actual text. The output of the segmentation stage is given as input to this stage. Optical Character Recognition (OCR) takes an input image and recognizes the characters in it. When a text image is given as input to the OCR, the image undergoes four stages of processing: pre-processing, feature extraction, classification and post-processing. Of these four stages, feature extraction is the most important, since it is on the basis of the extracted features that the OCR is able to recognize characters. We have used template matching for feature extraction, which is one of the simplest approaches to pattern recognition.

Template matching: This process involves the use of a database of characters or templates. There exists a template for every possible input character. For recognition to occur, the current input character is compared to each template to find either an exact match or the template with the closest representation of the input character. If I(x, y) is the input character and Tn(x, y) is template n, then the matching function s(I, Tn) returns a value indicating how well template n matches the input character.

The outputs generated by the OCR are ASCII characters, which are used as keywords for future indexing and retrieval. Figure 3 (a) shows a region identified as a text block. It is separated from the rest of the image and binarized. When this detected block is given as input to the OCR, the corresponding ASCII output is as shown in Figure 3 (c). It is observed that while the text extraction part of the system detects the text blocks accurately even against a complex background, the OCR also recognizes 90% of the text correctly. As seen in Figure 3 (d), some words were misrecognized due to the presence of noise. The mean and standard deviation of the R, G and B components of the frames are also extracted, and this color feature is stored in the text database along with the text features.
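
To make the template matching step concrete, the following is a minimal sketch. The paper does not define the matching function s(I, Tn), so the version here, the fraction of pixel positions at which the binarized input and the template agree, is an assumption, as are the function names and the toy 3x3 templates (the paper normalizes characters to 32x32).

```python
def match_score(glyph, template):
    """s(I, Tn): fraction of pixel positions where two equal-size binary glyphs agree.

    This particular definition is an assumption; the source only states that
    s(I, Tn) indicates how well template n matches the input character.
    """
    total = 0
    agree = 0
    for row_i, row_t in zip(glyph, template):
        for a, b in zip(row_i, row_t):
            total += 1
            agree += int(a == b)
    return agree / total


def recognize(glyph, templates):
    """Return (label, score) of the template that best matches the input glyph."""
    scored = ((label, match_score(glyph, t)) for label, t in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 3x3 templates, for illustration only.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

# An "I" with one noisy pixel in the bottom-right corner.
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label, score = recognize(noisy_i, TEMPLATES)
```

Here the noisy glyph still maps to "I" because that template agrees on 8 of 9 pixels, while "T" agrees on 6 and "L" on 4. Heavier noise can flip the decision, which is consistent with the misrecognized words noted above.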

Fig 3. (a) Frame containing text. (b) Original frame. (c) Text extraction done using OCR. (d) Text recognition by OCR.

3.4 Text querying

A text query entered by the user is processed as shown in Figure 4: the query text is extracted and recognized and then sent to the matching process, which is the next stage, as shown in fig 3.3. In the database, each individual video has its own character set, recognized by the OCR. The matching process has direct access to the database, as shown in fig 3.3. The character set associated with each video is stored in the database together with the color features (mean and standard deviation) extracted at the first level, during frame extraction. The process starts matching the query characters against each character set, and the matching takes place in one direction. For example, the matching process matches the character 'C' followed by 'R'; in this way it matches each character from the query to characters from the video text dataset. The total number of character matches is then calculated for each video, and the names of the videos with the highest number of matches are displayed, as shown in Figure 5.

Fig 4. Block diagram of query processing (blocks: Query, Query text recognition, Matching process, Database, Recognized text from video, Videos names)
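
The query matching stage can be illustrated with a short sketch. It assumes that matching "in one direction" means scanning the stored text left to right so that each query character must be found after the previous match. That interpretation, the function names and the sample index below are hypothetical and not taken from the paper.

```python
def ordered_match_count(query, video_text):
    """Count query characters found in video_text while scanning left to right only."""
    text = video_text.upper()
    pos = 0
    count = 0
    for ch in query.upper():
        idx = text.find(ch, pos)
        if idx != -1:          # character found after the previous match
            count += 1
            pos = idx + 1
    return count


def retrieve(query, index, top_k=3):
    """Rank videos by total character matches and return the best top_k names."""
    scored = [(ordered_match_count(query, text), name) for name, text in index.items()]
    scored = [(c, n) for c, n in scored if c > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical index: video name -> text recognized by OCR in its frames.
INDEX = {
    "match1.mpg": "INDIA VS AUSTRALIA CRICKET",
    "news.mpg": "BREAKING NEWS",
    "tennis.mpg": "WIMBLEDON FINAL",
}

results = retrieve("CRICKET", INDEX, top_k=2)
```

With this sample index the query "CRICKET" scores 7 matches on the first video, 3 on the second and 2 on the third, so the first two names are returned. A fuller system would also compare the stored color features to discard irrelevant videos, which this sketch omits.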

Fig 5. Result of query

4. EXPERIMENTAL RESULTS AND DISCUSSIONS

This section presents quantitative results on the performance of the text extraction system. Performance is measured in terms of true positives (TP), text regions identified correctly as text regions; false positives (FP), non-text regions identified as text regions; and false negatives (FN), text regions missed by the system. Using these basic definitions, recall and precision of retrieval are defined as follows:

Recall = TP/(TP+FN) and Precision = TP/(TP+FP)

While the above definitions are generic, different researchers use different units of text for calculating recall and precision. Wong and Chen consider the number of characters, while some other authors count the number of text boxes or text regions. Jain and Yu calculate recall and precision by considering either characters or blocks, depending on the type of image. This work adopts the second definition, in which text regions are the units for counting. The ground truth is obtained by manually marking the correct text regions. Recall and precision are calculated on a large number of text-rich images. For video processing, the system is tested on different types of MPEG videos such as news clips, sports clips and commercials. The videos contain both caption text and scene text of different fonts, colors and intensities. Table 1 shows the performance of our proposed method on four types of video. It is seen that our method has an overall average recall of 82% and precision of 87%. The method is able to detect text under a large number of different conditions, such as text with small fonts, low intensity, different colors and cluttered backgrounds, text from noisy video, news captions with horizontal scrolling, and both caption text and scene text.

Table 1 Recall and precision of text block extraction

               No. of text blocks   TP    FP   FN   Recall %   Precision %
SPORTS VIDEO   780                  624   60   24   80%        92%

Where TP = true positive, FP = false positive, FN = false negative
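
As a worked check of the definitions, the short sketch below evaluates recall and precision on the counts listed in Table 1. The function names are illustrative. With these counts, precision by the stated formula is about 91%, close to the reported 92%. Recall by the stated formula is about 96%, whereas the reported 80% coincides with TP divided by the total number of text blocks (624/780), so the table's recall figure appears to use that denominator.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


# Counts from Table 1 (sports video).
blocks, tp, fp, fn = 780, 624, 60, 24

recall_by_formula = recall(tp, fn)        # 624 / 648
precision_by_formula = precision(tp, fp)  # 624 / 684
tp_over_blocks = tp / blocks              # 624 / 780

print(f"recall (TP/(TP+FN))    = {recall_by_formula:.3f}")
print(f"precision (TP/(TP+FP)) = {precision_by_formula:.3f}")
print(f"TP / total text blocks = {tp_over_blocks:.3f}")
```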

Table 2 Execution time for retrieval

Videos with different background   Text extraction             OCR      Retrieval   Total time in sec
Complex                            57 sec for 100 frames       20 sec   1.55 sec    1:08:55 sec
Plain                              23.78 sec for 60 frames     10 sec   1.20 sec    00:34:98 sec

The primary advantage of the proposed method is that it is very fast, since most of the computationally intensive algorithms are applied only to the regions of interest. Table 2 shows the processing time for different types of video clips on a 1.83 GHz Intel Core 2 Duo machine. As shown, the total time required by the algorithms, including retrieval, is 1:08:55 sec for a complex background and nearly half of that (00:34:98 sec) for a plain background. The figures are averaged over a number of different image sizes. Frames are processed at a rate of about 5.6 per second, OCR takes about 20 sec for a complex background and 10 sec for a plain one, and retrieval itself takes about 1.55 sec. The algorithm therefore requires little time for processing each frame and for retrieval.

5. CONCLUSION

The proposed work uses textual content as the basis of a comprehensive video retrieval system that extracts text from video, recognizes the text in the extracted images and then matches the text stored in the database against the query text. Besides this matching, the system performs matching based on color features, so that irrelevant videos are not retrieved. The proposed work uses a median filter and the Sobel operator for text region localization, histograms for text segmentation, and OCR for recognition of embedded text in sports video. Results show significant efficiency in detection, with 80% recall and 92% precision for text regions. The time taken for a retrieval is 1.55 sec for a complex background and 1.20 sec for a simple background. The system can be further improved by implementing a better OCR technique, with the goal of 100% accuracy in text recognition from videos. That would significantly improve the quality of the process.

REFERENCES

[1] C. V. Jawahar, Balakrishna Chennupati, Balamanohar Paluri and Nataraj Jammalamadaka, 2006, "Video Retrieval Based on Textual Queries".
[2] Yu Zhong, Hongjiang Zhang and Anil K. Jain, April 2000, "Automatic Caption Localization in Compressed Video", IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Nilesh Bhojne, Pravinkumar Kamde and Dr. S. P. Algur, 2012, "News Video Indexing and Retrieval using Overlay Text".
[4] Wei Qi, Lie Gu, Hao Jiang, Xiang-Rong Chen and Hong-Jiang Zhang, 1998, "Integrating Visual, Audio and Text analysis for news video".
[5] Shi-Yong Neo, Jin Zhao, Min-Yen Kan and Tat-Seng Chua, 1998, "Video Retrieval using High Level Features: Exploiting Query Matching and Confidence-based Weighting".
[6] Pranali Kosamkar, Vikram Wathodkar and Rajendra Shinde, April 2012, "Annotation Based Event Retrieval in Cricket Video", International Journal of Advances in Computing and Information Researches.
[7] Jayshree Ghorpade, Raviraj Palvankar, Ajinkya Patankar and Snehal Rathi, June 2011, "Extracting Text From Video", Signal & Image Processing: An International Journal (SIPIJ).
[8] D. Xu and Shih-Fu Chang, 2007, "Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment", IEEE Conference on Computer Vision and Pattern Recognition.
[9] H-K. Kim, Dec 1996, "Efficient Automatic Text Location Method and Content-Based Indexing and Structuring of Video Database", Journal of Visual Communication and Image Representation.
[10] H. Li, D. Doerman and O. Kia, Jan. 2000, "Automatic Text Detection and Tracking in Digital Video", IEEE Transactions on Image Processing.
[11] T. Sato, T. Kanade, E. Hughes and M. Smith, 1999, "Video OCR Indexing Digital News Libraries by Recognition of Superimposed Captions", Multimedia Systems, Vol. 7, pp. 385-394.
[12] Vilas Naik, Prasanna Patil and Vishwanath Chikaraddi, "Action Event Retrieval from Cricket Video using Audio Energy Feature for Event Summarization", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 267-274, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[13] Vilas Naik, Vishwanath Chikaraddi and Prasanna Patil, "Query Clip Genre Recognition using Tree Pruning Technique for Video Retrieval", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 257-266, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[14] Vilas Naik and Raghavendra Havin, "Entropy Features Trained Support Vector Machine Based Logo Detection Method for Replay Detection and Extraction from Sports Videos", International Journal of Graphics and Multimedia (IJGM), Volume 4, Issue 1, 2013, pp. 20-30, ISSN Print: 0976-6448, ISSN Online: 0976-6456.
