SlideShare a Scribd company logo
1 of 6
Download to read offline
VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020)
ISSN(Online): 2581-7280 Article No. 9
PP 1-6
1
www.viva-technology.org/New/IJRI
Feature Extraction and Feature Selection using Textual Analysis
Hemlata Badwal1
, Prof. Chandani Patel2
1
(Department of Computer Applications, VIVA School of MCA, Virar, Maharashtra, India)
Email: badwalhemlata@gmail.com
2
(Department of Computer Applications, VIVA School of MCA, Virar, Maharashtra, India)
Email: chandaniapatel@gmail.com
Abstract: After pre-processing the images in character recognition systems, the images are segmented based on
certain characteristics known as “features”. The feature space identified for character recognition is however
ranging across a huge dimensionality. To solve this problem of dimensionality, the feature selection and feature
extraction methods are used. Hereby in this paper, we are going to discuss, the different techniques for feature
extraction and feature selection and how these techniques are used to reduce the dimensionality of feature space
to improve the performance of text categorization.
Keywords – Character Recognition, Feature Extraction, Feature Selection, Image Segmentation, Pre-
processing.
1. INTRODUCTION
The conversion of textual documents in digital form have increased rapidly worldwide. This is where the
text classification becomes a dire need to handle these documents. The character recognition systems are thus
used to fulfill these needs. The main objective of character recognition systems is to recognize and classify the
text on the basis of predefined categories using some classifiers. Several analysis works have been done to evolve
newer techniques and strategies that would scale back the time interval for processing whereas providing higher
recognition accuracy.
In the following section, the four major steps used for handwriting recognition systems are described: -
Pre-processing, Image segmentation, Feature extraction and Classification. Section II describes the various
methods available for pre-processing, feature selection and feature extraction and the classifiers [1]. Section III
presents the conclusive idea regarding the reviewed methods.
2. WORKING PRINCIPLE
Handwritten character recognition is the ability of an automated system to recognize handwritten
character or sentence from photographs, documents, touchscreens and other such devices by a computer [1]. The
image of the written sentence / word / character may be gathered either offline or on-line. In off line technique it
may be from a scanned image of a paper. In on-line the sleuthing motion of the pen tip, for example by a pen-
based display screen surface capturing temporal frequency [2].
VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020)
ISSN(Online): 2581-7280 Article No. 9
PP 1-6
2
www.viva-technology.org/New/IJRI
Fig. 1 Block Diagram of Character Recognition
Fig. 1 describes the process of text classification. Although, text classification has one limitation. This
limitation includes high spatial property of feature area thanks to terribly sizable amount of options. The larger
the options, the bigger the complexity of methods used for text classification and also the smaller the accuracy
due to irrelevant or redundant terms within the feature space. Feature extraction and feature selection methods are
used to eradicate this downside.
The initial major step in textual classification is Pre-processing. The moot data existing in the document
is eliminated using the pre-processing. The subsequent step, image segmentation decomposes an image of a
sequence of characters into sub images of individual symbols. Then, each character image is resized into m x n
pixels towards the training network. These sub images are used to select the subset from the original feature set
on the basis of importance of features, known as Feature Extraction. The fundamental task of feature extraction
and selection is to identify a group of the most effective features for classification; so as to compress the high-
dimensional feature space to low-dimensional feature space, for efficient classification [4] [16].
2.1 PRE-PROCESSING
When a document is scanned it requires some preliminary processing. This pre-processing helps to
eliminate the irrelevant data in the scanned image in the form of noise and resize the image. For pre-processing
the document, noise reduction, binarization, skew correction, etc. techniques can be used. The main objectives of
pre-processing are [1]:
1. Noise reduction
2. Binarization
3. Skew correction
4. Stroke width normalization
Noise Reduction
The noise removal is achieved by converting RGB image into grayscale. It is then converted into binary
image consisting only pixels 0’s (white) and 1’s (black). Unnecessary background pixels are removed from
original image. To remove the redundant or irrelevant data, the threshold value is applied to the image.
Fig. 2 Grayscale image with noise
Fig. 3 Reduced noise image
VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020)
ISSN(Online): 2581-7280 Article No. 9
PP 1-6
3
www.viva-technology.org/New/IJRI
Stroke-width normalization
After normalization generally it reduces the amount of data to be processed. For e.g. by thinning the
shape information of a character can be gathered without losing the data.
Fig. 4 Stroke-width normalization image
Skew Correction and Slant Removal
Skew correction methods are used for the alignment of the coordinate system of the scanner with respect
to that of the document. Its main approaches include correlation, projection profiles and Hough transform etc [1].
The slant of any handwritten text(s) varies from one user to another. The characters can be normalized by using
the slant removal methods [1].
Fig. 5 Skew correction and Slant removal image
Binarization
In Binarization, a gray scale image if transformed to a binary image using global thresholding technique
[3]. Let’s assume f(x,y) is an input image. T is the threshold value and g(x,y) is the output image of thresholding
process then the mathematical equation of this conversion is g(x,y) =1 if f(x,y) ≥ T otherwise 0.
Fig. 6 Binarized image
2.2 IMAGE SEGMENTATION
Image Segmentation involves two major steps [4]:
1. Line Detection
2. Word Detection
Line detection
Lines are differentiated using the Hough Transform, horizontal projections, etc. technique.
Hough transform detects straight lines. The straight line y = mx + b can be represented as a point (b, m)
in the parameter space, where b represents the intercept parameter and m represents the slope parameter.
Horizontal Projection profile is the projection profile of an image along horizontal axis. It calculates each
row for all column pixels in that row.
Fig. 7 Line Segmentation
VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020)
ISSN(Online): 2581-7280 Article No. 9
PP 1-6
4
www.viva-technology.org/New/IJRI
Word detection
Words can be identified using vertical projections, connected component analysis, etc.
Vertical Projection profile is the projection profile of an image along vertical axis. Vertical projection
profile is calculated for each column as summation of all row pixel values inside the column.
Connected component labeling works by scanning an image, pixel-by-pixel (from top to bottom and left
to right) so as to identify connected pixel regions, [19] i.e. regions of adjacent pixels that share the same set of
intensity values V. For a binary image V={1} [20].
Fig. 8 Word Segmentation
2.3 FEATURE EXTRACTION
Feature extraction methods include principal component analysis (PCA), latent semantic indexing (LSI),
etc [3].
Principal Component Analysis (PCA): Principal Component Analysis is dimension reduction technique
that extracts the information from various dataset. PCA produces a set of lower-dimensional features from the
original dataset. PCA is sensitive to the relative scaling of the original variables [5]. PCA determines the number
of principal components using certain criterions. The dimensions representing the principal components at the
best are chosen. The number of principal components to be chosen varies depending upon the quality of the
dataset.
Latent Semantic Indexing (LSI): Latent Semantic Indexing (LSI) is a technique that projects queries and
documents into a space with “latent” semantic dimensions. LSI has fewer dimensions than the original space. It
works as a similarity matrix that is an alternative to word overlap measures like td.idf. LSI creates a matrix of
words and sentences in a document that determine the occurrences of similar words which can be used for
classification.
2.4 CLASSIFICATION
Classification is a supervised machine learning technique which identifies the category that a new
observation belongs to, based on the observations used in the training set. The classifier utilization depends on
several factors, such as available training set, range of free parameters. Some of the vital techniques which can be
used for classification are- decision Tree (DT), k-Nearest Neighbor (k-NN), Bayes Classifier, Neural Networks
(NN), Support Vector Machines (SVM), etc [1].
Decision Tree (DT) Classifier: In Decision tree classifier, the features are split into completely different
regions corresponding to the classes available. It uses Information Gain value to identify the node at each level of
the DT. The feature with the highest Information Gain is chosen as the node. The class is assigned along the path
of the assigned nodes containing the features.
k-Nearest Neighbor (k-NN) Classifier: The purpose of k-NN classifier is used to assign the class label to
a cluster of similar elements. It calculates the nearest neighbors using the Euclidean distance matrix. The similarity
score is used as the weight of the classes of the neighbor document.
Naive Bayes (MB) Classifier: Naive Bayes Classifier is a probabilistic classifier. It assumes the
independence of features. The value of a particular feature is independent of any other feature’s value, given the
class variable [17]. For example, a fruit can be considered to be an apple if it is round, red, and about 10 cm in
diameter. A naive Bayes classifier considers each and every feature contribute independently to the probability
that this fruit is an apple, regardless of any attainable correlations between the color, roundness, and diameter
features. Naive Bayes Classifier has surprisingly performed quite well in many complex real-world issues albeit
it considers naive assumptions. Training and testing phase in Naive Bayes is easy for implementation and
computation.
VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020)
ISSN(Online): 2581-7280 Article No. 9
PP 1-6
5
www.viva-technology.org/New/IJRI
Support Vector Machine (SVM) Classifier: Support Vector Machine belongs to supervised learning
classification technique. SVM algorithm consists of 2 types of versions: non-linear and linear versions [18]. In
non-linear version, classes are not separated i.e. no straight lines that separate the classes can be found. In linear
version, the classes are separated with the hyperplanes. Linear classifier is defined as wT x + b = 0. Where, w is
the direction of the hyperplane and b is the precise position of hyperplane [18].
Fig. 9 Linear SVM Classifier
SVM find this hyperplane by using margins and support vectors. The region between the two hyperplanes
wT x + b = 1 and wTx + b = -1 is known as margin. 2/||w|| is the width of the margin. The tuples that fall on the
two hyperplanes are known as support vectors.
3. CONCLUSION
With the rapid increase in the digital conversion of texts, the need for handwriting recognition systems
are soaring higher and higher. Here in this paper, we have discussed the techniques used in converting the text
into the digital form. The accuracy rate of the systems determines how well a system is. But the text classification
suffers from the higher dimensionality problem. To reduce the feature space and increase the accuracy of the
systems, we have reviewed the pre-processing techniques, image segmentation, feature extraction as well as the
classifiers available for text recognition.
REFERENCES
[1] Vijay Prasad and Yumnam Jayanta Singh, "A study on method of feature extraction for Handwritten Character Recognition",
Indian Journal of Science and Technology, pp. 174-178, March 2013.
[2] Ayush Purohit and Shardul Singh Chauhan, "A Literature Survey on Handwritten Character Recognition", (IJCSIT) International
Journal of Computer Science and Information Technologies, Vol. 7 (1), 2016.
[3] J. Pradeep, E. Srinivasan and S. Himavathi, "Diagonal based feature extraction for Handwritten alphabets recognition system using
neural network", International Journal of Computer Science & Information Technology, Vol 3, No 1, Feb 2011.
[4] Foram P. Shah and Vibha Patel, "A Review on Feature Selection and Feature Extraction for Text Classification", International
Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), March 2016.
[5] Gaurav Kumar and Pradeep Kumar Bhatia, "A Detailed Review of Feature Extraction in Image Processing Systems", Fourth
International Conference on Advanced Computing & Communication Technologies, Feb 2014.
[6] T. Fujisaki, T.E. Chefalas, J. Kim, C.C. Tappert and C.G. Wolf, “On-Line Run-On Character Recognizer: Design and
Performance”, Character and Handwriting Recognition: Expanding Frontiers. P.S.P. Wang, ed., pp. 123-137, Singapore: World
Scientific, 1991.
VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020)
ISSN(Online): 2581-7280 Article No. 9
PP 1-6
6
www.viva-technology.org/New/IJRI
[7] Oivind Due Trier, Torfinn Taxt, Anil K. Jain, “Feature Extraction Methods for Character Recognition-A survey”, Pattern
Recognition, Vol. 29, pp. 641-662, April 1996.
[8] K. Gaurav and Bhatia P. K., “Analytical Review of Preprocessing Techniques for Offline Handwritten Character Recognition”,
2nd International Conference on Emerging Trends in Engineering & Management, ICETEM, 2013.
[9] A. Brakensiek, J. Rottland, G. Rigoll, A. Kosmala, “Offline Handwriting Recognition using various Hybrid Modeling Techniques
and Character N-Grams”, Available at http://irs.ub.rug.nl/dbi/4357a84695495.
[10] R.G. Casey and E. Lecolinet, “A Survey of Methods and Strategies in Character Segmentation,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 18, No.7, pp. 690-706, July 1996.
[11] S. N. Srihari, R. Plamondon, “On-line and off- line handwritten character recognition: A comprehensive survey,” IEEE.
Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, pp. 63-84, 2000.
[12] Harun Uguz, “A two-stage feature selection method for text categorization by using information gain, principal component analysis
and genetic algorithm”, Elsevier Knowledge-Based Systems, pp. 1024-1032, 2012.
[13] S. Niharika, V. Sneha Latha and D.R. Lavanya, “A Survey on Text Categorization”, International Journal of Computer Trends
and Technology, Vol. 3, pp. 39-45, 2012.
[14] M. Blumenstein, H. Basli, B. Verma, “A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten
Characters”, Proceedings of the 7th International Conference on Document Analysis and Recognition, Vol. 1, pp. 137–141, 2003.
[15] Anshul Gupta, Manisha Srivastava, “Offline Handwritten Character Recognition”, pp. 1-27, April 2011.
[16] Shifei Ding, Weikuan Jia, Chunyang Su, Fengxiang Jin, “A survey on Statistical Pattern Feature Extraction”, Advanced Intelligent
Computing Theories and Applications. With Aspects of Artificial Intelligence, 4th International Conference on Intelligent
Computing, ICIC 2008, Proceedings (pp.701-708), Shanghai, China, September 15-18, 2008.
[17] Abdul Salam Shah, M.N.A. Khan, Fazli Subhan, Muhammad Fayaz, Asadullah Shah, “An Offline Signature Verification
Technique Using Pixels Intensity Levels”, International Journal of Signal Processing, Image Processing and Pattern Recognition,
August 2016.
[18] Youness Tabii, Mohamed Lazaar, Mohammed Al Achhab, Nourddine Ennaya, Book872 ,Big Data ,Cloud and Applications :Third International
Conference, Communications in Computer and Information Science, Kenitra, Morocco, April 2018.
[19] Available at https://homepages.inf.ed.ac.uk/rbf/HIPR2/label.htm
[20] Faiq Baji, Mihai L. Mocanu, Popa Didi Liliana, “Brain tumor detection based on asymmetry and K-means clustering MRI
image segmentation”, Journal of Engineering Science and Technology, Vol. 13, No. 12, pp. 4145 – 4159, 2018.

More Related Content

What's hot

Texture based feature extraction and object tracking
Texture based feature extraction and object trackingTexture based feature extraction and object tracking
Texture based feature extraction and object trackingPriyanka Goswami
 
Automatic dominant region segmentation for natural images
Automatic dominant region segmentation for natural imagesAutomatic dominant region segmentation for natural images
Automatic dominant region segmentation for natural imagescsandit
 
Review of Image Segmentation Techniques based on Region Merging Approach
Review of Image Segmentation Techniques based on Region Merging ApproachReview of Image Segmentation Techniques based on Region Merging Approach
Review of Image Segmentation Techniques based on Region Merging ApproachEditor IJMTER
 
A Survey of Image Processing and Identification Techniques
A Survey of Image Processing and Identification TechniquesA Survey of Image Processing and Identification Techniques
A Survey of Image Processing and Identification Techniquesvivatechijri
 
A comparative study on content based image retrieval methods
A comparative study on content based image retrieval methodsA comparative study on content based image retrieval methods
A comparative study on content based image retrieval methodsIJLT EMAS
 
COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATION
COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATIONCOLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATION
COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATIONIAEME Publication
 
Paper id 252014130
Paper id 252014130Paper id 252014130
Paper id 252014130IJRAT
 
Density Driven Image Coding for Tumor Detection in mri Image
Density Driven Image Coding for Tumor Detection in mri ImageDensity Driven Image Coding for Tumor Detection in mri Image
Density Driven Image Coding for Tumor Detection in mri ImageIOSRjournaljce
 
Signature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertatiSignature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertatiDr. Vinayak Bharadi
 
Digital image classification22oct
Digital image classification22octDigital image classification22oct
Digital image classification22octAleemuddin Abbasi
 
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTSSPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTScscpconf
 
A comparison of image segmentation techniques, otsu and watershed for x ray i...
A comparison of image segmentation techniques, otsu and watershed for x ray i...A comparison of image segmentation techniques, otsu and watershed for x ray i...
A comparison of image segmentation techniques, otsu and watershed for x ray i...eSAT Journals
 

What's hot (20)

Texture based feature extraction and object tracking
Texture based feature extraction and object trackingTexture based feature extraction and object tracking
Texture based feature extraction and object tracking
 
Texture Classification
Texture ClassificationTexture Classification
Texture Classification
 
I017417176
I017417176I017417176
I017417176
 
Fc4301935938
Fc4301935938Fc4301935938
Fc4301935938
 
Automatic dominant region segmentation for natural images
Automatic dominant region segmentation for natural imagesAutomatic dominant region segmentation for natural images
Automatic dominant region segmentation for natural images
 
Review of Image Segmentation Techniques based on Region Merging Approach
Review of Image Segmentation Techniques based on Region Merging ApproachReview of Image Segmentation Techniques based on Region Merging Approach
Review of Image Segmentation Techniques based on Region Merging Approach
 
A Survey of Image Processing and Identification Techniques
A Survey of Image Processing and Identification TechniquesA Survey of Image Processing and Identification Techniques
A Survey of Image Processing and Identification Techniques
 
Ijetr011917
Ijetr011917Ijetr011917
Ijetr011917
 
A comparative study on content based image retrieval methods
A comparative study on content based image retrieval methodsA comparative study on content based image retrieval methods
A comparative study on content based image retrieval methods
 
COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATION
COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATIONCOLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATION
COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATION
 
Paper id 252014130
Paper id 252014130Paper id 252014130
Paper id 252014130
 
Density Driven Image Coding for Tumor Detection in mri Image
Density Driven Image Coding for Tumor Detection in mri ImageDensity Driven Image Coding for Tumor Detection in mri Image
Density Driven Image Coding for Tumor Detection in mri Image
 
Signature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertatiSignature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertati
 
PPT s09-machine vision-s2
PPT s09-machine vision-s2PPT s09-machine vision-s2
PPT s09-machine vision-s2
 
Image segmentation using wvlt trnsfrmtn and fuzzy logic. ppt
Image segmentation using wvlt trnsfrmtn and fuzzy logic. pptImage segmentation using wvlt trnsfrmtn and fuzzy logic. ppt
Image segmentation using wvlt trnsfrmtn and fuzzy logic. ppt
 
Digital image classification22oct
Digital image classification22octDigital image classification22oct
Digital image classification22oct
 
C1803011419
C1803011419C1803011419
C1803011419
 
Extended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy imagesExtended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy images
 
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTSSPEECH CLASSIFICATION USING ZERNIKE MOMENTS
SPEECH CLASSIFICATION USING ZERNIKE MOMENTS
 
A comparison of image segmentation techniques, otsu and watershed for x ray i...
A comparison of image segmentation techniques, otsu and watershed for x ray i...A comparison of image segmentation techniques, otsu and watershed for x ray i...
A comparison of image segmentation techniques, otsu and watershed for x ray i...
 

Similar to Feature Extraction and Feature Selection using Textual Analysis

Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfSachin414679
 
Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...ijfcstjournal
 
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET Journal
 
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...IRJET Journal
 
Inpainting scheme for text in video a survey
Inpainting scheme for text in video   a surveyInpainting scheme for text in video   a survey
Inpainting scheme for text in video a surveyeSAT Journals
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET Journal
 
Brain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector MachineBrain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector MachineIRJET Journal
 
IRJET - Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...
IRJET -  	  Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...IRJET -  	  Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...
IRJET - Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...IRJET Journal
 
Multimedia Big Data Management Processing And Analysis
Multimedia Big Data Management Processing And AnalysisMultimedia Big Data Management Processing And Analysis
Multimedia Big Data Management Processing And AnalysisLindsey Campbell
 
A_Survey_Paper_on_Image_Classification_and_Methods.pdf
A_Survey_Paper_on_Image_Classification_and_Methods.pdfA_Survey_Paper_on_Image_Classification_and_Methods.pdf
A_Survey_Paper_on_Image_Classification_and_Methods.pdfBijayNag1
 
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...IRJET Journal
 
Deep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A ReviewDeep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A ReviewIRJET Journal
 
IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...
IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...
IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...IRJET Journal
 
Image Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningImage Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningIRJET Journal
 
Text Extraction and Recognition Using Median Filter
Text Extraction and Recognition Using Median FilterText Extraction and Recognition Using Median Filter
Text Extraction and Recognition Using Median FilterIRJET Journal
 
An fpga based efficient fruit recognition system using minimum
An fpga based efficient fruit recognition system using minimumAn fpga based efficient fruit recognition system using minimum
An fpga based efficient fruit recognition system using minimumAlexander Decker
 
Finding similarities between structured documents as a crucial stage for gene...
Finding similarities between structured documents as a crucial stage for gene...Finding similarities between structured documents as a crucial stage for gene...
Finding similarities between structured documents as a crucial stage for gene...Alexander Decker
 
IRJET - Content based Image Classification
IRJET -  	  Content based Image ClassificationIRJET -  	  Content based Image Classification
IRJET - Content based Image ClassificationIRJET Journal
 

Similar to Feature Extraction and Feature Selection using Textual Analysis (20)

Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
 
Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...Segmentation and recognition of handwritten digit numeral string using a mult...
Segmentation and recognition of handwritten digit numeral string using a mult...
 
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
 
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
 
Choudhary2015
Choudhary2015Choudhary2015
Choudhary2015
 
Assignment-1-NF.docx
Assignment-1-NF.docxAssignment-1-NF.docx
Assignment-1-NF.docx
 
Inpainting scheme for text in video a survey
Inpainting scheme for text in video   a surveyInpainting scheme for text in video   a survey
Inpainting scheme for text in video a survey
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine Learning
 
Brain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector MachineBrain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector Machine
 
IRJET - Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...
IRJET -  	  Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...IRJET -  	  Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...
IRJET - Computer-Assisted ALL, AML, CLL, CML Detection and Counting for D...
 
Multimedia Big Data Management Processing And Analysis
Multimedia Big Data Management Processing And AnalysisMultimedia Big Data Management Processing And Analysis
Multimedia Big Data Management Processing And Analysis
 
A_Survey_Paper_on_Image_Classification_and_Methods.pdf
A_Survey_Paper_on_Image_Classification_and_Methods.pdfA_Survey_Paper_on_Image_Classification_and_Methods.pdf
A_Survey_Paper_on_Image_Classification_and_Methods.pdf
 
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...IRJET-  	  Image based Approach for Indian Fake Note Detection by Dark Channe...
IRJET- Image based Approach for Indian Fake Note Detection by Dark Channe...
 
Deep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A ReviewDeep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A Review
 
IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...
IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...
IRJET - An Enhanced Approach for Extraction of Text from an Image using Fuzzy...
 
Image Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningImage Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep Learning
 
Text Extraction and Recognition Using Median Filter
Text Extraction and Recognition Using Median FilterText Extraction and Recognition Using Median Filter
Text Extraction and Recognition Using Median Filter
 
An fpga based efficient fruit recognition system using minimum
An fpga based efficient fruit recognition system using minimumAn fpga based efficient fruit recognition system using minimum
An fpga based efficient fruit recognition system using minimum
 
Finding similarities between structured documents as a crucial stage for gene...
Finding similarities between structured documents as a crucial stage for gene...Finding similarities between structured documents as a crucial stage for gene...
Finding similarities between structured documents as a crucial stage for gene...
 
IRJET - Content based Image Classification
IRJET -  	  Content based Image ClassificationIRJET -  	  Content based Image Classification
IRJET - Content based Image Classification
 

More from vivatechijri

Understanding the Impact and Challenges of Corona Crisis on Education Sector...
Understanding the Impact and Challenges of Corona Crisis on  Education Sector...Understanding the Impact and Challenges of Corona Crisis on  Education Sector...
Understanding the Impact and Challenges of Corona Crisis on Education Sector...vivatechijri
 
LEADERSHIP ONLY CAN LEAD THE ORGANIZATION TOWARDS IMPROVEMENT AND DEVELOPMENT
LEADERSHIP ONLY CAN LEAD THE ORGANIZATION  TOWARDS IMPROVEMENT AND DEVELOPMENT  LEADERSHIP ONLY CAN LEAD THE ORGANIZATION  TOWARDS IMPROVEMENT AND DEVELOPMENT
LEADERSHIP ONLY CAN LEAD THE ORGANIZATION TOWARDS IMPROVEMENT AND DEVELOPMENT vivatechijri
 
A study on solving Assignment Problem
A study on solving Assignment ProblemA study on solving Assignment Problem
A study on solving Assignment Problemvivatechijri
 
Structural and Morphological Studies of Nano Composite Polymer Gel Electroly...
Structural and Morphological Studies of Nano Composite  Polymer Gel Electroly...Structural and Morphological Studies of Nano Composite  Polymer Gel Electroly...
Structural and Morphological Studies of Nano Composite Polymer Gel Electroly...vivatechijri
 
Theoretical study of two dimensional Nano sheet for gas sensing application
Theoretical study of two dimensional Nano sheet for gas sensing  applicationTheoretical study of two dimensional Nano sheet for gas sensing  application
Theoretical study of two dimensional Nano sheet for gas sensing applicationvivatechijri
 
METHODS FOR DETECTION OF COMMON ADULTERANTS IN FOOD
METHODS FOR DETECTION OF COMMON  ADULTERANTS IN FOODMETHODS FOR DETECTION OF COMMON  ADULTERANTS IN FOOD
METHODS FOR DETECTION OF COMMON ADULTERANTS IN FOODvivatechijri
 
The Business Development Ethics
The Business Development EthicsThe Business Development Ethics
The Business Development Ethicsvivatechijri
 
An Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGE
An Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGEAn Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGE
An Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGEvivatechijri
 
Enhancing The Capability of Chatbots
Enhancing The Capability of ChatbotsEnhancing The Capability of Chatbots
Enhancing The Capability of Chatbotsvivatechijri
 
Smart Glasses Technology
Smart Glasses TechnologySmart Glasses Technology
Smart Glasses Technologyvivatechijri
 
Future Applications of Smart Iot Devices
Future Applications of Smart Iot DevicesFuture Applications of Smart Iot Devices
Future Applications of Smart Iot Devicesvivatechijri
 
Cross Platform Development Using Flutter
Cross Platform Development Using FlutterCross Platform Development Using Flutter
Cross Platform Development Using Fluttervivatechijri
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systemsvivatechijri
 
Light Fidelity(LiFi)- Wireless Optical Networking Technology
Light Fidelity(LiFi)- Wireless Optical Networking TechnologyLight Fidelity(LiFi)- Wireless Optical Networking Technology
Light Fidelity(LiFi)- Wireless Optical Networking Technologyvivatechijri
 
Social media platform and Our right to privacy
Social media platform and Our right to privacySocial media platform and Our right to privacy
Social media platform and Our right to privacyvivatechijri
 
THE USABILITY METRICS FOR USER EXPERIENCE
THE USABILITY METRICS FOR USER EXPERIENCETHE USABILITY METRICS FOR USER EXPERIENCE
THE USABILITY METRICS FOR USER EXPERIENCEvivatechijri
 
Google File System
Google File SystemGoogle File System
Google File Systemvivatechijri
 
A Study of Tokenization of Real Estate Using Blockchain Technology
A Study of Tokenization of Real Estate Using Blockchain TechnologyA Study of Tokenization of Real Estate Using Blockchain Technology
A Study of Tokenization of Real Estate Using Blockchain Technologyvivatechijri
 

More from vivatechijri (20)

Understanding the Impact and Challenges of Corona Crisis on Education Sector...
Understanding the Impact and Challenges of Corona Crisis on  Education Sector...Understanding the Impact and Challenges of Corona Crisis on  Education Sector...
Understanding the Impact and Challenges of Corona Crisis on Education Sector...
 
LEADERSHIP ONLY CAN LEAD THE ORGANIZATION TOWARDS IMPROVEMENT AND DEVELOPMENT
LEADERSHIP ONLY CAN LEAD THE ORGANIZATION  TOWARDS IMPROVEMENT AND DEVELOPMENT  LEADERSHIP ONLY CAN LEAD THE ORGANIZATION  TOWARDS IMPROVEMENT AND DEVELOPMENT
LEADERSHIP ONLY CAN LEAD THE ORGANIZATION TOWARDS IMPROVEMENT AND DEVELOPMENT
 
A study on solving Assignment Problem
A study on solving Assignment ProblemA study on solving Assignment Problem
A study on solving Assignment Problem
 
Structural and Morphological Studies of Nano Composite Polymer Gel Electroly...
Structural and Morphological Studies of Nano Composite  Polymer Gel Electroly...Structural and Morphological Studies of Nano Composite  Polymer Gel Electroly...
Structural and Morphological Studies of Nano Composite Polymer Gel Electroly...
 
Theoretical study of two dimensional Nano sheet for gas sensing application
Theoretical study of two dimensional Nano sheet for gas sensing  applicationTheoretical study of two dimensional Nano sheet for gas sensing  application
Theoretical study of two dimensional Nano sheet for gas sensing application
 
METHODS FOR DETECTION OF COMMON ADULTERANTS IN FOOD
METHODS FOR DETECTION OF COMMON  ADULTERANTS IN FOODMETHODS FOR DETECTION OF COMMON  ADULTERANTS IN FOOD
METHODS FOR DETECTION OF COMMON ADULTERANTS IN FOOD
 
The Business Development Ethics
The Business Development EthicsThe Business Development Ethics
The Business Development Ethics
 
Digital Wellbeing
Digital WellbeingDigital Wellbeing
Digital Wellbeing
 
An Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGE
An Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGEAn Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGE
An Alternative to Hard Drives in the Coming Future:DNA-BASED DATA STORAGE
 
Enhancing The Capability of Chatbots
Enhancing The Capability of ChatbotsEnhancing The Capability of Chatbots
Enhancing The Capability of Chatbots
 
Smart Glasses Technology
Smart Glasses TechnologySmart Glasses Technology
Smart Glasses Technology
 
Future Applications of Smart Iot Devices
Future Applications of Smart Iot DevicesFuture Applications of Smart Iot Devices
Future Applications of Smart Iot Devices
 
Cross Platform Development Using Flutter
Cross Platform Development Using FlutterCross Platform Development Using Flutter
Cross Platform Development Using Flutter
 
3D INTERNET
3D INTERNET3D INTERNET
3D INTERNET
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Light Fidelity(LiFi)- Wireless Optical Networking Technology
Light Fidelity(LiFi)- Wireless Optical Networking TechnologyLight Fidelity(LiFi)- Wireless Optical Networking Technology
Light Fidelity(LiFi)- Wireless Optical Networking Technology
 
Social media platform and Our right to privacy
Social media platform and Our right to privacySocial media platform and Our right to privacy
Social media platform and Our right to privacy
 
THE USABILITY METRICS FOR USER EXPERIENCE
THE USABILITY METRICS FOR USER EXPERIENCETHE USABILITY METRICS FOR USER EXPERIENCE
THE USABILITY METRICS FOR USER EXPERIENCE
 
Google File System
Google File SystemGoogle File System
Google File System
 
A Study of Tokenization of Real Estate Using Blockchain Technology
A Study of Tokenization of Real Estate Using Blockchain TechnologyA Study of Tokenization of Real Estate Using Blockchain Technology
A Study of Tokenization of Real Estate Using Blockchain Technology
 

Recently uploaded

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 

Recently uploaded (20)

young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 

Feature Extraction and Feature Selection using Textual Analysis

  • 1. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020) ISSN(Online): 2581-7280 Article No. 9 PP 1-6 1 www.viva-technology.org/New/IJRI Feature Extraction and Feature Selection using Textual Analysis Hemlata Badwal1 , Prof. Chandani Patel2 1 (Department of Computer Applications, VIVA School of MCA, Virar, Maharashtra, India) Email: badwalhemlata@gmail.com 2 (Department of Computer Applications, VIVA School of MCA, Virar, Maharashtra, India) Email: chandaniapatel@gmail.com Abstract: After pre-processing the images in character recognition systems, the images are segmented based on certain characteristics known as “features”. The feature space identified for character recognition is however ranging across a huge dimensionality. To solve this problem of dimensionality, the feature selection and feature extraction methods are used. Hereby in this paper, we are going to discuss, the different techniques for feature extraction and feature selection and how these techniques are used to reduce the dimensionality of feature space to improve the performance of text categorization. Keywords – Character Recognition, Feature Extraction, Feature Selection, Image Segmentation, Pre- processing. 1. INTRODUCTION The conversion of textual documents in digital form have increased rapidly worldwide. This is where the text classification becomes a dire need to handle these documents. The character recognition systems are thus used to fulfill these needs. The main objective of character recognition systems is to recognize and classify the text on the basis of predefined categories using some classifiers. Several analysis works have been done to evolve newer techniques and strategies that would scale back the time interval for processing whereas providing higher recognition accuracy. In the following section, the four major steps used for handwriting recognition systems are described: - Pre-processing, Image segmentation, Feature extraction and Classification. Section II describes the various methods available for pre-processing, feature selection and feature extraction and the classifiers [1]. Section III presents the conclusive idea regarding the reviewed methods. 2. WORKING PRINCIPLE Handwritten character recognition is the ability of an automated system to recognize handwritten character or sentence from photographs, documents, touchscreens and other such devices by a computer [1]. The image of the written sentence / word / character may be gathered either offline or on-line. In off line technique it may be from a scanned image of a paper. In on-line the sleuthing motion of the pen tip, for example by a pen- based display screen surface capturing temporal frequency [2].
  • 2. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020) ISSN(Online): 2581-7280 Article No. 9 PP 1-6 2 www.viva-technology.org/New/IJRI Fig. 1 Block Diagram of Character Recognition Fig. 1 describes the process of text classification. Although, text classification has one limitation. This limitation includes high spatial property of feature area thanks to terribly sizable amount of options. The larger the options, the bigger the complexity of methods used for text classification and also the smaller the accuracy due to irrelevant or redundant terms within the feature space. Feature extraction and feature selection methods are used to eradicate this downside. The initial major step in textual classification is Pre-processing. The moot data existing in the document is eliminated using the pre-processing. The subsequent step, image segmentation decomposes an image of a sequence of characters into sub images of individual symbols. Then, each character image is resized into m x n pixels towards the training network. These sub images are used to select the subset from the original feature set on the basis of importance of features, known as Feature Extraction. The fundamental task of feature extraction and selection is to identify a group of the most effective features for classification; so as to compress the high- dimensional feature space to low-dimensional feature space, for efficient classification [4] [16]. 2.1 PRE-PROCESSING When a document is scanned it requires some preliminary processing. This pre-processing helps to eliminate the irrelevant data in the scanned image in the form of noise and resize the image. For pre-processing the document, noise reduction, binarization, skew correction, etc. techniques can be used. The main objectives of pre-processing are [1]: 1. Noise reduction 2. Binarization 3. Skew correction 4. Stroke width normalization Noise Reduction The noise removal is achieved by converting RGB image into grayscale. It is then converted into binary image consisting only pixels 0’s (white) and 1’s (black). Unnecessary background pixels are removed from original image. To remove the redundant or irrelevant data, the threshold value is applied to the image. Fig. 2 Grayscale image with noise Fig. 3 Reduced noise image
  • 3. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020) ISSN(Online): 2581-7280 Article No. 9 PP 1-6 3 www.viva-technology.org/New/IJRI Stroke-width normalization After normalization generally it reduces the amount of data to be processed. For e.g. by thinning the shape information of a character can be gathered without losing the data. Fig. 4 Stroke-width normalization image Skew Correction and Slant Removal Skew correction methods are used for the alignment of the coordinate system of the scanner with respect to that of the document. Its main approaches include correlation, projection profiles and Hough transform etc [1]. The slant of any handwritten text(s) varies from one user to another. The characters can be normalized by using the slant removal methods [1]. Fig. 5 Skew correction and Slant removal image Binarization In Binarization, a gray scale image if transformed to a binary image using global thresholding technique [3]. Let’s assume f(x,y) is an input image. T is the threshold value and g(x,y) is the output image of thresholding process then the mathematical equation of this conversion is g(x,y) =1 if f(x,y) ≥ T otherwise 0. Fig. 6 Binarized image 2.2 IMAGE SEGMENTATION Image Segmentation involves two major steps [4]: 1. Line Detection 2. Word Detection Line detection Lines are differentiated using the Hough Transform, horizontal projections, etc. technique. Hough transform detects straight lines. The straight line y = mx + b can be represented as a point (b, m) in the parameter space, where b represents the intercept parameter and m represents the slope parameter. Horizontal Projection profile is the projection profile of an image along horizontal axis. It calculates each row for all column pixels in that row. Fig. 7 Line Segmentation
  • 4. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020) ISSN(Online): 2581-7280 Article No. 9 PP 1-6 4 www.viva-technology.org/New/IJRI Word detection Words can be identified using vertical projections, connected component analysis, etc. Vertical Projection profile is the projection profile of an image along vertical axis. Vertical projection profile is calculated for each column as summation of all row pixel values inside the column. Connected component labeling works by scanning an image, pixel-by-pixel (from top to bottom and left to right) so as to identify connected pixel regions, [19] i.e. regions of adjacent pixels that share the same set of intensity values V. For a binary image V={1} [20]. Fig. 8 Word Segmentation 2.3 FEATURE EXTRACTION Feature extraction methods include principal component analysis (PCA), latent semantic indexing (LSI), etc [3]. Principal Component Analysis (PCA): Principal Component Analysis is dimension reduction technique that extracts the information from various dataset. PCA produces a set of lower-dimensional features from the original dataset. PCA is sensitive to the relative scaling of the original variables [5]. PCA determines the number of principal components using certain criterions. The dimensions representing the principal components at the best are chosen. The number of principal components to be chosen varies depending upon the quality of the dataset. Latent Semantic Indexing (LSI): Latent Semantic Indexing (LSI) is a technique that projects queries and documents into a space with “latent” semantic dimensions. LSI has fewer dimensions than the original space. It works as a similarity matrix that is an alternative to word overlap measures like td.idf. LSI creates a matrix of words and sentences in a document that determine the occurrences of similar words which can be used for classification. 2.4 CLASSIFICATION Classification is a supervised machine learning technique which identifies the category that a new observation belongs to, based on the observations used in the training set. The classifier utilization depends on several factors, such as available training set, range of free parameters. Some of the vital techniques which can be used for classification are- decision Tree (DT), k-Nearest Neighbor (k-NN), Bayes Classifier, Neural Networks (NN), Support Vector Machines (SVM), etc [1]. Decision Tree (DT) Classifier: In Decision tree classifier, the features are split into completely different regions corresponding to the classes available. It uses Information Gain value to identify the node at each level of the DT. The feature with the highest Information Gain is chosen as the node. The class is assigned along the path of the assigned nodes containing the features. k-Nearest Neighbor (k-NN) Classifier: The purpose of k-NN classifier is used to assign the class label to a cluster of similar elements. It calculates the nearest neighbors using the Euclidean distance matrix. The similarity score is used as the weight of the classes of the neighbor document. Naive Bayes (MB) Classifier: Naive Bayes Classifier is a probabilistic classifier. It assumes the independence of features. The value of a particular feature is independent of any other feature’s value, given the class variable [17]. For example, a fruit can be considered to be an apple if it is round, red, and about 10 cm in diameter. A naive Bayes classifier considers each and every feature contribute independently to the probability that this fruit is an apple, regardless of any attainable correlations between the color, roundness, and diameter features. Naive Bayes Classifier has surprisingly performed quite well in many complex real-world issues albeit it considers naive assumptions. Training and testing phase in Naive Bayes is easy for implementation and computation.
  • 5. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020) ISSN(Online): 2581-7280 Article No. 9 PP 1-6 5 www.viva-technology.org/New/IJRI Support Vector Machine (SVM) Classifier: Support Vector Machine belongs to supervised learning classification technique. SVM algorithm consists of 2 types of versions: non-linear and linear versions [18]. In non-linear version, classes are not separated i.e. no straight lines that separate the classes can be found. In linear version, the classes are separated with the hyperplanes. Linear classifier is defined as wT x + b = 0. Where, w is the direction of the hyperplane and b is the precise position of hyperplane [18]. Fig. 9 Linear SVM Classifier SVM find this hyperplane by using margins and support vectors. The region between the two hyperplanes wT x + b = 1 and wTx + b = -1 is known as margin. 2/||w|| is the width of the margin. The tuples that fall on the two hyperplanes are known as support vectors. 3. CONCLUSION With the rapid increase in the digital conversion of texts, the need for handwriting recognition systems are soaring higher and higher. Here in this paper, we have discussed the techniques used in converting the text into the digital form. The accuracy rate of the systems determines how well a system is. But the text classification suffers from the higher dimensionality problem. To reduce the feature space and increase the accuracy of the systems, we have reviewed the pre-processing techniques, image segmentation, feature extraction as well as the classifiers available for text recognition. REFERENCES [1] Vijay Prasad and Yumnam Jayanta Singh, "A study on method of feature extraction for Handwritten Character Recognition", Indian Journal of Science and Technology, pp. 174-178, March 2013. [2] Ayush Purohit and Shardul Singh Chauhan, "A Literature Survey on Handwritten Character Recognition", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 7 (1), 2016. [3] J. Pradeep, E. Srinivasan and S. Himavathi, "Diagonal based feature extraction for Handwritten alphabets recognition system using neural network", International Journal of Computer Science & Information Technology, Vol 3, No 1, Feb 2011. [4] Foram P. Shah and Vibha Patel, "A Review on Feature Selection and Feature Extraction for Text Classification", International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), March 2016. [5] Gaurav Kumar and Pradeep Kumar Bhatia, "A Detailed Review of Feature Extraction in Image Processing Systems", Fourth International Conference on Advanced Computing & Communication Technologies, Feb 2014. [6] T. Fujisaki, T.E. Chefalas, J. Kim, C.C. Tappert and C.G. Wolf, “On-Line Run-On Character Recognizer: Design and Performance”, Character and Handwriting Recognition: Expanding Frontiers. P.S.P. Wang, ed., pp. 123-137, Singapore: World Scientific, 1991.
  • 6. VIVA-Tech International Journal for Research and Innovation Volume 1, Issue 3 (2020) ISSN(Online): 2581-7280 Article No. 9 PP 1-6 6 www.viva-technology.org/New/IJRI [7] Oivind Due Trier, Torfinn Taxt, Anil K. Jain, “Feature Extraction Methods for Character Recognition-A survey”, Pattern Recognition, Vol. 29, pp. 641-662, April 1996. [8] K. Gaurav and Bhatia P. K., “Analytical Review of Preprocessing Techniques for Offline Handwritten Character Recognition”, 2nd International Conference on Emerging Trends in Engineering & Management, ICETEM, 2013. [9] A. Brakensiek, J. Rottland, G. Rigoll, A. Kosmala, “Offline Handwriting Recognition using various Hybrid Modeling Techniques and Character N-Grams”, Available at http://irs.ub.rug.nl/dbi/4357a84695495. [10] R.G. Casey and E. Lecolinet, “A Survey of Methods and Strategies in Character Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No.7, pp. 690-706, July 1996. [11] S. N. Srihari, R. Plamondon, “On-line and off- line handwritten character recognition: A comprehensive survey,” IEEE. Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, pp. 63-84, 2000. [12] Harun Uguz, “A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm”, Elsevier Knowledge-Based Systems, pp. 1024-1032, 2012. [13] S. Niharika, V. Sneha Latha and D.R. Lavanya, “A Survey on Text Categorization”, International Journal of Computer Trends and Technology, Vol. 3, pp. 39-45, 2012. [14] M. Blumenstein, H. Basli, B. Verma, “A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten Characters”, Proceedings of the 7th International Conference on Document Analysis and Recognition, Vol. 1, pp. 137–141, 2003. [15] Anshul Gupta, Manisha Srivastava, “Offline Handwritten Character Recognition”, pp. 1-27, April 2011. [16] Shifei Ding, Weikuan Jia, Chunyang Su, Fengxiang Jin, “A survey on Statistical Pattern Feature Extraction”, Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, 4th International Conference on Intelligent Computing, ICIC 2008, Proceedings (pp.701-708), Shanghai, China, September 15-18, 2008. [17] Abdul Salam Shah, M.N.A. Khan, Fazli Subhan, Muhammad Fayaz, Asadullah Shah, “An Offline Signature Verification Technique Using Pixels Intensity Levels”, International Journal of Signal Processing, Image Processing and Pattern Recognition, August 2016. [18] Youness Tabii, Mohamed Lazaar, Mohammed Al Achhab, Nourddine Ennaya, Book872 ,Big Data ,Cloud and Applications :Third International Conference, Communications in Computer and Information Science, Kenitra, Morocco, April 2018. [19] Available at https://homepages.inf.ed.ac.uk/rbf/HIPR2/label.htm [20] Faiq Baji, Mihai L. Mocanu, Popa Didi Liliana, “Brain tumor detection based on asymmetry and K-means clustering MRI image segmentation”, Journal of Engineering Science and Technology, Vol. 13, No. 12, pp. 4145 – 4159, 2018.