Devnagari document segmentation using histogram approach | Vikas Dongre
Document segmentation is one of the critical phases in machine recognition of any language. The accuracy
of character recognition depends on the correct segmentation of individual symbols. Segmentation
decomposes an image of a sequence of characters into sub-images of individual symbols by first segmenting
lines and words. Devnagari is the most popular script in India. It is used for writing Hindi, Marathi,
Sanskrit and Nepali. Moreover, Hindi is the third most popular language in the world. Devnagari documents
consist of vowels, consonants and various modifiers, so proper segmentation of a Devnagari word is
challenging. This paper proposes a simple histogram-based approach to segmenting Devnagari documents
and discusses various challenges in the segmentation of Devnagari script.
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH | ijcseit
This document summarizes a research paper on Devnagari document segmentation using a histogram approach. It discusses challenges in segmenting the Devnagari script used for several Indian languages. A simple algorithm is proposed using horizontal and vertical histograms to segment documents into lines, words and characters. The algorithm achieves near 100% accuracy for line segmentation but lower accuracy for word and character segmentation due to complexities in the Devnagari script. Future work is needed to improve character segmentation handling connected and modified characters.
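The line-segmentation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the page is already binarized (1 for ink, 0 for background) and that text lines are separated by at least one completely blank row. The function names are hypothetical.

```python
def horizontal_histogram(image):
    """Count ink pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    Assumption: a text line is a maximal run of rows whose histogram
    value is non-zero, and blank rows separate consecutive lines.
    """
    lines = []
    start = None
    for y, count in enumerate(horizontal_histogram(image)):
        if count > 0 and start is None:
            start = y                      # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

Applying the same run-finding logic to column sums of a single line image (a vertical histogram) gives candidate word and character boundaries. Connected and modified characters do not leave blank columns, which is consistent with the lower word and character accuracy reported above.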
Isolated Arabic Handwritten Character Recognition Using Linear Correlation | Editor IJCATR
Handwriting recognition systems have emerged and evolved significantly, especially for English, but
Arabic has not received comparable attention. The aim of this paper is therefore to highlight
optical character recognition using a two-dimensional linear correlation algorithm. The resulting program
identifies discrete (isolated) Arabic letters; the application is started manually, and the program has been applied successfully.
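A two-dimensional linear correlation classifier of the kind mentioned above might look like the sketch below. It assumes that the sample and every template are equal-sized grayscale or binary grids, and that the label of the best-correlated template is taken as the answer. The template names and shapes are made up for illustration and are not Arabic letters.

```python
import math


def correlation_2d(a, b):
    """Pearson correlation coefficient between two equal-sized 2-D grids."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in flat_a)
        * sum((y - mean_b) ** 2 for y in flat_b)
    )
    return num / den if den else 0.0       # constant images carry no signal


def classify(sample, templates):
    """Return the label whose template correlates best with the sample."""
    return max(templates, key=lambda label: correlation_2d(sample, templates[label]))


# Hypothetical 3x3 templates, for illustration only.
templates = {
    "bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "dash": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
sample = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]   # a noisy vertical stroke
print(classify(sample, templates))           # bar
```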
Recognition of Persian handwritten characters has been a significant field of pattern analysis research
for the last few years. This paper examines a new approach to robust handwritten
Persian numeral recognition that uses a strong feature set and a classifier fusion method to
increase the recognition rate. The classifier fusion technique combines
k-nearest neighbour (KNN), linear classifier (LC) and support vector machine (SVM) classifiers. The
novelty of the approach is that it attains better precision with few features by using classifier fusion. The
proposed method was evaluated on a Persian numerals database of 20,000 handwritten
samples. With 15,000 samples used for training, the technique was verified on the other 5,000 samples
and achieved a correct recognition rate of approximately 99.90%. In addition, four-fold cross-validation
on all 20,000 samples gave an accuracy of 99.97%.
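The abstract does not say which rule fuses the KNN, LC and SVM outputs. As one common possibility, the sketch below uses plain majority voting; this is an assumption, not the paper's method. Ties go to the classifier listed first, which is also an arbitrary choice.

```python
from collections import Counter


def fuse_by_majority(predictions):
    """Fuse one predicted label per classifier by majority vote.

    `predictions` is ordered by classifier priority; when several labels
    share the top vote count, the label predicted earliest in the list wins.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical digit predictions in the order (KNN, LC, SVM).
print(fuse_by_majority([3, 8, 3]))  # 3
print(fuse_by_majority([1, 7, 4]))  # 1 (three-way tie, first classifier wins)
```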
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT | cscpconf
This document discusses the segmentation of printed Bangla characters without modifiers for optical character recognition systems. It begins with an introduction to OCR systems and Bangla script. The main steps of an OCR system are then outlined, with a focus on the segmentation step. Line, word and character segmentation algorithms are described in detail along with figures to illustrate the steps. The goal is to properly segment individual characters for recognition.
Free-scale Magnification for Single-Pixel-Width Alphabetic Typeface Characters | inventionjournals
This document presents a novel approach for magnifying single-pixel-width alphabetic typeface characters. It first removes useless serif patterns from the character. It then applies an intuitive stroke transcribing algorithm to describe the character. Each stroke, represented by cubic B-spline functions, is scaled using wavelet transform with arbitrary size. Experimental results on computer fonts show the proposed algorithm performs magnification well while maintaining character shape and structure without additional distortion.
Scene text recognition in mobile applications by character descriptor and str... | eSAT Journals
This document presents a method for scene text recognition in mobile applications using character descriptors and structure configuration. It proposes using a character descriptor that combines feature detectors and descriptors to extract text features effectively. It also models character structure using stroke configuration maps derived from character boundaries and skeletons. The method was tested on various datasets and achieved accuracy rates above 70%, outperforming existing methods. It can detect text regions and recognize text information for applications like text understanding and retrieval on mobile devices.
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo... | inventionjournals
This paper presents a hybrid approach to segmenting single- or multiple-touching handwritten numeral strings in form documents, the core of which is the combined use of foreground, background and recognition analysis. The algorithm first locates feature points on both the foreground and background skeleton images containing connected numeral strings. Possible segmentation paths are then constructed by matching these feature points, with the side benefit of removing useless strokes. Subsequently, all segmentation paths are validated and ranked by a recognition-based analysis, in which a well-trained two-stage classifier is applied to each separated digit image to obtain its reliability. Finally, a locally optimal strategy accelerates the recognition process, and the top-ranked segmentation path is used to decide whether to accept the segmentation. Experimental results show that the proposed method achieves a correct segmentation rate of 96.2 percent on a large dataset collected by the authors.
1. The document presents a methodology for recognizing isolated handwritten Devanagari numerals using structural and statistical features.
2. Key features extracted include whether the numeral has openings on the left, right, above or below, and the number of horizontal and vertical crossings.
3. The methodology achieves an average accuracy of 96.8% on a dataset of 500 numeral images collected from various individuals. Accuracy is highest for numerals 0, 6, 8 and 10 at 100%, while some similar numerals like 3 and 2 see more errors.
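The crossing features in item 2 can be illustrated with a short sketch. The summary does not say along which scan lines crossings are counted, so this example assumes the middle row and the middle column of a binary numeral image; both the choice of scan lines and the function names are hypothetical.

```python
def count_crossings(pixels):
    """Number of background-to-ink (0 -> 1) transitions along one scan line."""
    crossings = 0
    previous = 0
    for p in pixels:
        if p == 1 and previous == 0:
            crossings += 1
        previous = p
    return crossings


def crossing_features(image):
    """Return (horizontal, vertical) crossings through the image centre."""
    mid_row = image[len(image) // 2]
    mid_col = [row[len(image[0]) // 2] for row in image]
    return count_crossings(mid_row), count_crossings(mid_col)


# A ring-shaped glyph: a scan line through the centre meets the stroke twice.
ring = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
print(crossing_features(ring))  # (2, 2)
```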
This document summarizes a research paper that proposes a new method for segmenting text lines in machine printed Telugu script documents. It begins by noting common challenges with existing horizontal projection profile methods for line segmentation when text lines overlap. The proposed method first estimates the top and bottom bounding lines of the inter-line space using statistical analysis of the horizontal profile. It then predicts the segmentation path, which is set at 70% of the inter-line space to follow character contours. When two possible paths exist, it analyzes the horizontal profile of a small sub-image to select the correct path. The method achieved a success rate of 99.1% on tested documents.
This paper presents a new approach to cursive Arabic text recognition. The objective is to propose an analytical methodology for offline recognition of handwritten Arabic that can be implemented rapidly. The first part of the recognition system is the preprocessing phase, which prepares the data. The system then extracts a set of simple statistical features by two methods: a window that slides along the text line from right to left, and the VH2D approach (which consists of projecting every character onto the abscissa, the ordinate and the 45° and 135° diagonals). The resulting feature vectors are fed to Hidden Markov Models (HMMs), and the two HMMs are combined by a multi-stream approach.
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION | ijnlc
This document presents a multi-stream HMM approach for offline handwritten Arabic word recognition. It extracts two sets of features from each word using a sliding window approach and VH2D projection approach. These features are input to separate HMM classifiers, and the outputs are combined in a multi-stream HMM to provide more reliable recognition. The system is evaluated on 200 words, achieving a recognition rate of 83.8% using the multi-stream approach compared to 78.2% and 76.6% for the individual classifiers.
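The VH2D projection features can be sketched as four pixel-count profiles of a binary character image: onto the abscissa, onto the ordinate, and along the two diagonal directions. Which diagonal family corresponds to 45° and which to 135° depends on the axis convention, which the abstract does not specify, so the labels in the comments are an assumption.

```python
def vh2d_projections(image):
    """Return (vertical, horizontal, diag_a, diag_b) ink-count profiles.

    vertical   : one bin per column (projection onto the abscissa)
    horizontal : one bin per row (projection onto the ordinate)
    diag_a     : bins indexed by x + y (anti-diagonals, assumed 45 degrees)
    diag_b     : bins indexed by x - y (main diagonals, assumed 135 degrees)
    """
    h, w = len(image), len(image[0])
    vertical = [sum(image[y][x] for y in range(h)) for x in range(w)]
    horizontal = [sum(row) for row in image]
    diag_a = [0] * (h + w - 1)
    diag_b = [0] * (h + w - 1)
    for y in range(h):
        for x in range(w):
            diag_a[x + y] += image[y][x]
            diag_b[x - y + h - 1] += image[y][x]   # shift so indices start at 0
    return vertical, horizontal, diag_a, diag_b


glyph = [
    [1, 0],
    [1, 1],
]
print(vh2d_projections(glyph))  # ([2, 1], [1, 2], [1, 1, 1], [1, 2, 0])
```

In a multi-stream setup these profiles would form one feature stream and the sliding-window features the other; the HMM training itself is outside the scope of this sketch.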
Technical studies have been performed on
many foreign languages such as Japanese and Chinese, but
work on ancient Indian scripts is still immature. Because the Modi
script is ancient and cursive, OCR for it is still
not widely available. To our knowledge, Prof. D.N. Besekar,
Dept. of Computer Science, Shri Shivaji College of Science,
Akola, has proposed a system for recognition of offline
handwritten Modi script vowels. Recognition of handwritten
Modi characters is very challenging because
writing style varies from one individual to another. Many vital
documents with precious information have been written in
Modi, and these documents are currently stored and
preserved in temples and museums. Over time these
documents will wither away if not given due attention. In this
paper we propose a system for recognition of handwritten
Modi script characters; the proposed method uses image
processing techniques and algorithms, which are described
below.
General Terms
Preprocessing techniques: gray scaling, thresholding,
boundary detection, thinning, cropping, scaling, template
generation. Other algorithms used: average method, Otsu
method, Stentiford method, template-based matching method
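Of the algorithms listed, Otsu's method is the easiest to show in isolation. The sketch below is a generic textbook version, not code from the paper: it picks the gray level that maximizes the between-class variance of an 8-bit pixel list. The sample values are invented.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the first class, the rest the second.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))

    weight_low = 0        # number of pixels with value <= t
    sum_low = 0           # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue      # no pixels in the low class yet
        if weight_high == 0:
            break         # every pixel is already in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


gray = [10, 12, 11, 200, 205, 210]          # dark ink on a bright page
t = otsu_threshold(gray)
print([1 if p > t else 0 for p in gray])    # [0, 0, 0, 1, 1, 1]
```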
Offline Signiture and Numeral Recognition in Context of Cheque | IJERA Editor
A signature is considered a biometric. A signature verification system is required almost everywhere that a person or his or her credentials must be authenticated before a transaction can proceed, especially for bank cheques. Such a system must therefore be powerful and accurate, and various methods have been used to date to achieve this. The research here concerns offline signature verification. Shape contexts are used to verify whether two shapes are similar and have been applied to tasks such as digit recognition, 3D object recognition and trademark retrieval. This paper presents a modified version of shape context for signature verification on bank cheques using a K-Nearest Neighbor classifier.
Font type identification of Hindi printed document | eSAT Journals
This document summarizes an approach for identifying the font type in Hindi printed documents. It discusses preprocessing steps like noise removal, binarization, and skew correction. Features are extracted from text including typographical features like word height and width, and structural features like thickness of vertical lines. Five Hindi font types are classified using these extracted features, achieving 96-100% accuracy on test documents. Future work could involve word-level font identification and improving image quality before feature extraction and classification.
Character recognition has gained a lot of attention in the field of pattern recognition because of its applications in various fields. It is one of the most successful applications of automatic pattern recognition. Research in OCR is popular for its application potential in banks, post offices, office automation and similar settings. Handwritten character recognition (HCR) is useful for cheque processing in banks, almost all kinds of form-processing systems, handwritten postal address resolution and many more tasks. This paper presents a simple and efficient approach to implementing OCR and translating scanned images of printed text into machine-encoded text. It uses several image analysis phases, followed by image detection via pre-processing and post-processing. The paper also describes scanning the entire document (equivalent to segmentation in this case) and recognizing individual characters from the image irrespective of their position, size and font style. It deals with recognition of symbols from the English language, which is internationally accepted.
1) The document presents a Tamil character recognition system using Hilditch's algorithm and structural characteristics of characters.
2) It describes the image preprocessing steps of binarization, segmentation, bounding box detection, normalization and skeletonization.
3) Hilditch's algorithm is applied to obtain thin, connected skeletons of characters.
4) Moments-based features are extracted from skeletons for character recognition using a neural network classifier.
5) The system achieved a recognition accuracy of 99% for identified Tamil characters.
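The summary does not say which moments are extracted from the skeletons. As a minimal illustration of moment-based shape features in general, the sketch below computes central moments of a binary image, which are invariant to translation. The function name and the example stroke are hypothetical.

```python
def central_moment(image, p, q):
    """Central moment mu_pq of a binary image (x = column, y = row)."""
    mass = sum(v for row in image for v in row)
    if mass == 0:
        raise ValueError("image contains no ink pixels")
    # Centroid of the ink pixels.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / mass
    cy = sum(y * v for y, row in enumerate(image) for v in row) / mass
    return sum(
        ((x - cx) ** p) * ((y - cy) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


# A one-pixel-wide vertical skeleton stroke: all spread is along y.
stroke = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
print(central_moment(stroke, 2, 0))  # 0.0 (no horizontal spread)
print(central_moment(stroke, 0, 2))  # 2.0 (vertical spread)
```

A vector of such moments for several (p, q) orders could serve as the input to a neural network classifier, as the summary describes.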
Image to Text Converter PPT. The presentation contains step-by-step algorithms and methods for converting images to text, in particular for images that contain human handwriting.
The document discusses developing an optical character recognition (OCR) system for various Indian languages like Devnagri, Bengali, and Punjabi. It describes segmenting text images into lines, words, and individual characters. Machine learning models are trained to recognize characters. The project implements libraries for segmenting text into characters using machine vision and recognizing characters using machine learning. Rule-based approaches are also used for specific languages. Steps include preprocessing images, segmenting lines, words, removing decorative lines, dividing words into zones, and downsampling characters for recognition. Various classification algorithms like KNN, logistic regression, SVM, and neural networks are evaluated for character recognition.
IRJET- Classification of Hindi Maatras by Encoding Scheme | IRJET Journal
This document presents a novel encoding scheme for classifying Hindi modifiers or matras. It describes existing techniques for modifier segmentation that use projection profiles and character heights. The proposed method segments modifiers using three encoding levels to assign a distinctive code for each modifier. For ascenders, the codes are based on the middle portion, skeletonized shape, and ends. For descenders, the codes consider width, last pixel, and lower right space. The method was tested on printed and handwritten modifiers, achieving over 90% accuracy for both ascenders and descenders. The encoding approach allows for direct classification of modifiers without feature extraction.
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGES | csandit
The Braille system is used by visually impaired people for reading. The shortage of Braille
books has created a need for conversion of Braille to text. This paper addresses the geometric
correction of Braille document images. Because Braille cells have standard measurements,
Braille characters can be identified by a simple cell-overlapping procedure. The
standard measurements change in a scaled document, and fitting the cells becomes difficult if the
document is tilted. This paper proposes a line-fitting algorithm for identifying the tilt (skew)
angle. The horizontal and vertical scale factors are identified from the ratio of the distance
between characters to the distance between dots. These are used in a geometric transformation
matrix for correction. Rotation correction is done prior to scale correction, which helps
increase accuracy. The results for various Braille documents are tabulated.
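The correction order stated above (rotation first, then scale) can be illustrated on a single dot coordinate. The sketch assumes the skew angle and the two scale factors are already known; estimating them by line fitting and distance ratios is not shown. The function name and parameters are hypothetical.

```python
import math


def correct_point(x, y, skew_deg, scale_x, scale_y):
    """Map a dot position in the scanned image back to standard Braille geometry.

    The skew is undone first by rotating through -skew_deg; the horizontal
    and vertical scale factors are divided out afterwards.
    """
    theta = math.radians(-skew_deg)
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return xr / scale_x, yr / scale_y


# A dot at (1, 0) that was stretched 2x horizontally and then tilted by
# 90 degrees appears at (0, 2) in the scan; correction should recover (1, 0).
x, y = correct_point(0.0, 2.0, skew_deg=90.0, scale_x=2.0, scale_y=1.0)
print(round(x, 6), round(y, 6))
```

Doing the two steps in the opposite order would divide the tilted coordinates by the wrong axis factors whenever the horizontal and vertical scales differ, which is one reason to remove the rotation first.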
This document discusses statistical feature extraction methods for isolated handwritten Gurumukhi script characters. It introduces Zernike and Pseudo-Zernike moment-based methods for extracting features from preprocessed and normalized Gurumukhi character images. Features are extracted at various moment orders and used to reconstruct the images to check accuracy. The document provides background on Gurumukhi script and discusses shape descriptors and image moments as methods for statistical feature extraction. Experimental results using Zernike and Pseudo-Zernike moments are presented.
Text detection and recognition in scene images or natural images has applications in computer
vision systems such as registration number plate detection, automatic traffic sign detection, image retrieval
and assistance for visually impaired people. Scene text, however, comes with complicated backgrounds, blurred images,
partly occluded text, variations in font style, image noise and varying illumination. Hence scene text
recognition is a difficult computer vision problem. In this paper a connected component method
is used to extract the text from the background. Horizontal and vertical projection profiles,
geometric properties of text, image binarization and a gap-filling method are used to extract the text from
scene images. A histogram-based threshold is then applied to separate the text from the image background.
Finally, the text is extracted from the images.
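The connected component step can be sketched with a breadth-first flood fill over a binarized image. The abstract does not state the connectivity or the geometric filters used, so this example assumes 4-connectivity and simply reports each component's bounding box; in a real pipeline those boxes would then be filtered by size or aspect ratio.

```python
from collections import deque


def connected_components(image):
    """Return a list of components, each a list of (row, col) ink pixels."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                # Visit the four edge-adjacent neighbours.
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if 0 <= nr < h and 0 <= nc < w and image[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(component)
    return components


def bounding_box(component):
    """Return (top, left, bottom, right), all inclusive."""
    rows = [r for r, _ in component]
    cols = [c for _, c in component]
    return min(rows), min(cols), max(rows), max(cols)


scene = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print([bounding_box(comp) for comp in connected_components(scene)])
# [(0, 0, 0, 1), (1, 3, 2, 3)]
```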
An exhaustive font and size invariant classification scheme for ocr of devana... | ijnlc
The document presents a classification scheme for recognizing Devanagari characters that is invariant to font and size. It identifies the basic symbols that commonly appear in the middle zone of Devanagari text across different fonts and sizes. Through an analysis of over 465,000 words from various sources, it finds that 345 symbols account for 99.97% of text and aims to classify these into groups based on structural properties like the presence or absence of vertical bars. The proposed classification scheme is validated on 25 fonts and 3 sizes to demonstrate its font and size invariance.
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI... | acijjournal
Handwritten character recognition is conversion of handwritten text to machine readable and editable form. Online character recognition deals with live conversion of characters. Malayalam is a language spoken by millions of people in the state of Kerala and the union territories of Lakshadweep and Pondicherry in India. It is written mostly in clockwise direction and consists of loops and curves. The method aims at training a simple neural network with three layers using backpropagation algorithm.
Freeman codes are used to represent each character as feature vector. These feature vectors act as inputs to the network during the training and testing phases of the neural network. The output is the character expressed in the Unicode format.
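A Freeman chain code for an online pen stroke can be sketched as follows. The sketch assumes the usual 8-direction numbering (0 = east, counting counter-clockwise in 45° steps) with the y axis pointing up; the paper may use a different numbering or resample the stroke first, so treat these details as assumptions.

```python
import math


def freeman_code(points):
    """Encode a stroke, given as (x, y) pen samples, as 8-direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue                       # repeated sample, no movement
        angle = math.atan2(y1 - y0, x1 - x0)
        # Quantize the angle to the nearest multiple of 45 degrees.
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes


# East, north-east, north, then west.
stroke = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
print(freeman_code(stroke))  # [0, 1, 2, 4]
```

The resulting code sequence, padded or resampled to a fixed length, is the kind of feature vector that could be fed to the three-layer network described above.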
Fragmentation of Handwritten Touching Characters in Devanagari Script | Zac Darcy
Character segmentation of handwritten words is a difficult task because of different writing styles and
complex structural features. Segmentation of handwritten text in Devanagari script is an uphill task. The
occurrence of the header line, overlapped characters in the middle zone and half characters makes the segmentation
process more difficult. Sometimes, interline space and noise make line fragmentation a difficult task.
Without separating the
touching characters, it is difficult to identify the characters, so fragmentation of the
touching characters in a word is necessary. We therefore devised a technique in which the first step is
preprocessing of a word; the next steps are to identify the joint points, form bounding boxes around all vertical and
horizontal lines, and finally fragment the touching characters on the basis of their height and width.
Fragmentation of handwritten touching characters in devanagari scriptZac Darcy
Character Segmentation of handwritten words is a difficult task because of different writing styles and
complex structural features. Segmentation of handwritten text in Devanagari script is an uphill task. The
occurrence of header line, overlapped characters in middle zone & half characters make the segmentation
process more difficultt. Sometimes, interline space and noise makes line fragmentation a difficult task.
Sometimes, interline space and noise makes line fragmentation a difficult task. Without separating the
touching characters, it will be difficult to identify the characters, hence fragmentation is necessary of the
touching characters in a word. So, we devised a technique, according to that first step will be
preprocessing of a word, than identify the joint points, form the bounding boxes around all vertical &
horizontal lines and finally fragment the touching characters on the basis of their height and width.
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...inventionjournals
This paper presents a hybrid approach to segment single- or multiple-touching handwritten numeral strings in form document, the core of which is the combined use of foreground, background and recognition analysis. The algorithm first located some feature points on both the foreground and background skeleton images containing connected numeral strings in form document. Possible segmentation paths were then constructed by matching these feature points, with an unexpected benefit of removing useless strokes. Subsequently, all these segmentation paths were validated and ranked by a recognition-based analysis, where a well-trained two-stage classifier was applied to each separated digit image to obtain its reliability. Finally, by introducing a locally optimal strategy to accelerate the recognition process, the top ranked segmentation path survived to help make a decision on whether to accept or not. Experimental results show that the proposed method can achieve a correct segmentation rate of 96.2 percent on a large dataset collected by our own.
1. The document presents a methodology for recognizing isolated handwritten Devanagari numerals using structural and statistical features.
2. Key features extracted include whether the numeral has openings on the left, right, above or below, and the number of horizontal and vertical crossings.
3. The methodology achieves an average accuracy of 96.8% on a dataset of 500 numeral images collected from various individuals. Accuracy is highest for numerals 0, 6, 8 and 10 at 100%, while some similar numerals like 3 and 2 see more errors.
This document summarizes a research paper that proposes a new method for segmenting text lines in machine printed Telugu script documents. It begins by noting common challenges with existing horizontal projection profile methods for line segmentation when text lines overlap. The proposed method first estimates the top and bottom bounding lines of the inter-line space using statistical analysis of the horizontal profile. It then predicts the segmentation path, which is set at 70% of the inter-line space to follow character contours. When two possible paths exist, it analyzes the horizontal profile of a small sub-image to select the correct path. The method achieved a success rate of 99.1% on tested documents.
In This paper we presented new approach for cursive Arabic text recognition system. The objective is to propose methodology analytical offline recognition of handwritten Arabic for rapid implementation.The first part in the writing recognition system is the preprocessing phase is the preprocessing phase to prepare the data was introduces and extracts a set of simple statistical features by two methods : from a window which is sliding long that text line the right to left and the approach VH2D (consists in projecting every character on the abscissa, on the ordinate and the diagonals 45° and 135°) . It then injects the resulting feature vectors to Hidden Markov Model (HMM) and combined the two HMM by multi-stream approach.
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONijnlc
This document presents a multi-stream HMM approach for offline handwritten Arabic word recognition. It extracts two sets of features from each word using a sliding window approach and VH2D projection approach. These features are input to separate HMM classifiers, and the outputs are combined in a multi-stream HMM to provide more reliable recognition. The system is evaluated on 200 words, achieving a recognition rate of 83.8% using the multi-stream approach compared to 78.2% and 76.6% for the individual classifiers.
Technical studies have been performed on
many foreign languages such as Japanese and Chinese, but the
efforts on ancient Indian scripts are still immature. As the Modi
script is ancient and cursive, OCR for it is still
not widely available. To our knowledge, Prof. D.N.Besekar,
Dept. of Computer Science, Shri. Shivaji College of Science,
Akola has proposed a system for recognition of offline
handwritten MODI script vowels. Recognition of
handwritten Modi characters is very challenging due
to the varying writing style of each individual. Many vital
documents with precious information have been written in
Modi, and currently these documents are stored and
preserved in temples and museums. Over a period of time these
documents will wither away if not given due attention. In this
paper we propose a system for recognition of handwritten
Modi script characters; the proposed method uses image
processing techniques and algorithms, which are described
below.
General Terms
Preprocessing techniques: Gray scaling, Thresholding,
Boundary detection, Thinning, Cropping, Scaling, Template
generation. Other algorithms used: Average method, Otsu
method, Stentiford method, Template-based matching method
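As an illustration of the thresholding step listed above, here is a minimal sketch of Otsu's method, which picks the gray level that maximises the between-class variance of the histogram. It assumes an 8-bit grayscale image held in a NumPy `uint8` array; the function name `otsu_threshold` is hypothetical and the code is not taken from the paper.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    Assumes `gray` is a uint8 array (values 0..255).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)             # number of pixels with value <= t
    w1 = hist.sum() - w0             # number of pixels with value > t
    sum0 = np.cumsum(hist * levels)  # intensity sum of the lower class
    valid = (w0 > 0) & (w1 > 0)      # both classes must be non-empty
    mu0 = sum0 / np.maximum(w0, 1)
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, -1.0)
    return int(np.argmax(between))
```

A binary image is then obtained with `gray > otsu_threshold(gray)`, or the inverse of that mask when the ink is darker than the paper.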
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer reviewed journal. For more details or to submit your article, please visit www.ijera.com
Offline Signiture and Numeral Recognition in Context of ChequeIJERA Editor
A signature is considered one of the biometrics. A signature verification system is required in almost all places where it is compulsory to authenticate a person or his/her credentials before proceeding with a transaction, especially when it comes to bank cheques. For this purpose a signature verification system must be powerful and accurate. To date, various methods have been used to make signature verification systems powerful and accurate. The research here is related to offline signature verification. Shape Contexts have been used to verify whether 2 shapes are similar or not, in applications such as digit recognition, 3D object recognition and trademark retrieval. In this paper we present a modified version of shape context for signature verification on bank cheques using a K-Nearest Neighbor classifier.
Font type identification of hindi printed documenteSAT Journals
This document summarizes an approach for identifying the font type in Hindi printed documents. It discusses preprocessing steps like noise removal, binarization, and skew correction. Features are extracted from text including typographical features like word height and width, and structural features like thickness of vertical lines. Five Hindi font types are classified using these extracted features, achieving 96-100% accuracy on test documents. Future work could involve word-level font identification and improving image quality before feature extraction and classification.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Nowadays character recognition has gained a lot of attention in the field of pattern recognition due to its application in various fields. It is one of the most successful applications of automatic pattern recognition. Research in OCR is popular for its application potential in banks, post offices, office automation, etc. HCR is useful in cheque processing in banks, almost all kinds of form processing systems, handwritten postal address resolution and many more. This paper presents a simple and efficient approach for the implementation of OCR and the translation of scanned images of printed text into machine-encoded text. It makes use of different image analysis phases followed by image detection via pre-processing and post-processing. This paper also describes scanning the entire document (the same as segmentation in our case) and recognizing individual characters from the image irrespective of their position, size and font style. It deals with recognition of symbols from the English language, which is internationally accepted.
1) The document presents a Tamil character recognition system using Hilditch's algorithm and structural characteristics of characters.
2) It describes the image preprocessing steps of binarization, segmentation, bounding box detection, normalization and skeletonization.
3) Hilditch's algorithm is applied to obtain thin, connected skeletons of characters.
4) Moments-based features are extracted from skeletons for character recognition using a neural network classifier.
5) The system achieved a recognition accuracy of 99% for identified Tamil characters.
Image to Text Converter PPT. The PPT contains step-by-step algorithms and methods for converting images into text, in particular algorithms for images that contain human handwriting, so that the writing can be converted into text (img to text).
The document discusses developing an optical character recognition (OCR) system for various Indian languages like Devnagri, Bengali, and Punjabi. It describes segmenting text images into lines, words, and individual characters. Machine learning models are trained to recognize characters. The project implements libraries for segmenting text into characters using machine vision and recognizing characters using machine learning. Rule-based approaches are also used for specific languages. Steps include preprocessing images, segmenting lines, words, removing decorative lines, dividing words into zones, and downsampling characters for recognition. Various classification algorithms like KNN, logistic regression, SVM, and neural networks are evaluated for character recognition.
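The line and word segmentation steps described above are commonly done with projection profiles (histograms of ink pixels per row or per column). The following is a minimal sketch under that assumption, not the project's actual library: the image is a binary NumPy array with non-zero ink pixels, and the function names are hypothetical.

```python
import numpy as np

def segment_runs(profile):
    """Return (start, end) pairs, end exclusive, for runs of non-zero values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                    # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))      # a blank row or column ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_lines(binary: np.ndarray):
    """Split a page into text lines using the horizontal projection profile."""
    return segment_runs(binary.sum(axis=1))

def segment_columns(line_img: np.ndarray):
    """Split one text line into column runs using the vertical projection profile."""
    return segment_runs(line_img.sum(axis=0))
```

For scripts with a header line, such as Devnagari, the vertical profile of a word has no gaps until the header line is removed, which is why the summary lists removal of decorative lines as a separate step before character segmentation.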
IRJET- Classification of Hindi Maatras by Encoding SchemeIRJET Journal
This document presents a novel encoding scheme for classifying Hindi modifiers or matras. It describes existing techniques for modifier segmentation that use projection profiles and character heights. The proposed method segments modifiers using three encoding levels to assign a distinctive code for each modifier. For ascenders, the codes are based on the middle portion, skeletonized shape, and ends. For descenders, the codes consider width, last pixel, and lower right space. The method was tested on printed and handwritten modifiers, achieving over 90% accuracy for both ascenders and descenders. The encoding approach allows for direct classification of modifiers without feature extraction.
GEOMETRIC CORRECTION FOR BRAILLE DOCUMENT IMAGEScsandit
The Braille system has been used by visually impaired people for reading. The shortage of Braille
books has caused a need for conversion of Braille to text. This paper addresses the geometric
correction of Braille document images. Due to the standard measurement of the Braille cells,
identification of Braille characters could be achieved by simple cell overlapping procedure. The
standard measurement varies in a scaled document and fitting of the cells become difficult if the
document is tilted. This paper proposes a line fitting algorithm for identifying the tilt (skew)
angle. The horizontal and vertical scale factor is identified based on the ratio of distance
between characters to the distance between dots. These are used in geometric transformation
matrix for correction. Rotation correction is done prior to scale correction. This process aids in
increased accuracy. The results for various Braille documents are tabulated.
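The skew estimation and geometric correction described above can be sketched as follows. This is a minimal illustration, assuming the centres of the dots in one Braille row have already been detected as (x, y) points. The least-squares fit via `np.polyfit` stands in for the paper's line fitting algorithm, whose details are not given here, and the function names and scale parameters are hypothetical.

```python
import numpy as np

def estimate_skew_degrees(points) -> float:
    """Fit a straight line through dot centres of one row and return its angle."""
    pts = np.asarray(points, dtype=float)
    slope, _intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return float(np.degrees(np.arctan(slope)))

def correction_matrix(skew_deg: float, sx: float = 1.0, sy: float = 1.0) -> np.ndarray:
    """Build a 2x2 transform: rotate by -skew first, then scale by (sx, sy)."""
    t = np.radians(-skew_deg)
    rotation = np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])
    scale = np.diag([sx, sy])
    # For column vectors, scale @ rotation applies the rotation before the
    # scaling, matching the order stated in the abstract.
    return scale @ rotation

def correct_points(points, matrix: np.ndarray) -> np.ndarray:
    """Apply the correction matrix to an array of (x, y) points."""
    return (matrix @ np.asarray(points, dtype=float).T).T
```

After correction, the dots of one row should share the same y coordinate, which is what makes the fixed-size cell overlay workable.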
This document discusses statistical feature extraction methods for isolated handwritten Gurumukhi script characters. It introduces Zernike and Pseudo-Zernike moment-based methods for extracting features from preprocessed and normalized Gurumukhi character images. Features are extracted at various moment orders and used to reconstruct the images to check accuracy. The document provides background on Gurumukhi script and discusses shape descriptors and image moments as methods for statistical feature extraction. Experimental results using Zernike and Pseudo-Zernike moments are presented.
Text detection and recognition in scene images or natural images has applications in computer
vision systems like registration number plate detection, automatic traffic sign detection, image retrieval
and help for visually impaired people. Scene text, however, comes with complicated backgrounds, blurred images,
partly occluded text, variations in font styles, image noise and varying illumination. Hence scene text
recognition can be a difficult computer vision problem. In this paper a connected component method
is used to extract the text from the background. In this work, horizontal and vertical projection profiles,
geometric properties of text, image binarization and a gap filling method are used to extract the text from
scene images. Then a histogram-based threshold is applied to separate the text from the background of the images.
Finally text is extracted from images.
An exhaustive font and size invariant classification scheme for ocr of devana...ijnlc
The document presents a classification scheme for recognizing Devanagari characters that is invariant to font and size. It identifies the basic symbols that commonly appear in the middle zone of Devanagari text across different fonts and sizes. Through an analysis of over 465,000 words from various sources, it finds that 345 symbols account for 99.97% of text and aims to classify these into groups based on structural properties like the presence or absence of vertical bars. The proposed classification scheme is validated on 25 fonts and 3 sizes to demonstrate its font and size invariance.
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...acijjournal
Handwritten character recognition is conversion of handwritten text to machine readable and editable form. Online character recognition deals with live conversion of characters. Malayalam is a language spoken by millions of people in the state of Kerala and the union territories of Lakshadweep and Pondicherry in India. It is written mostly in clockwise direction and consists of loops and curves. The method aims at training a simple neural network with three layers using backpropagation algorithm.
Freeman codes are used to represent each character as feature vector. These feature vectors act as inputs to the network during the training and testing phases of the neural network. The output is the character expressed in the Unicode format.
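A Freeman chain code encodes a stroke as a sequence of direction numbers between consecutive points. The sketch below assumes the common 8-direction numbering (0 = east, counted counter-clockwise, y increasing upward) and a stroke given as a list of integer (x, y) points. The paper's exact numbering and resampling are not stated, so these are assumptions, and the function name is hypothetical.

```python
# Map a unit step (dx, dy) to its 8-direction Freeman code.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def freeman_chain_code(points):
    """Return the Freeman code of each step between consecutive stroke points.

    Steps longer than one pixel are quantised by the sign of dx and dy,
    and repeated points are skipped.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx = (x1 > x0) - (x1 < x0)   # sign of the horizontal movement
        dy = (y1 > y0) - (y1 < y0)   # sign of the vertical movement
        if (dx, dy) != (0, 0):
            codes.append(DIRECTIONS[(dx, dy)])
    return codes
```

Because strokes differ in length, the code sequence would normally be resampled or summarised (for example as a histogram of the eight directions) to obtain a fixed-length input vector for the network.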
Fragmentation of Handwritten Touching Characters in Devanagari ScriptZac Darcy
Character Segmentation of handwritten words is a difficult task because of different writing styles and
complex structural features. Segmentation of handwritten text in Devanagari script is an uphill task. The
occurrence of header line, overlapped characters in middle zone & half characters make the segmentation
process more difficult. Sometimes, interline space and noise make line fragmentation a difficult task.
Without separating the
touching characters, it will be difficult to identify the characters; hence fragmentation of the
touching characters in a word is necessary. So, we devised a technique in which the first step is
preprocessing of the word, then identifying the joint points, forming bounding boxes around all vertical &
horizontal lines, and finally fragmenting the touching characters on the basis of their height and width.
A Comprehensive Study On Handwritten Character Recognition Systemiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document provides a comprehensive review of existing works in offline handwritten character recognition. It discusses the three major stages of any character recognition system: preprocessing, feature extraction, and classification. For preprocessing, it describes techniques like binarization, filtering, and morphological operations that are used to improve image quality. For feature extraction, it discusses various methods used to represent characters, including global transformations, statistical representations, and geometrical/topological features. Wavelet transforms are highlighted as a commonly used feature extraction technique. Finally, it provides an overview of literature on methods used in each stage of offline handwritten character recognition systems.
Review of research on devnagari character recognitionVikas Dongre
This document summarizes research on Devnagari character recognition. It begins with an abstract discussing the progress of English character recognition and the need for further research on Indian languages like Devnagari. The document then reviews the stages of Devnagari optical character recognition systems, including pre-processing, segmentation, feature extraction, recognition, and post-processing. It discusses challenges in Devnagari recognition due to features of the script like connected characters. The document also reviews common techniques used at each stage of recognition systems and provides directions for future research.
This document presents a divide and conquer method for recognizing isolated handwritten Arabic characters. It divides Arabic characters into four groups based on the number of connected components in each character. Feature vectors are extracted containing directional angles between points on the character contour and information about dots. Four neural networks are used to classify each group. The method was tested on a dataset containing isolated Arabic characters and preliminary results show it outperforms using a single neural network, with an average recognition rate of 88.11% for the largest group. Further experiments on additional directional features are suggested to improve performance.
Devnagari handwritten numeral recognition using geometric features and statis...Vikas Dongre
This paper presents a Devnagari numeral recognition method based on statistical
discriminant functions. 17 geometric features based on pixel connectivity, lines, line directions, holes,
image area, perimeter, eccentricity, solidity, orientation etc. are used for representing the numerals. Five
discriminant functions viz. Linear, Quadratic, Diaglinear, Diagquadratic and Mahalanobis distance are
used for classification. 1500 handwritten numerals are used for training. Another 1500 handwritten
numerals are used for testing. Experimental results show that Linear, Quadratic and Mahalanobis
discriminant functions provide better results. Results of these three Discriminants are fed to a majority
voting type Combination classifier. It is found that Combination classifier offers better results over
individual classifiers.
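The majority-voting combination described above can be sketched in a few lines. This is an illustrative sketch only. The tie-breaking rule (fall back to the first classifier when no label has a strict majority) is an assumption, since the abstract does not say how ties are resolved, and the function names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one sample.

    Returns the label chosen by a strict majority; if there is none,
    falls back to the first classifier's label (an assumed tie-break rule).
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else predictions[0]

def combine(per_classifier_predictions):
    """Apply majority voting sample by sample across classifier outputs."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]
```

With three classifiers, as in the abstract, the combined label differs from an individual classifier's label only when the other two agree against it.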
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...Mohammad Liton Hossain
Character recognition, a technique that associates a symbolic identity with the image of a character, is an important area in pattern recognition and image processing. The principal idea here is to convert raw images (scanned from documents, typed, photographed, etc.) into editable text such as html, doc, txt or other formats. There are very few Bangla character recognition systems, and those available cannot recognize the whole alphabet set. Motivated by this, this paper demonstrates a MATLAB-based character recognition system for printed Bangla writing. It can also compare the characters of one image file to those of another. The processing steps involve binarization, noise removal and segmentation at various levels, feature extraction and recognition.
Script Identification In Trilingual Indian DocumentsCSCJournals
This paper presents a research work in identification of script from trilingual Indian documents. This paper proposes a classification algorithm based on structural and contour features. The proposed system identifies the script of languages like English, Tamil and Hindi. 300 word images of the above mentioned three scripts were tested and 98.6% accuracy was obtained. Performance comparison with various existing methods is discussed.
Off line system for the recognition of handwritten arabic charactercsandit
Recognition of handwritten Arabic text awaits accurate recognition solutions. There are many
difficulties facing a good handwritten Arabic recognition system such as unlimited variation in
human handwriting, similarities of distinct character shapes, and their position in the word. The
typical Optical Character Recognition (OCR) systems are based mainly on three stages,
preprocessing, features extraction and recognition.
In this paper, we present an efficient approach for the recognition of off-line Arabic handwritten
characters which is based on structural, Statistical and Morphological features from the main
body of the character and also from the secondary components. Evaluation of the accuracy of
the selected features is made. The system was trained and tested with the CENPARMI dataset. The
proposed algorithm obtained promising results in terms of accuracy (a success rate of 100% for
some letters, with an average of 88%). In comparison with other related works, we find that our result is
the highest.
A Survey of Modern Character Recognition Techniquesijsrd.com
This document summarizes several modern techniques for handwritten character recognition. It discusses common feature extraction methods such as statistical, structural and global transformation features. It then summarizes several papers that have proposed different techniques for handwritten character recognition, including associative memory nets, moment invariants with support vector machines, neural networks, hidden Markov models, gradient features, and multi-scale neural networks. The document concludes that neural networks are commonly used for training and that feature extraction methods continue to be improved, but handwritten character recognition remains an active area of research.
An effective approach to offline arabic handwriting recognitionijaia
Segmentation is the most challenging part of the Arabic handwriting recognition, due to the unique
characteristics of Arabic writing that allows the same shape to denote different characters. In this paper,
an off-line Arabic handwriting recognition system is proposed. The processing details are presented in
three main stages. Firstly, the image is skeletonized to one pixel thin. Secondly, each diagonally
connected foreground pixel is transferred to the closest horizontal or vertical line. Finally, these orthogonal lines are
coded as vectors of unique integer numbers; each vector represents one letter of the word. In order to
evaluate the proposed techniques, the system has been tested on the IFN/ENIT database, and the
experimental results show that our method is superior to those methods currently available.
Fuzzy rule based classification and recognition of handwritten hindiIAEME Publication
This document summarizes a research paper that proposes a fuzzy rule-based system for classifying and recognizing handwritten Hindi words. The system works in six stages: preprocessing, segmentation, normalization, classification, feature extraction, and recognition. In the classification stage, characters are classified into seven classes based on the presence, position, length, connectivity, and number of junction points of their vertical bars. Experimental results on 450 words written by 30 people showed the system achieved a classification and recognition rate of 92.02%.
Fuzzy rule based classification and recognition of handwritten hindiIAEME Publication
This document describes a fuzzy rule-based system for classifying and recognizing handwritten Hindi words. The system works in six stages: preprocessing, segmentation, normalization, classification, feature extraction, and recognition. Preprocessing includes binarization, thinning, slant correction, dilation, erosion, and filtering to prepare images for further processing. Classification uses fuzzy if-then rules based on the presence and position of vertical bars to classify characters into seven classes. Feature extraction identifies curves, lines, junction points and endpoints. The system was tested on 450 words written by 30 people, achieving a recognition rate of 92.02%.
Segmentation of Handwritten Chinese Character Strings Based on improved Algor...ijeei-iaes
Algorithm Liu attracts considerable attention because of its high accuracy in the segmentation of Japanese postal addresses. But its disadvantages, such as complexity and difficult implementation, have an adverse effect on its popularization and application. In this paper, based on an in-depth study of Algorithm Liu, the author applies its principles to handwritten Chinese character segmentation according to the characteristics of handwritten Chinese characters. At the same time, the author puts forward judgment criteria for segmentation block classification and for the adhering modes of handwritten Chinese characters. In the process of segmentation, text images are seen as sequences made up of Connected Components (CCs), while the connected components are made up of sets of horizontal runs of black pixels in the image. The author determines whether these parts will be merged into a segment by analyzing the connected components, and then segments the image according to the adhering mode, based on analysis of the outline edges. Finally, the text images are cut into character segments. Experimental results show that the improved Algorithm Liu obtains high segmentation accuracy and produces a satisfactory segmentation result.
This paper presents an approach to recognizing off-line Bangla numerals. Today there are many OCR systems used to recognize
Bangla numerals, but the recognition of handwritten characters is still challenging work in the field of pattern recognition. Numeral
recognition in pattern recognition is the process of identifying a given character according to a predefined character set. The difficulty
in recognizing handwritten Bangla numerals is that they differ in shape and size and are much curved in nature. We try
to establish a process to recognize such handwritten Bangla numerals having different shapes and sizes. The input scanned image is first
binarized. Then all ten digits of the Bangla numerals are segmented to identify each individual digit from the scanned
image. Line segmentation is used to extract the features of each numeral based on templates. A method based on a high correlation coefficient
provides a successful match between the test data and the training data.
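Template matching by correlation coefficient, as described above, can be sketched as follows. The sketch assumes that the test digit and all templates have already been binarized and normalised to the same size. The function names and the dictionary of templates are hypothetical, and the Pearson correlation used here is one common choice rather than necessarily the paper's exact measure.

```python
import numpy as np

def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two equally sized images."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()                 # new arrays, so the inputs are not modified
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def classify(sample: np.ndarray, templates: dict):
    """Return the label of the template with the highest correlation."""
    return max(templates, key=lambda label: correlation(sample, templates[label]))
```

A minimum acceptable correlation could be added so that samples matching no template well are rejected instead of being forced into a class.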
Dimensionality Reduction and Feature Selection Methods for Script Identificat...ITIIIndustries
The goal of this research is to explore the effects of dimensionality reduction and feature selection on the problem of script identification from images of printed documents. The k-adjacent segment is ideal for this use due to its ability to capture visual patterns. We have used principal component analysis to reduce our feature matrix to a handier size that can be trained easily, and experimented by including varying combinations of dimensions of the super feature set. A modular approach in neural networks was used to classify 7 languages: Arabic, Chinese, English, Japanese, Tamil, Thai and Korean.
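The dimensionality reduction step mentioned above can be illustrated with a minimal principal component analysis sketch based on the singular value decomposition. It assumes a feature matrix with one row per sample. The function name `pca_reduce` is hypothetical and the sketch does not reproduce the authors' feature pipeline.

```python
import numpy as np

def pca_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of `features` onto their first k principal components."""
    centered = features - features.mean(axis=0)   # PCA requires zero-mean columns
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _u, _s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

The reduced matrix has shape (number of samples, k) and would replace the full feature matrix as input to the classifier.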
A Novel Approach to Recognize Handwritten Gujarati Digits.pdfSamantha Martinez
This document proposes a novel approach for recognizing handwritten Gujarati digits using structural features. It discusses preprocessing steps like noise removal, binarization, segmentation and thinning. For feature extraction, it uses structural features like bounded regions, number of endpoints, and endpoint positions. A decision tree is used for classification, with the structural features forming the branching conditions. Challenges include variability in writing styles and possibility of overlap or overwritten digits. The proposed approach aims to provide a simple classification method using key structural properties of each digit.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
20240609 QFM020 Irresponsible AI Reading List May 2024
Critical Review on Off-Line Sinhala Handwriting Recognition
Off-Line Sinhala Handwriting Recognition
D.G.A.M.Wijayarathna (144193K)
Faculty of Information Technology
University of Moratuwa
amwijayarathna@gmail.com
Abstract
Handwriting recognition is one of the latest technologies of this
era. Using handwriting recognition, a computer can identify
characters and letters that are written or printed on paper.
Sinhala handwriting recognition is currently a trending topic. This
paper is a literature review that summarizes the sources on
“Off-Line Sinhala Handwriting Recognition” while identifying and
evaluating issues in order to present solutions and improvements
for further research in this technology.
Key Terms- Handwriting Recognition, Sinhala, Segmentation, Data
Collection, Grouping.
I. INTRODUCTION
Sinhala is the main language of Sri Lanka and one of the
official languages of the country. The majority of the Sri
Lankan population speaks Sinhala. It is the language through
which they communicate in their daily activities as well as for
most official purposes.
The language belongs to the Indo-Aryan group of languages,
along with Sanskrit, Hindi and Bengali. In the Sinhala
language, the character set is represented by 16 vowels, 40
consonants, 2 semi-consonants and 13 consonant modifiers [1].
Handwriting recognition is a rapidly growing technology in
today's technical world. Sinhala handwriting recognition is a
much needed technology for Sri Lankans because Sinhala is more
familiar to Sri Lankan people than any other language, yet
there are very few tools for Sinhala handwriting recognition.
So far, Sinhala handwriting recognition has been approached
using Zone Based Feature Extraction [1], Hidden Markov Models
[2], computational pattern recognition methods such as
Artificial Neural Networks (ANN) [3] and some other techniques.
Sinhala handwriting recognition techniques can be off-line or
on-line. Off-line recognition of handwriting has many practical
uses in areas such as banking, census, mail sorting and
commerce [3].
Sinhala letters are written from left to right, and most of the
letters are curved. All characters are written within three
horizontal layers: the upper layer, the middle layer and the
lower layer [Figure1]. Some characters occupy all three layers,
some are written only in the middle layer, and another set of
characters is written in the middle layer together with the
upper or the lower layer, without necessarily occupying both.
Figure1: Sinhala characters written with layers[2]
Generally, five main stages of character recognition can be
identified:
1) Preprocessing and data collection
2) Character segmentation
3) Representation
4) Training and recognition
5) Post-processing
These stages are reviewed in the following sections.
II. PREPROCESSING AND DATA COLLECTION
In the reviewed research papers, data collection is done using
A4 size papers. Every paper contains 10 lines of sample
handwriting, so every paper contains more than 20 words
written in lines. The collected samples include vowels,
consonants and modifiers.
The documents are scanned at a suitable resolution in dots per
inch (dpi) and binarized using an adjustable thresholding
technique. The outputs are then stored as gray scale images.
The thresholding technique is used to reduce the noise and
extract the handwriting.
Chamikara et al. [10] used the following methods for character
preprocessing:
i) RGB to Grey
ii) Noise Removal
iii) Image binarization
iv) Image skeletonization
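The first and third steps of this list can be sketched in a few lines of plain Python. This is only a minimal illustration, not the pipeline of any reviewed paper: the function names, the luminance weights and the default threshold of 128 are assumptions, and noise removal and skeletonization are left out because they normally rely on an image processing library.

```python
def rgb_to_grey(rgb_image):
    """Convert rows of (r, g, b) tuples to grey levels in 0..255.

    The weights are the common luminance weights; the reviewed
    papers do not state which conversion they used (assumption).
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(grey_image, threshold=128):
    """Mark ink pixels as 1: pixels darker than the adjustable threshold."""
    return [[1 if pixel < threshold else 0 for pixel in row]
            for row in grey_image]


# Tiny hypothetical "scan": dark pixels are ink, light pixels are paper.
page = [
    [(255, 255, 255), (20, 20, 20), (250, 250, 250)],
    [(30, 30, 30), (25, 25, 25), (240, 240, 240)],
]
binary = binarize(rgb_to_grey(page), threshold=128)
print(binary)
```

Lowering or raising `threshold` is the "adjustable" part: a lower value keeps only the darkest strokes, while a higher value also keeps faint ink at the cost of more noise.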
III. SEGMENTATION
After the image is preprocessed, character segmentation is
carried out using different kinds of image processing
algorithms. The resulting images of segmented characters are
used for character classification [Figure 2].
Figure 2: Character segmentation flow chart[1]
A. Segment Text Lines
Segmenting the lines is the first thing that needs to be done.
Gaps between the lines can be clearly identified using the
horizontal projection profile of the entire image [Figure 3].
In the projection profile graph, the valleys represent the
separation between text lines. If the graph contains a run of
consecutive 0 values, that run is considered a boundary between
text lines.
Figure 3: Line segmentation using the horizontal projection
profile[5]
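As a rough illustration of line segmentation with a horizontal projection profile, the sketch below sums the ink pixels of every row and treats each run of non-zero rows as one text line. It assumes a binary image stored as a list of rows with 1 for ink; the function names are hypothetical and not taken from the reviewed papers.

```python
def horizontal_profile(binary_image):
    """Number of ink pixels in each row of a binary image (1 = ink)."""
    return [sum(row) for row in binary_image]


def find_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a text line begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # a valley of zeros ends the line
            start = None
    if start is not None:                  # line touching the bottom edge
        runs.append((start, len(profile)))
    return runs


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
lines = find_runs(horizontal_profile(page))
print(lines)  # [(1, 3), (5, 6)]
```

On real scans the valleys are rarely exactly zero, so a small noise tolerance in place of the `value > 0` test would probably be needed.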
B. Segment Horizontal Characters
Characters along a line can be segmented using the vertical
projection profile, from which the boundaries between
characters and words can be identified. If the characters
overlap, however, this method alone is not enough for
segmentation [Figure 4a & 4b].
Figure 4a: Word segmentation using the vertical projection
profile[1]
Figure 4b: Character segmentation using the vertical
projection profile[5]
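The same idea applied column by column gives a sketch of character and word segmentation. The code below is illustrative only: it assumes a binary text line stored as a list of rows with 1 for ink, and the `word_gap` parameter (the number of blank columns that separates two words) is a hypothetical tuning value that the reviewed papers do not specify.

```python
def vertical_profile(binary_line):
    """Number of ink pixels in each column of a binary text line (1 = ink)."""
    return [sum(column) for column in zip(*binary_line)]


def ink_runs(profile):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def group_words(char_runs, word_gap):
    """Group character ranges into words; gaps of word_gap or more columns split words."""
    words = []
    for run in char_runs:
        if words and run[0] - words[-1][-1][1] < word_gap:
            words[-1].append(run)          # small gap: same word
        else:
            words.append([run])            # wide gap (or first run): new word
    return words


line = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
]
chars = ink_runs(vertical_profile(line))
words = group_words(chars, word_gap=3)
print(chars)  # [(0, 2), (3, 4), (7, 9)]
print(words)  # [[(0, 2), (3, 4)], [(7, 9)]]
```

Touching or overlapping characters leave no blank column between them, which is why they come out of this step as a single wide range.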
C. Identifying Overlapping Characters
The average character width can be calculated from the image
width and the number of characters in the image [Equation 1]:
$$\text{Average character width} = \frac{\text{Image width}}{\text{Number of characters}} \qquad (1)$$
Overlapping characters can then be identified by the following
rule, in which a segment wider than one and a half times the
average character width is treated as a touching character
[Equation 2]:
If ( Width > ( 3 x Average Width / 2 ) )
    Touching Character          (2)
Else
    Segmented Character
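A small worked example may make the width test concrete. The sketch reads the rule as flagging a segment when its width exceeds 1.5 times the average character width; that reading, the function names and the sample numbers are all assumptions made for illustration.

```python
def average_character_width(image_width, number_of_characters):
    """Equation 1: image width divided by the number of characters."""
    return image_width / number_of_characters


def is_touching(segment_width, average_width):
    """Equation 2 (as read here): wider than 3/2 of the average means touching."""
    return segment_width > 3 * average_width / 2


# Hypothetical line: 60 columns wide, known to hold 5 characters,
# but vertical projection produced only 4 segments.
segments = [(0, 10), (12, 22), (24, 50), (52, 60)]   # (start, end) columns
average = average_character_width(60, 5)             # 12.0
flags = [is_touching(end - start, average) for start, end in segments]
print(flags)  # [False, False, True, False]
```

Only the third segment (26 columns against a limit of 18) would be passed on to the overlapping-character step.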
D. Segment Overlapping Characters
Dharmapala et al. [1] used contours to segment overlapping
characters [Figure 5].
Figure 5. Segment characters using contours[1]
Karunanayaka et al. [3] used the Water Reservoir Concept, which
is a more complicated method than the former [Figure 6].
Figure 6. Apply water reservoir concept in touching
character[3]
IV. GROUPING
Segmented characters are grouped into four main groups
according to the layer structure of each character [6]. The
characters are visualized in the three-layer structure
[Figure 7].
Figure 7. Three layered structure of the text line[6]
The distances from the center of the middle layer to the upper
and lower limits of the character, together with the character
heights, are considered in the classification.
V. REPRESENTATION
One of the most important elements of a handwriting
recognition system is image representation. In the simplest
case, gray-level or binary images are fed directly to a
recognizer. However, to avoid additional complexity and to
increase the accuracy of the algorithms, a more compact and
characteristic representation is needed in many recognition
systems. For this purpose, a set of features is extracted for
each class that helps to distinguish it from other classes
while remaining invariant to characteristic variations within
the class [7]. Character representation methods can be
separated into three main groups, as follows.
1) Global Transformation and Series Expansion:
Common transformation and series expansion methods used in
character recognition include the following.
a) Fourier Transforms
b) Gabor Transform
c) Wavelets
d) Moments
e) Karhunen–Loeve Expansion
2) Statistical Representation:
In this method, a character is represented by a statistical
distribution of points, which tolerates style variations to a
certain degree. There are three methods for this:
a) Zoning
b) Crossings & Distances
c) Projections
3) Geometrical and Topological Representation:
Topological and geometrical features with high tolerance to
style variations and distortions are used to represent various
local and global properties of characters.
a) Extracting & Counting Topological Structures
b) Measuring & Approximating the Geometrical
Properties
c) Coding
d) Graphs and Trees
For Sinhala handwriting recognition, however, statistical
representation methods are mainly used. Most researchers have
used the zone based feature extraction method for character
representation.
a. Feature Extraction
The aim of feature extraction is to detect the properties of a
character that identify it uniquely and thereby maximize the
recognition rate. After feature extraction, the extracted
features are used as inputs for the training and recognition
systems [8].
Dharmapala et al. [1] used a feature extraction method for
character representation.
Figure 8. Classification flowchart for Feature Extraction
As shown in Figure 8, characters are first differentiated into
three main zones using their height and width.
Figure 9: Character unit within the zones
The zoned characters are further divided into three horizontal
blocks and two vertical blocks. These are used to build
horizontal and vertical projection profiles of the characters,
which form the feature vector of the character. Once extracted,
these feature vectors are used for training ANNs in the
recognition stage.
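One possible reading of this block-based feature vector is sketched below: the horizontal profile is summed over three bands and the vertical profile over two bands, giving five numbers per character. The exact feature layout of Dharmapala et al. is not given in this review, so the layout, the function names and the sample glyph are assumptions.

```python
def band_sums(profile, bands):
    """Split a profile into `bands` nearly equal parts and sum each part."""
    n = len(profile)
    edges = [i * n // bands for i in range(bands + 1)]
    return [sum(profile[edges[i]:edges[i + 1]]) for i in range(bands)]


def zone_features(char_image, rows=3, cols=2):
    """Feature vector: ink per horizontal band followed by ink per vertical band."""
    horizontal = [sum(row) for row in char_image]           # one value per row
    vertical = [sum(col) for col in zip(*char_image)]       # one value per column
    return band_sums(horizontal, rows) + band_sums(vertical, cols)


# Hypothetical 6 x 4 binary character image (1 = ink).
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
features = zone_features(glyph)
print(features)  # [4, 6, 4, 7, 7]
```

In practice such counts would usually be normalized by the block area so that characters scanned at different sizes give comparable vectors before they are fed to a classifier.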
VI. TRAINING AND RECOGNITION TECHNIQUES
Character recognition systems use pattern recognition methods
that assign unknown samples to a predefined class. Many
character recognition methods can be studied under four
themes of pattern recognition [9].
These are the main techniques for off-line character recognition:
1. Template matching
2. Statistical techniques
3. Structural techniques
4. Artificial Neural Networks (ANNs)
In Sinhala handwriting recognition, however, two main methods
are still used for recognition. They are outlined below.
a. Statistical Techniques
Statistical decision theory is concerned with statistical
decision functions and a set of optimality criteria that
maximize the probability of the observed pattern given the
model of a certain class [38]. Statistical techniques are mostly
based on three main assumptions.
2.a Character Recognition Using Hidden Markov Model
Hewavitharana et al. applied Hidden Markov Models to Sinhala
handwriting recognition. Mixed cursive is the most general and
most difficult handwriting style, and HMMs are used with a view
to recognizing it automatically. An HMM is a doubly stochastic
process: its underlying state sequence is hidden and cannot be
observed directly, but it can be inferred from the observations
it generates. It consists of a set of states linked to each
other by transitions with associated probabilities, while the
observed process consists of a set of outputs.
The hidden state sequence is computed from the observation
sequence using the Counter and Viterbi algorithms, both of
which yield the most likely result [2].
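To show how the most likely hidden state sequence can be recovered from a sequence of observations, here is a minimal Viterbi decoder in plain Python. It is a generic textbook sketch, not the model of Hewavitharana et al.: the two states, the three observation symbols and all probabilities are made-up toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model (all names and numbers are hypothetical).
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {
    "A": {"loop": 0.5, "stroke": 0.4, "dot": 0.1},
    "B": {"loop": 0.1, "stroke": 0.3, "dot": 0.6},
}
path, probability = viterbi(["loop", "stroke", "dot"], states, start_p, trans_p, emit_p)
print(path)  # ['A', 'A', 'B']
```

For long observation sequences the products underflow, so practical implementations usually work with log probabilities instead.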
b. Artificial Neural Networks (ANNs)
An Artificial Neural Network is a connected set of artificial
neurons that can be trained to learn and is used for
information processing. ANNs are mostly based on a
computational or mathematical model. These systems are
adaptive, so they can change their structure according to their
training and the information they receive.
Chamikara et al. used an ANN for Sinhala handwriting
recognition [10], in a method called the Fuzzy Neural Hybrid
method. In their segmentation method, each character is
separated into six sections, and six neurons are used to
recognize these sections separately.
Figure 11. Architecture of the Fuzzy Neural Hybrid method
VII. DISCUSSION
Sinhala is the majority language of Sri Lanka. Because of the
complexity of the script, Sinhala handwriting recognition is
more complicated than for other languages. Sinhala
handwriting recognition is one of the systems most needed by
society, and off-line handwriting recognition in particular is
much needed because, as a developing country, Sri Lanka still
uses handwriting for several official and unofficial
purposes, such as cheques and postcards. In this paper we have
discussed many handwriting recognition methods and
procedures. They all have their own advantages and
disadvantages. So far, the most difficult part of Sinhala
handwriting recognition is identifying overlapping and
touching characters. This is more complicated in Sinhala than
in other languages because of the curved shape of the letters.
So far, this field has been approached using Zone Based Feature
Extraction [1], Hidden Markov Models [2], computational pattern
recognition such as artificial neural networks [3] and some
other techniques. All these techniques share a common
procedure:
1. Preprocessing
2. Segmentation
3. Feature Extraction
4. Recognition
Noise removal and binarization of the document are done in the
preprocessing stage, after which the characters are segmented.
The feature extraction stage helps to maximize the recognition
rate. The segmented characters are then passed to the
recognition stage.
Among these recognition methods, feature extraction resulted in
94% accuracy, the Hidden Markov Model showed 64% accuracy, and
the ANN technique showed 94% accuracy on unique shapes and a
75% rate on confusing shapes.
VIII. CONCLUSION
Sinhala handwriting recognition is a much needed technology
for Sri Lankans because Sinhala is more familiar to Sri Lankan
people than any other language, yet there are very few tools
for Sinhala handwriting recognition. This critical review
paper explains the data collection methods, character
segmentation, character grouping, feature extraction and
character recognition methods. Character segmentation in
particular must be improved because it is the most difficult
area of this technology. Among these techniques, I think
developing ANN methods is the more reliable direction because
it is an emerging field.
ACKNOWLEDGMENT
I would like to thank Dr. Lochandaka Ranathunga for guiding
and motivating me to complete this critical review paper. My
thanks and appreciation also go to my colleagues, who
willingly helped me with their abilities to make this review
a success.
REFERENCES
[1] K. A. K. N. D. Dharmapala, W. P. M. V. Wijesooriya, C. P.
Chandrasekara, U. K. A. U. Rathnapriya, and L. Ranathunga,
“Sinhala Handwriting Recognition Mechanism Using Zone
Based Feature Extraction,” UoM IR. [Online]. Available:
http://dl.lib.mrt.ac.lk/handle/123/12501. [Accessed: 04-May-
2017].
[2] S. Hewavitharana, H. C. Fernando, and N. D. Kodikara,
“Off-line Sinhala Handwriting Recognition using Hidden
Markov Models,” ResearchGate. [Online]. Available:
https://www.researchgate.net/publication/2550174_Off-
line_Sinhala_Handwriting_Recognition_using_Hidden_Marko
v_Models. [Accessed: 04-May-2017].
[3] M.L.M Karunanayaka, N.D Kodikara and G.D.S.P
Wimalaratne “Off Line Sinhala Handwriting Recognition with
an Application for Postal City Name Recognition,” Off Line
Sinhala Handwriting Recognition with an Application for Postal
City Name Recognition | International Conference on Advances
in ICT for Emerging Regions. [Online]. Available:
http://www.icter.org/conference/icter2016/?
q=iitc2004%2Fabstract%2FIITC-2004p4. [Accessed: 04-May-
2017].
[4] I. Manamperi, K. K. Wijesinghe, J. C. K. Gamage, D. S. K.
Priyarathne, and S. M. I. P. B. Samarakoon, “Sinhala Online
Handwriting Recognition System,” SLIIT: Home, 16-Jun-2014.
[Online]. Available: http://dspace.sliit.lk/handle/123456789/126.
[Accessed: 04-May-2017].
[5] S. Hewavitharana and N. D. Kodikara, “A Statistical
Approach to Sinhala Handwriting Recognition ,” ResearchGate.
[Online]. Available:
https://www.researchgate.net/publication/268302652_A_Statistic
al_Approach_to_Sinhala_Handwriting_Recognition. [Accessed:
10-Jun-2017].
[6] B. Jayasekara and L. Udawatta, “Non-Cursive Sinhala
Handwritten Script Recognition: A Genetic Algorithm
Based Alphabet Training Approach,” Department of Electronic
and Telecommunication Engineering. [Online].
Available: www.ent.mrt.ac.lk/iml/ICIA2005/Papers/SL013CRC.pdf.
[Accessed: 19-Jun-2017].
[7] R. Goswami, “A Review on Character Recognition
Techniques,” International Journal of Computer Applications -
IJCA. [Online]. Available:
http://www.ijcaonline.org/archives/volume83/number7/14460-
2737. [Accessed: 15-Aug-2017].
[8] H. Khandelwal, S. Gupta, and A. K. Jain, “Review of
Offline Handwriting Recognition Techniques in the fields of
HCR and OCR,” International Journal of Computer Trends
and Technology. [Online]. Available:
http://www.ijcttjournal.org/archives/ijctt-v47p123. [Accessed:
15-Aug-2017].
[9] N. Arica and F. T. Yarman-Vural, “An overview of character
recognition focused on off-line handwriting ,”
http://ieeexplore.ieee.org. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=941845.
[Accessed: 15-Aug-2017].
[10] M. Chamikara, S. R. Kodituwakku, A. A. C. A.
Jayathilake, and K. R. Wijeweera, “Fuzzy Neural Hybrid
Method for Sinhala Character Recognition,” Academia.edu.
[Online].
Available: http://www.academia.edu/8740975/Fuzzy_Neural_Hybrid_Method_for_Sinhala_Character_Recognition.
[Accessed: 10-Aug-2017].