In this paper I propose a scheme for “Online Bangla Handwritten Compound Word Recognition” based on more accurate segmentation of a word into its constituent characters. The goal of this paper is to develop a system that segments a Bengali compound word into its constituent characters or basic strokes and then recognizes each character individually based on stroke generation, so that the recognizer can recognize the entire word. I achieved a correct segmentation rate of 87% and an overall recognition rate of 73% on a dataset of 4200 Bangla compound words.
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT - cscpconf
Optical Character Recognition (OCR) is one of the fundamental research areas in the field of image processing and pattern recognition. The accuracy of an OCR system depends on proper segmentation of the characters. This paper is concerned with the segmentation of printed Bangla characters without modifiers for an OCR system. The basic steps needed to develop an OCR system are also discussed.
Image to Text Converter PPT. The PPT contains step-by-step algorithms and methods for converting images into text, in particular algorithms for images that contain human handwriting, so that handwriting can be converted into text (image to text).
Numeral recognition is an important research direction in the field of pattern recognition, and it has
broad application prospects. Aiming at the four arithmetic operations in general printed formats, this article
adopts a multiple hybrid recognition method and applies it to automatic calculation. This method mainly
uses a BP neural network and template matching to distinguish the numerals and operators, in order
to increase the operation speed and recognition accuracy. Sample images of the four arithmetic operations are
extracted from printed books and used for testing the performance of the proposed recognition
method. The experiments show that the method provides a correct recognition rate of 96% and a correct
calculation rate of 89%.
Scene text recognition in mobile applications by character descriptor and str... - eSAT Journals
Abstract
Camera-based scene images usually have complex backgrounds filled with non-text objects in multiple shapes and colors. The existing system is sensitive to font scale changes and background interference. The main focus of this system is on two character recognition methods. In text detection, previously proposed algorithms are used to search for regions of text strings. The proposed system uses a character descriptor that is effective at extracting representative and discriminative text features for both recognition schemes. The local feature descriptor HOG is compatible with all of the above key point detectors. Our method of scene text recognition from detected text regions is compatible with mobile device applications. The proposed system accurately extracts text from natural scene images in the presence of background interference. The demo system gives details of the algorithm design and the performance improvements of scene text extraction. It is able to detect regions of text strings in cluttered backgrounds and recognize the characters in those regions.
Keywords: Scene text detection, scene text recognition, character descriptor, stroke configuration, text understanding, text retrieval.
Text detection and recognition from natural scenes - hemanthmcqueen
Text in natural scenes and surroundings provides valuable information about a place, sometimes including legal or otherwise important information. It is therefore important to detect and recognise such text. However, recognizing it is not easy because of the diverse backgrounds and fonts involved. In this paper, a method is proposed to extract text information from the surroundings. First, a character descriptor is designed with existing standard detectors and descriptors. Then, character structure is modeled at each character class by designing stroke configuration maps. In natural scenes, text is generally found on nearby sign boards and other objects. Extracting such text is difficult because of noisy backgrounds and diverse fonts and text sizes, but many applications have proven efficient at extracting text from surroundings. For this, the method of text extraction is divided into two processes:
Text detection
Text recognition
SCENE TEXT RECOGNITION IN MOBILE APPLICATION BY CHARACTER DESCRIPTOR AND STRU... - Cheriyan K M
In text detection, our
previously proposed algorithms are applied to obtain text regions
from scene image. First, we design a discriminative character
descriptor by combining several state-of-the-art feature detectors
and descriptors. Second, we model character structure at each
character class by designing stroke configuration maps.
This paper presents an approach to recognizing off-line Bangla numerals. Today there are many OCR systems used to recognize
Bangla numerals. The recognition of handwritten characters is still challenging work in the field of pattern recognition. Numeral
recognition in pattern recognition is the process of identifying a given character according to a predefined character set. The difficulty
in recognizing handwritten Bangla numerals is that they differ in shape and size and are strongly curved in nature. We try
to establish a process to recognize such handwritten Bangla numerals of different shapes and sizes. The scanned input image is first
binarized. Then we segment all ten digits of the Bangla numerals to identify each individual digit in the scanned
image. We use line segmentation to extract the features of each numeral based on templates. A high correlation coefficient
indicates a successful match between the test data and the training data.
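The correlation-based matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binarized images of equal size stored as nested lists, and the 3x3 templates and the function names `correlation` and `classify` are hypothetical placeholders rather than real Bangla numeral shapes.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation coefficient between two equal-sized binary images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a uniform image carries no shape information
    return cov / sqrt(var_x * var_y)


def classify(image, templates):
    """Return the label of the template with the highest correlation."""
    return max(templates, key=lambda label: correlation(image, templates[label]))


# Toy templates (hypothetical shapes, for illustration only).
TEMPLATES = {
    "bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "ring": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}

# A "bar" with one noisy pixel still correlates best with the bar template.
noisy_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label = classify(noisy_bar, TEMPLATES)
```

In practice each segmented digit would first be scaled to the template size, and a minimum correlation threshold could be used to reject poor matches.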
Mixed Language Based Offline Handwritten Character Recognition Using First St... - CSCJournals
An Artificial Neural Network is an artificial representation of the human brain that tries to simulate its learning process. To train a network and measure how well it performs, an objective function must be defined. A commonly used performance criterion is the sum-of-squares error function. Full end-to-end text recognition in natural images is a challenging problem that has recently received much attention in computer vision and machine learning. Traditional systems in this area have relied on elaborate models that incorporate carefully hand-engineered features or large amounts of prior knowledge. Language identification and interpretation of handwritten characters is one of the challenges faced in various industries. For example, it is always a big challenge to interpret data from cheques in banks, and to identify the language of and translate messages from ancient scripts in the form of manuscripts, palm scripts and stone carvings, to name a few. Handwritten character recognition using soft computing methods such as neural networks has long been a major area of research, and multiple theories and algorithms have been developed in the area of neural networks for handwritten character recognition.
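For reference, the sum-of-squares error mentioned above is commonly written as below. The notation is an assumption made here, not taken from the source: $y_k(\mathbf{x}_n)$ is the network's $k$-th output for training input $\mathbf{x}_n$, $t_{nk}$ is the corresponding target value, $N$ is the number of training patterns and $K$ the number of output units.

```latex
E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} \left( y_k(\mathbf{x}_n) - t_{nk} \right)^2
```

Training then amounts to adjusting the network weights so that $E$ decreases.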
NLP Project: Machine Comprehension Using Attention-Based LSTM Encoder-Decoder... - Eugene Nho
Machine comprehension remains a challenging open area of research. While many question answering models have been explored for existing datasets, little work has been done with the newly released MS MARCO dataset, which mirrors the reality much more closely and poses many unique challenges. We explore an end-to-end neural architecture with attention mechanisms for comprehending relevant information and generating text answers for MS MARCO.
Devnagari document segmentation using histogram approach - Vikas Dongre
Document segmentation is one of the critical phases in machine recognition of any language. Correct
segmentation of individual symbols decides the accuracy of character recognition technique. It is used to
decompose image of a sequence of characters into sub images of individual symbols by segmenting lines and
words. Devnagari is the most popular script in India. It is used for writing Hindi, Marathi, Sanskrit and
Nepali languages. Moreover, Hindi is the third most popular language in the world. Devnagari documents
consist of vowels, consonants and various modifiers. Hence proper segmentation of Devnagari word is
challenging. A simple histogram based approach to segment Devnagari documents is proposed in this paper.
Various challenges in segmentation of Devnagari script are also discussed.
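A histogram (projection profile) approach of the kind described above can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's exact procedure: the page is assumed to be a binarized image stored as nested lists with 1 for ink, and the helper names `runs`, `segment_lines` and `segment_columns` are hypothetical.

```python
def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero histogram bins."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i                      # a run of ink begins
        elif not count and start is not None:
            spans.append((start, i))       # the run ended at the first empty bin
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(image):
    """Split a binary page into text lines using the row histogram."""
    row_hist = [sum(row) for row in image]
    return runs(row_hist)


def segment_columns(image, top, bottom):
    """Split one text line (rows top..bottom-1) using the column histogram."""
    width = len(image[0])
    col_hist = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(col_hist)


# Tiny synthetic page: two text lines, the first containing two blobs.
PAGE = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]

lines = segment_lines(PAGE)
first_line_blobs = segment_columns(PAGE, *lines[0])
```

Because the characters of a Devnagari word are joined by a headline, a plain column histogram tends to separate words rather than characters; splitting a word further usually requires locating and removing the headline first.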
System of “Analysis of Intersections Paths” for Signature Recognition - CSCJournals
In today's world, the electronic city, which is the offspring of the development of the information world, paves the way for round-the-clock interaction among computers and networks. The planners of these electronic cities are mostly concerned about the accuracy and security of the exchanged information. In order to elevate security and raise speed and accuracy in reviewing network performance and in dependably identifying the persons involved in electronic operations, recognizing the accuracy of the electronic signature is deemed absolutely essential. In this article, a system named "Analysis of Intersections" has been utilized for the accurate recognition of the electronic signature. Important features of this system include the use of simple data structures such as arrays, stacks and lists, and the determination of a sensitivity level for recognizing the accuracy of the signature by setting an error percentage for the size and the recognition of the shape. An accuracy recognition test was performed on 15 samples of 150 types of signatures using "Analysis of Intersections". Findings indicated that this system accurately recognized 2,220 out of 2,250 signatures, indicating an applicability of 98.66 percent.
With so much of our lives computerized, it is vitally important that machines and humans can understand one another and pass information back and forth. Mostly, computers have things their way: we have to talk to them through relatively crude devices such as keyboards and mice so they can figure out what we want them to do. However, when it comes to processing more human kinds of information, like an old-fashioned printed book or a letter scribbled with a fountain pen, computers have to work much harder. That is where optical character recognition (OCR) comes in. Here we process the image, applying various pre-processing techniques such as deskewing and binarization, and engines like Tesseract to recognize the characters and give us the final document. T. Gnana Prakash | K. Anusha, "Text Extraction from Image using Python", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1 | Issue-6, October 2017, URL: http://www.ijtsrd.com/papers/ijtsrd2501.pdf http://www.ijtsrd.com/computer-science/simulation/2501/text-extraction-from-image-using-python/tgnana-prakash
The term Machine Learning was coined in 1959 by Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, who stated that “it gives computers the ability to learn without being explicitly programmed”. Machine Learning is the latest buzzword floating around. It deserves to be, as it is one of the most interesting subfields of Computer Science. So what does Machine Learning really mean? Let’s try to understand Machine Learning.
Character recognition has gained a lot of attention in the field of pattern recognition due to its application in various fields. It is one of the most successful applications of automatic pattern recognition. Research in OCR is popular for its application potential in banks, post offices, office automation etc. HCR is useful in cheque processing in banks, almost all kinds of form processing systems, handwritten postal address resolution and many more. This paper presents a simple and efficient approach for implementing OCR and translating scanned images of printed text into machine-encoded text. It makes use of different image analysis phases followed by image detection via pre-processing and post-processing. This paper also describes scanning the entire document (the same as segmentation in our case) and recognizing individual characters in the image irrespective of their position, size and font style. It deals with recognition of the symbols of the English language, which is internationally accepted.
Presentation from my thesis defense on text summarization. It discusses existing state-of-the-art models along with the efficiency of AMR (Abstract Meaning Representation) for text summarization, and shows how AMRs can be used with seq2seq models. We also discuss other techniques such as BPE (Byte Pair Encoding) and their effectiveness for the task, and we see how data augmentation with POS tags and AMRs affects summarization with seq2seq learning.
Handwritten character recognition (HCR) is the ability of a computer to receive and interpret handwritten input. It is one of the active and challenging research areas in the field of pattern recognition. Pattern recognition is a process that takes in raw data and takes an action based on the category of the pattern. HCR is one of the well-known applications of pattern recognition. Handwriting recognition, especially for Indian languages, is still in its infancy because not much work has been done on it. This paper discusses an idea for recognizing Kannada vowels using chain code features. Kannada is a South Indian language. For any recognition system, an important part is feature extraction. A proper feature extraction method can increase the recognition rate. In this paper, a chain-code-based feature extraction method is investigated for developing an HCR system. A chain code works on the basis of the 4-neighborhood or 8-neighborhood method. It is a sequence of direction codes that traces a character from a starting point, and it is often used in image processing. In this paper, the 8-neighborhood method has been implemented, which allows generation of eight different codes for each character. These codes have been used as features of the character image, which are later used for training and testing K-Nearest Neighbor (KNN) classifiers. The reported accuracy reached 100%.
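The 8-neighborhood chain code described above can be illustrated with a short sketch. Several things here are assumptions rather than details from the paper: the contour is given as an ordered list of adjacent pixel coordinates, the code numbering (0 = right, counter-clockwise, with y growing downward as in image coordinates) is one common convention, the feature is a normalized histogram of the eight codes, and a 1-nearest-neighbor rule stands in for the KNN classifier.

```python
# Map each 8-neighborhood step (dx, dy) to a direction code.
# Image coordinates are assumed: x grows to the right, y grows downward.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(points):
    """Encode an ordered pixel contour as a list of 8-direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not 8-neighbors")
        codes.append(DIRECTIONS[step])
    return codes


def code_histogram(codes):
    """Normalized frequency of each of the eight codes (an 8-element feature)."""
    total = len(codes) or 1
    return [codes.count(d) / total for d in range(8)]


def nearest_neighbor(feature, training):
    """Return the label of the closest training feature (1-NN, squared distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(training, key=lambda item: dist(feature, item[0]))[1]


# Closed unit square traced right, down, left, up.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)
feature = code_histogram(codes)

# Hypothetical training set of (feature, label) pairs.
training = [
    ([0.25, 0, 0.25, 0, 0.25, 0, 0.25, 0], "box"),
    ([0.5, 0, 0, 0, 0.5, 0, 0, 0], "horizontal"),
]
predicted = nearest_neighbor(feature, training)
```

A histogram discards the order of the codes; keeping the code sequence itself, or histograms per image zone, would preserve more of the character's shape.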
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo... - inventionjournals
This paper presents a hybrid approach to segmenting single- or multiple-touching handwritten numeral strings in form documents, the core of which is the combined use of foreground, background and recognition analysis. The algorithm first locates feature points on both the foreground and background skeleton images containing connected numeral strings in the form document. Possible segmentation paths are then constructed by matching these feature points, with the unexpected benefit of removing useless strokes. Subsequently, all these segmentation paths are validated and ranked by a recognition-based analysis, in which a well-trained two-stage classifier is applied to each separated digit image to obtain its reliability. Finally, a locally optimal strategy is introduced to accelerate the recognition process, and the top-ranked segmentation path is used to decide whether to accept the segmentation. Experimental results show that the proposed method can achieve a correct segmentation rate of 96.2 percent on a large dataset that we collected ourselves.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Offline Signiture and Numeral Recognition in Context of Cheque - IJERA Editor
The signature is considered one of the biometrics. A signature verification system is required almost everywhere a person or his/her credentials must be authenticated before a transaction can proceed, especially for bank cheques. For this purpose the signature verification system must be powerful and accurate. To date, various methods have been used to make signature verification systems powerful and accurate. The research here relates to offline signature verification. Shape contexts have been used to verify whether two shapes are similar. They have been used for various applications such as digit recognition, 3D object recognition and trademark retrieval. In this paper we present a modified version of shape context for signature verification on bank cheques using a K-Nearest Neighbor classifier.
Design and implementation of optical character recognition using template mat...eSAT Journals
Abstract
Optical character recognition (OCR) is an efficient way of converting a scanned image into machine code that can then be edited. A variety of methods have been implemented in the field of character recognition. This paper proposes optical character recognition using template matching. The templates cover a variety of fonts and sizes. In the proposed system, image pre-processing, feature extraction and classification algorithms have been implemented so as to build an excellent character recognition technique for different scripts. The results of this approach are also discussed in this paper. The system is implemented in Matlab.
Keywords- OCR, Feature Extraction, Classification
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...acijjournal
Handwritten character recognition is conversion of handwritten text to machine readable and editable form. Online character recognition deals with live conversion of characters. Malayalam is a language spoken by millions of people in the state of Kerala and the union territories of Lakshadweep and Pondicherry in India. It is written mostly in clockwise direction and consists of loops and curves. The method aims at training a simple neural network with three layers using backpropagation algorithm.
Freeman codes are used to represent each character as feature vector. These feature vectors act as inputs to the network during the training and testing phases of the neural network. The output is the character expressed in the Unicode format.
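As a rough illustration of the feature described in this abstract, the sketch below turns an online pen trace into an 8-direction Freeman chain code. The function name `freeman_chain_code`, the direction numbering (0 = east, increasing counter-clockwise, y axis pointing up) and the handling of repeated samples are assumptions made for the example, not details taken from the paper.

```python
import math

def freeman_chain_code(points):
    """Return 8-direction Freeman codes for consecutive (x, y) samples.

    Assumed convention: 0 = east, codes increase counter-clockwise,
    y axis points up. Repeated samples (no movement) are skipped.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # the pen did not move between these two samples
        angle = math.atan2(dy, dx)  # direction of travel in radians
        # Quantize the angle to the nearest multiple of 45 degrees.
        codes.append(int(round(angle / (math.pi / 4))) % 8)
    return codes

# A unit square traced counter-clockwise: east, north, west, south.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(freeman_chain_code(square))  # [0, 2, 4, 6]
```

A code sequence like this would still need to be resampled or padded to a fixed length before it could serve as a neural network input vector; that step is left out here.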
Segmentation and recognition of handwritten gurmukhi scriptRAJENDRA VERMA
The goal is to segment handwritten cursive words into individual predefined strokes. I designed an algorithm that calculates the angle between two coordinate points and segments the handwritten cursive word on the basis of that angle. This improves the accuracy of the handwriting recognition system.
OCR-THE 3 LAYERED APPROACH FOR CLASSIFICATION AND IDENTIFICATION OF TELUGU HA...csandit
Optical character recognition (OCR) is the digitization of handwritten, typewritten, or printed text into machine-encoded form, and it has many applications in daily life. OCR is already used successfully in finance, legal, banking, health care, and home appliances. India is a multicultural country with many literatures and traditional scripts. Telugu is a South Indian language with a syllabic script: each symbol represents a complete syllable and may be formed from conjunct and mixed consonants. Recognizing mixed conjunct consonants is harder than recognizing normal consonants because of the variation in written strokes and because conjuncts mix with the consonants before and after them. This paper proposes a layered approach to recognize characters, conjunct consonants, and mixed conjunct consonants, and to classify handwritten and printed conjunct consonants efficiently. It implements an Advanced Fuzzy Logic system controller that takes handwritten or printed text, collects text images from a scanned file or digital camera, examines image intensity based on a quality ratio, extracts the character images depending on quality, and then checks character orientation, alignment, thickness, base, and print ratio. Input character images are classified in two ways: normal consonants and conjunct consonants. The digitized text image is divided into three layers: the middle layer represents normal consonants, and the top and bottom layers represent mixed conjunct consonants. Recognition starts from the middle layer and then checks the top and bottom layers. A character is treated as a conjunct consonant when symbols are detected in the top or bottom layer of the current base character; otherwise it is treated as a normal consonant. Post-processing is applied to the characters of all three layers and concentrates on text readability and compatibility; if the text is not readable, the process is repeated. The recognition process includes slant correction, thinning, normalization, segmentation, feature extraction, and classification. The pre-processing, segmentation, character recognition, and post-processing modules are discussed. The main objectives are to develop classification and identification of different prototypes for written and printed consonants, conjunct consonants, and symbols based on a three-layered approach with different measurable areas using fuzzy logic, and to determine suitable features for handwritten character recognition.
Hybrid fingerprint matching algorithm for high accuracy and reliabilityeSAT Publishing House
OCR for Gujarati Numeral using Neural Networkijsrd.com
This paper deals with an optical character recognition (OCR) system for handwritten Gujarati numerals. Much work can be found for Indian languages such as Hindi, Tamil, Bengali, Malayalam and Gurumukhi, but Gujarati is a language for which hardly any work is traceable, especially for handwritten characters. In this work a neural network is proposed for Gujarati handwritten numeral recognition. A multi-layered feed-forward neural network is suggested for classification of the numerals. The features of the Gujarati numerals are abstracted by four different profiles of the numerals. Thinning and skew correction are also applied to the handwritten numerals as preprocessing before their classification. This work achieves approximately 81% performance for Gujarati handwritten numerals.
Two Methods for Recognition of Hand Written Farsi CharactersCSCJournals
Optical character recognition (OCR) is one of the active topics in pattern recognition. The current study focuses on automatic detection and recognition of handwritten Farsi characters. For this purpose, we propose two different methods based on neural networks and a special post-processing approach to improve the recognition rate of Farsi uppercase letters. In the first method, we extract wavelet features from the borders of character images and train a neural network on these patterns. In the second method, we divide input characters into five groups according to the number of their components, use a set of appropriate moment features in each group, and classify characters by the Bayesian rule. In a post-processing stage, some structural and statistical features are employed by a decision tree classifier to reduce the misrecognition rate. Our experimental results show a suitable recognition rate for both methods.
OCR stands for optical character recognition, the process of recognizing characters (printed or handwritten) from a digital image of a document. In OCR, characters are recognized through an optical mechanism. Characters are made up of various combinations of lines and curves. Human beings recognize characters very accurately, but the same task is very difficult for an OCR system. The wide usage of touch-screen based mobile devices has led to a large volume of users preferring touch-based interaction with the machine, as opposed to traditional input via keyboards/mice. To exploit this, we focus on the Android platform to design a personalized handwriting recognition system that is acceptably fast and lightweight, with a user-friendly interface and minimally intrusive correction and auto-personalization mechanisms.
Bangla Optical Digits Recognition using Edge Detection MethodIOSR Journals
Abstract: This paper is based on Bangla Optical Digit Recognition (ODR) using an edge detection technique. In this method, a Bangla digit image is converted to gray-scale and represented as an M by N array. The input data are off-line printed digit images collected from computer-generated images, scanned documents or printed text. The gray-scale image is stored in a variable as an M by N array, where each element is 255 for fully white space, 0 (zero) for fully dark space, and a value between 255 and 0 for a mix of white and dark. In the next step, four edge touch points, together with each touch point's ratio, are used as parameters to identify each Bangla digit uniquely. Keywords: Edge, image, gray-scale, Matrix, ODR.
Comparative study of two methods for Handwritten Devanagari Numeral RecognitionIOSR Journals
Abstract : In this paper two different methods for Numeral Recognition are proposed and their results are
compared. The objective of this paper is to provide an efficient and reliable method for recognition of
handwritten numerals. First method employs Grid based feature extraction and recognition algorithm. In this
method the features of the image are extracted by using grid technique and this feature set is then compared
with the feature set of database image for classification. While second method contains Image Centroid Zone
and Zone Centroid Zone algorithms for feature extraction and the features are applied to Artificial Neural
Network for recognition of input image. Machine text recognition is important research area because of its
applications in many areas like Bank, Post office, Hospitals etc.
Keywords: Handwritten Numeral Recognition, Grid Technique, ANN, Feature Extraction, Classification.
Segmentation and recognition of handwritten digit numeral string using a mult...ijfcstjournal
In this paper, a Multi-Layer Perceptron (MLP) neural network model is proposed for recognizing unconstrained offline handwritten numeral strings. The numeral strings are segmented, and isolated numerals are obtained, using a connected component labeling (CCL) algorithm. The structural part of the model is an MLP neural network. This paper also presents a new technique to remove slope and slant from handwritten numeral strings, to normalize the size of text images, and to classify them with supervised learning methods. Experimental results on a database of 102 numeral string patterns written by 3 different people show a recognition rate of 99.7% on the individual digits contained in the numeral strings, which include both skewed and slanted data.
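The connected component labeling step mentioned in this abstract can be sketched as follows. This is a generic breadth-first labeling of a binary image under assumed conventions (foreground pixels are 1, 4-connectivity, hypothetical function name `label_components`); the paper's own CCL variant is not specified in the abstract.

```python
from collections import deque

def label_components(grid):
    """Label 4-connected foreground regions of a binary image.

    grid is a list of rows of 0/1 values (1 = ink, an assumption).
    Returns (labels, count), where labels has the same shape as grid
    and holds 0 for background or a component number starting at 1.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c]:
                continue  # background, or already part of a component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # flood-fill the new component
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
labels, count = label_components(image)
print(count)  # 3
```

Each labeled region can then be cropped by its bounding box and passed to a digit classifier. Touching digits form a single component, so they would need additional splitting logic.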
Behavior study of entropy in a digital image through an iterative algorithmijscmcj
Image segmentation is a critical step in computer vision tasks and an essential issue for pattern recognition and visual interpretation. In this paper, we study the behavior of entropy in digital images through an iterative algorithm of mean shift filtering. The order of a digital image in gray levels is defined. The behavior of Shannon entropy is analyzed and then compared, taking into account the number of iterations of our algorithm, with the maximum entropy that could be achieved under the same order. Equivalence classes are introduced, which allow us to interpret entropy as a hyper-surface in real m-dimensional space. The difference between the maximum entropy of order n and the entropy of the image is used to group the iterations, in order to characterize the performance of the algorithm.
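For readers unfamiliar with the quantity studied in this abstract, the sketch below computes the Shannon entropy (in bits) of an image's gray-level distribution. The function name `shannon_entropy` and the flat list of pixel values are assumptions for the example. The paper's notion of the "order" of an image and its mean shift iteration are not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the gray-level histogram of pixels.

    pixels is a flat, non-empty iterable of gray levels (e.g. 0..255).
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Two equally likely gray levels carry exactly one bit per pixel.
print(shannon_entropy([0, 0, 255, 255]))  # 1.0
```

For an image that uses k distinct gray levels, this value is at most log2(k), reached when all k levels are equally frequent. Smoothing filters typically lower it by merging nearby levels.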
OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...ijaia
Optical character recognition (OCR) is the digitization of handwritten, typewritten, or printed text into machine-encoded form, and it has many applications in daily life. OCR is already used successfully in finance, legal, banking, health care, and home appliances. India is a multicultural country with many literatures and traditional scripts. Telugu is a South Indian language with a syllabic script: each symbol represents a complete syllable and may be formed from conjunct and mixed consonants. Recognizing mixed conjunct consonants is harder than recognizing normal consonants because of the variation in written strokes and because conjuncts mix with the consonants before and after them. This paper proposes a layered approach to recognize characters, conjunct consonants, and mixed conjunct consonants, and to classify handwritten and printed conjunct consonants efficiently. It implements an Advanced Fuzzy Logic system controller that takes handwritten or printed text, collects text images from a scanned file or digital camera, examines image intensity based on a quality ratio, extracts the character images depending on quality, and then checks character orientation, alignment, thickness, base, and print ratio. Input character images are classified in two ways: normal consonants and conjunct consonants. The digitized text image is divided into three layers: the middle layer represents normal consonants, and the top and bottom layers represent mixed conjunct consonants. Recognition starts from the middle layer and then checks the top and bottom layers. A character is treated as a conjunct consonant when symbols are detected in the top or bottom layer of the current base character; otherwise it is treated as a normal consonant. Post-processing is applied to the characters of all three layers and concentrates on text readability and compatibility; if the text is not readable, the process is repeated. The recognition process includes slant correction, thinning, normalization, segmentation, feature extraction, and classification. The pre-processing, segmentation, character recognition, and post-processing modules are discussed. The main objectives are to develop classification and identification of different prototypes for written and printed consonants, conjunct consonants, and symbols based on a three-layered approach with different measurable areas using fuzzy logic, and to determine suitable features for handwritten character recognition.
Similar to ONLINE BANGLA HANDWRITTEN COMPOUND WORD RECOGNITION BASED ON SEGMENTATION (20)
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems has diversified the use of the images they generate in different geoscience applications. The detection and monitoring of surface deformations caused by various phenomena has benefited from this evolution and is carried out with interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelation of the interferometric pairs used strongly limits the precision of the results obtained with these techniques. In this context, we propose a methodological approach to surface deformation detection and analysis by differential interferograms, to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures by simulating a linear fault merged into image pairs from the ERS1 / ERS2 sensors acquired in a region of southern Algeria.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
A novel trajectory-guided, concatenative approach for synthesizing high-quality video from real image samples is proposed. Automated lip reading searches the library of video data for the real image sample sequence closest to the HMM-predicted trajectory. The object trajectory is obtained by projecting the face patterns into a KDA feature space. The approach to speaker face identification synthesizes the identity surface of a subject's face from a small sample of patterns that sparsely cover the view sphere. A KDA algorithm is used to discriminate the lip-reading images, after which the fundamental lip feature vector is reduced to a low dimension using the 2D-DCT. The dimensionality of the mouth area set is reduced using PCA to obtain the Eigen lips approach, as proposed in [33]. The subjective performance results of the cost function under the automatic lip-reading model did not illustrate superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
Universities offer software engineering capstone courses to simulate a real-world working environment in which students work in a team for a fixed period to deliver a quality product. The objective of this paper is to report on our experience in moving from a Waterfall process to an Agile process in conducting the software engineering capstone project. We present the capstone course designs for both the Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables and assessment plans. To evaluate the improvement, we conducted a survey in two different sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students completed the survey. The survey consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to gain hands-on experience that simulates a real-world working environment. The results also show that the Agile approach helped students to produce a better overall design and to avoid mistakes they had made in the initial design completed in the first phase of the capstone project. In addition, they were able to decide on their team capabilities and training needs, and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
Using social media in education provides learners with an informal way for communication. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching software project management course. We conducted different surveys at the end of every semester to evaluate students’ satisfaction and engagement. Results show that using social media enhances students’ engagement and satisfaction. However, familiarity with the tool is an important factor for student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering (QA) systems are intelligent systems that respond to questions asked by the user in natural language, based on facts or rules stored in a knowledge base. The first main idea of fuzzy logic was to work on the problem of computer understanding of natural language. This survey paper therefore provides an overview of what question answering is, its system architecture, and its possible relationships to and differences from fuzzy logic, as well as previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms while speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words which facilitate preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming technique for global alignment and shortest distance measurements. The DPW algorithm can be used to enhance the pronunciation dictionaries of the well-known languages like English or to build pronunciation dictionaries to the less known sparse languages. The precision measurement experiments show 88.9% accuracy.
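The abstract states that DPW uses dynamic programming for global alignment. A minimal sketch of that general idea is shown below, using an edit-distance style alignment between two phone sequences. The unit substitution and gap costs, the function name `alignment_distance` and the phone symbols are assumptions for illustration; the actual DPW cost model is not given in the abstract and may weight phone pairs differently.

```python
def alignment_distance(a, b, sub_cost=1, gap_cost=1):
    """Global-alignment distance between two phone sequences.

    Uses the classic dynamic programming recurrence with assumed unit
    costs for substitutions and for insertions/deletions (gaps).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, phone_a in enumerate(a, start=1):
        curr = [i * gap_cost]
        for j, phone_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (0 if phone_a == phone_b else sub_cost)
            curr.append(min(diagonal,                 # match or substitute
                            prev[j] + gap_cost,       # delete phone_a
                            curr[j - 1] + gap_cost))  # insert phone_b
        prev = curr
    return prev[-1]

# Two hypothetical pronunciations of the same word, one vowel apart.
print(alignment_distance(["t", "ah", "m", "ey", "t", "ow"],
                         ["t", "ah", "m", "aa", "t", "ow"]))  # 1
```

Replacing the fixed `sub_cost` with a table of phonetic similarity scores would let acoustically close phones count as a smaller difference than unrelated ones.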
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarm-intelligence-based metaheuristics. The ABO algorithm is inspired by the buffalo's behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and a probability model to generate binary solutions. In the second version (called LBABO) they use logical operators to operate on the binary solutions. Computational results on instances of two knapsack problems (KP and MKP) show the effectiveness of the proposed algorithms and their ability to achieve good and promising solutions.
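The sigmoid-based binarization mentioned for SBABO is a common device in binary metaheuristics. The sketch below shows the usual form, in which each continuous coordinate is squashed to a probability and a bit is sampled from it. It is an assumption that SBABO follows this standard pattern; the function names and the example position vector are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    """Sample one bit per coordinate, with P(bit = 1) = sigmoid(coordinate)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]

rng = random.Random(0)           # fixed seed so runs are repeatable
position = [2.5, -1.0, 0.0, 4.0]  # hypothetical continuous buffalo position
bits = binarize(position, rng)
print(bits)  # one 0/1 value per coordinate, e.g. item selected or not
```

In a knapsack setting each bit would indicate whether the corresponding item is packed. Strongly positive coordinates almost always yield 1 and strongly negative ones almost always yield 0.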
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure and ensure a persistent presence on a compromised host. Amongst the various DDNS techniques, the Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach's feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmically generated domain names. Findings from this study show that domain names made up of English characters “a-z” that achieve a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The amount of piracy in streaming digital content in general, and in the music industry in particular, poses a real challenge to digital content owners. This paper presents a DRM solution for monetizing, tracking and controlling online streaming content across platforms for IP-enabled devices. The paper benefits from the current advances in Blockchain and cryptocurrencies. Specifically, the paper presents a Global Music Asset Assurance (GoMAA) digital currency and presents the iMediaStreams Blockchain to enable the secure dissemination and tracking of the streamed content. The proposed solution gives the data owner the ability to control the flow of information even after it has been released, by creating a secure, self-installed, cross-platform reader located in the digital content file header. The proposed system gives content owners options to manage their digital information (audio, video, speech, etc.), including the tracking of the most consumed segments, once it is released. The system benefits from token distribution between the content owner (Music Bands), the content distributor (Online Radio Stations) and the content consumer (Fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This paper discusses the importance of verb suffix mapping in Discourse translation system. In
discourse translation, the crucial step is Anaphora resolution and generation. In Anaphora
resolution, cohesion links like pronouns are identified between portions of text. These binders
make the text cohesive by referring to nouns appearing in the previous sentences or nouns
appearing in sentences after them. In Machine Translation systems, to convert the source
language sentences into meaningful target language sentences the verb suffixes should be
changed as per the cohesion links identified. This step of translation process is emphasized in
the present paper. Specifically, the discussion is on how the verbs change according to the
subjects and anaphors. To explain the concept, English is used as the source language (SL) and
an Indian language Telugu is used as Target language (TL)
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higher-dimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The use of information technology resources is rapidly increasing in organizations, businesses, and even governments, which has given rise to various attacks and vulnerabilities in the field. This makes it necessary to perform frequent penetration tests (PT) of the environment to see what an attacker can gain and what the environment's current vulnerabilities are. This paper reviews some of the automated penetration testing techniques and presents their improvements over the traditional manual approaches. To the best of our knowledge, it is the first research that takes into consideration the concept of penetration testing and the standards in the area. This research compares manual and automated penetration testing and the main tools used in penetration testing. Additionally, it compares some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid-1990s, functional connectivity studies using fMRI (fcMRI) have drawn increasing attention from neuroscientists and computer scientists, since they open a new window for exploring the functional network of the human brain at relatively high resolution. The BOLD technique provides a nearly accurate picture of the brain's state. Past research shows that neurological diseases damage brain network interaction, protein-protein interaction and gene-gene interaction. A number of neurological research papers also analyse the relationships among the damaged parts. Such classifications can be shown with computational methods, especially machine learning techniques. In this paper we used the OASIS fMRI dataset of patients affected with Alzheimer's disease together with a dataset of normal subjects. After proper processing of the fMRI data, we use the processed data to build classifier models using SVM (Support Vector Machine), KNN (K-nearest neighbour) and Naïve Bayes. We also compare the accuracy of our proposed method with existing methods. In future, we will try other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
In order to treat and analyze real datasets, fuzzy association rules have been proposed. Several
algorithms have been introduced to extract these rules. However, these algorithms suffer from
the problems of utility, redundancy and large number of extracted fuzzy association rules. The
expert will then be confronted with this huge amount of fuzzy association rules. The task of
validation then becomes tedious. In order to solve these problems, we propose a new validation
method. Our method is based on three steps. (i) We extract a generic base of non redundant
fuzzy association rules by applying EFAR-PN algorithm based on fuzzy formal concept analysis.
(ii) we categorize extracted rules into groups and (iii) we evaluate the relevance of these rules
using structural equation model.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within-class imbalance, where classes are composed of
multiple sub-concepts with different numbers of examples, also affects the performance of the
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for image
processing. Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy was achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets are so large and complex
that they are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in the three largest cities in the UAE.
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes them attractive to those who spread hate speech and want
to mask their criminal activities online, posing a challenge to the world and in particular to
Ethiopia. With the ever-increasing volume of social media data, hate speech identification
becomes a challenge, and hate speech aggravates conflict between citizens of nations. Because of
the high rate of production, it has become difficult to collect, store and analyze such big data
using traditional detection methods. This paper proposes the application of Apache Spark to hate
speech detection to reduce these challenges. The authors developed an Apache Spark based model to
classify Amharic Facebook posts and comments into hate and not hate. They employed Random Forest
and Naïve Bayes for learning, and Word2Vec and TF-IDF for feature selection. Tested by 10-fold
cross-validation, the model based on Word2Vec embedding performed best, with 79.83% accuracy. The
proposed method achieves a promising result with the unique features of Spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
2. 70 Computer Science & Information Technology (CS & IT)
input process available, such as current position, direction of movement, stopping points,
starting points, and stroke order.
- Offline recognition system: the system accepts an image as input from a scanner. Offline
recognition is more difficult than online recognition because contextual information and prior
knowledge such as text position, text size, stroke order, stop points, and start points are not
available. Furthermore, the image contains noise, whereas noise is nearly absent in online
recognition.
2. DATA COLLECTION
On-line handwriting recognition involves the automatic conversion of text as it is written on a
special digitizer or an A4 Take Note, where a sensor picks up the pen-tip movements X(t), Y(t) as
well as the pen-up/pen-down state P, which switches between 0 and 1. This kind of data is known
as digital ink and can be regarded as a dynamic representation of handwriting. The ink signal is
captured by one of:
A paper-based capture device,
A digital pen on patterned paper,
A pen-sensitive touch screen.
To collect the data (words), either the A4 Take Note or datasheets can be used. Here we used
datasheets.
For online data collection, the sampling rate of the signal is considered fixed for all the
samples of all the character classes. Thus the number of points M in the series of coordinates for
a particular sample is not fixed and depends on the time taken to write the sample on the pad. As
the number of points in the actual trace of a character is generally large and varies greatly due
to high variation in writing speed, a smaller fixed number of points, regularly spaced in time, is
selected for further processing. The digitizer output is represented in the format
$p_i \in \mathbb{R}^2 \times \{0,1\}$, $i = 1, \dots, M$, where $p_i$ is the pen position with its
x-coordinate and y-coordinate and M is the total number of sample points. Let $p_i$ and $p_j$ be
two consecutive pen points. We retain both of these consecutive pen points if the following
condition is satisfied:
$x^2 + y^2 > m^2$ ………. (i)
where $x = x_i - x_j$ and $y = y_i - y_j$. The parameter m is chosen empirically. I have set m
equal to zero in Equation (i), which removes all consecutive repeated points.
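As a minimal sketch of the filtering rule in Equation (i), the following Python function keeps a point only when its squared distance from the previously retained point exceeds $m^2$. The function name `drop_close_points`, the `(x, y, pen)` tuple layout, and the choice to compare against the last retained point are assumptions made for illustration; the sketch looks at coordinates only and is not the code used in this work.

```python
def drop_close_points(points, m=0.0):
    """Filter (x, y, pen) samples using the rule x^2 + y^2 > m^2.

    Assumption: each new point is compared with the last retained point.
    With m = 0 this removes consecutive repeated coordinates.
    """
    if not points:
        return []
    kept = [points[0]]
    for x, y, pen in points[1:]:
        prev_x, prev_y, _ = kept[-1]
        dx, dy = x - prev_x, y - prev_y
        if dx * dx + dy * dy > m * m:
            kept.append((x, y, pen))
    return kept


# Hypothetical digitizer samples with repeated coordinates.
samples = [(0, 0, 1), (0, 0, 1), (1, 0, 1), (1, 0, 1), (1, 1, 1)]
filtered = drop_close_points(samples, m=0.0)
```

A larger value of m would thin out slowly written segments as well, which is why the parameter has to be chosen empirically.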
Analyzing a total of 4200 Bangla compound words, we found that, for writing Bangla characters,
the number of sample points (M) varies from 14 (for the character ) to 176 (for the character k).
The average number of sample points in a Bangla character is 72. I also computed the average
number of sample points in each character class. The character class ( ) has the maximum number
of sample points, with an average value of 113. The character class ( ) has the lowest average
number of sample points (46). Figure 1 shows the data collected online in the form of text and
the datasheet of 42 compound words.
Figure 1: Datasheet for Collection Data and Text format of collected Data
3. STROKE EXTRACTION
By stroke we mean the set of points obtained between a pen-down and the following pen-up, in
other words the sample points collected during one continuous movement of the pen without lifting
it. The main difficulties of Bangla character recognition are shape similarity, stroke size and
the variation in the order of different strokes. From the statistical analysis of our dataset we
found that the minimum and maximum numbers of strokes used to write a Bangla compound character
are 1 and 6. Bangla compound characters may also be written using the basic strokes. Apart from
the 66 simple basic strokes, there are 72 further strokes that occur mainly in compound
characters. Each of these can also be written as a combination of the 66 basic strokes, but here
we treat them as strokes in their own right, giving 66 + 72 = 138 basic strokes in total. The
list of compound basic strokes is in Figure 2:
Figure 2: Compound basic Stroke
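The definition of a stroke given in this section (the points between a pen-down and the next pen-up) can be illustrated with a short Python sketch. It assumes that a pen value of 1 means pen-down and 0 means pen-up, which matches the use of the value 0 as a stroke separator in the segmentation algorithm; the function name `split_strokes` and the sample data are hypothetical.

```python
def split_strokes(samples):
    """Group (x, y, pen) samples into strokes.

    Assumption: pen == 1 means the pen touches the surface and pen == 0
    marks a pen-up, which closes the current stroke.
    """
    strokes = []
    current = []
    for x, y, pen in samples:
        if pen == 1:
            current.append((x, y))
        elif current:
            strokes.append(current)
            current = []
    if current:  # the last stroke may not be followed by a pen-up sample
        strokes.append(current)
    return strokes


# Hypothetical ink: two strokes separated by a single pen-up sample.
ink = [(0, 0, 1), (1, 1, 1), (1, 1, 0), (5, 5, 1), (6, 6, 1)]
strokes = split_strokes(ink)
```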
4. COMPOUND WORD SEGMENTATION
There are about 280 compound characters in Bangla. The main difficulties of Bangla character
recognition are shape similarity, stroke size and the variation in the order of different strokes.
In Bengali handwriting the movement of each stroke is generally downward. With this in mind, a
stroke is split at the point where a downward movement starts [10, 11]. This should be done only
in the upper zone, i.e. the first 33% of the total height of the image; in the remaining 67% of
the image, segmentation is not needed. Compound characters, however, are mostly formed from two
different simple characters. Because of this, compound characters may also be segmented in the
middle portion (i.e. 50% of the total height). When people write a word in Bangla, more than one
letter is usually joined to another. This joining is generally found in the upper one-third of the
character (with exceptions in a few cases) [11]. The modified segmentation algorithm is as
follows:
Step 1: Store each pixel of the online data in three variables: the X and Y coordinates, and a
third variable holding the pen feature value (0 or 1) used for identifying strokes.
Step 2: Scan the pixels of the word and separate the strokes wherever the third variable has the
value 0. Calculate 30% of the height for a simple character and 50% of the height for a compound
character.
Step 3: Based on the previous output, select the points at which a stroke needs to be segmented.
The points to be segmented may lie on the same or on different strokes, so we use a function to
check at which pixels it is feasible to segment a stroke. For this we check a few features of
Bangla characters:
i) Each pixel's distance from the start and end of the stroke,
ii) The width of the stroke up to the pixel in question from the start and end of the stroke,
iii) The height of the stroke up to the pixel in question,
iv) Total stroke distance,
v) Total width of the word.
After finding these features we take the ratios of
(a) Each pixel's distance and the total stroke distance,
(b) The width of the stroke up to the pixel in question and the total width of the word,
and use them to decide at which pixel of a particular stroke segmentation is feasible.
Step 4: If it is feasible to segment the stroke at a particular pixel, we first check whether
that pixel's y-coordinate lies within 30% of the height. If it does not, there is no
segmentation. If it does, we check whether a downward movement of the stroke starts at that
pixel. For this check I take the two points $p_{i-2}$ and $p_{i-1}$ before the point in question
$p_i$ and the two points $p_{i+1}$ and $p_{i+2}$ after it. Writing $y(p)$ for the y-coordinate of
a point $p$, the stroke is split at $p_i$ only if $y(p_{i-1}) \le y(p_{i-2})$ and
$y(p_i) \le y(p_{i-1})$, and simultaneously $y(p_{i+1}) \ge y(p_i)$ and
$y(p_{i+2}) \ge y(p_{i+1})$ (i.e. a downward movement of the stroke). If the stroke is split at a
particular point, I skip the next 9 or 10 pixels when checking the feasibility of segmentation.
Step 5: Repeat Steps 3 and 4 for each pixel and each stroke of the entire word.
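The five-point test of Step 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this work: the y-coordinate is assumed to grow downward (image convention), the upper zone is taken as `top + zone * height`, the ratio-based feasibility check of Step 3 is omitted, and the names `find_split_points`, `top`, `zone` and `skip` are hypothetical.

```python
def find_split_points(stroke, top, height, zone=0.30, skip=10):
    """Return indices of a stroke where a downward movement starts.

    stroke : list of (x, y) points of one stroke
    top    : smallest y of the word (assumption: y grows downward)
    height : total height of the word
    zone   : fraction of the height in which splitting is allowed
    skip   : number of points skipped after a split
    """
    limit = top + zone * height
    splits = []
    i = 2
    while i < len(stroke) - 2:
        y = [stroke[k][1] for k in range(i - 2, i + 3)]
        before = y[1] <= y[0] and y[2] <= y[1]   # y(p_i-1) <= y(p_i-2), y(p_i) <= y(p_i-1)
        after = y[3] >= y[2] and y[4] >= y[3]    # y(p_i+1) >= y(p_i), y(p_i+2) >= y(p_i+1)
        if stroke[i][1] <= limit and before and after:
            splits.append(i)
            i += skip
        else:
            i += 1
    return splits


# Hypothetical stroke: rises to y = 1 inside the upper zone, then moves downward.
upper_stroke = [(0, 3), (1, 2), (2, 1), (3, 2), (4, 3), (5, 8), (6, 9)]
# Hypothetical stroke with the same shape but outside the upper zone.
lower_stroke = [(0, 9), (1, 8), (2, 7), (3, 8), (4, 9)]

upper_splits = find_split_points(upper_stroke, top=0, height=10)
lower_splits = find_split_points(lower_stroke, top=0, height=10)
```

For a compound character the same function could be called with `zone=0.50`, following the 50% figure given in Step 2.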
With this approach I tried to segment all the compound words, covering all the vowel and
consonant modifiers and all the letters of the Bangla alphabet. The result is shown in
Figure 3.
Figure 3: Result after Segmentation
5. FEATURE GENERATION
Any online feature is very sensitive to the writing stroke sequence and to size variation. A
total of 233 features (90+15+128) are used [9].
The features used are:
Point-based features (90),
Structural features (15),
Directional features (128).
The processed character is transformed into a sequence
$t = [t_1, \dots, t_N, t_{N+1}, \dots, t_{N+15}, t_{N+15+1}, \dots, t_{N+15+128}]$
of feature vectors $t_i = (t_{i1}, t_{i2}, t_{i3})^T$ (where $i \le N$).
6. TRAINING TO THE CLASSIFIER
In this step the extracted features are fed to the classifier, which is trained using a neural
network. Based on the above normalized features, a Multilayer Perceptron (MLP) neural network
based scheme was used for recognition of the strokes. The MLP is, in general, a layered
feed-forward network, pictorially represented by a directed acyclic graph. Each node in the graph
stands for an artificial neuron of the MLP, and the label on each directed arc denotes the
strength of the synaptic connection between two neurons and the direction of signal flow in the
MLP [8]. For pattern classification, the number of neurons in the input layer of an MLP is
determined by the number of features selected to represent the relevant patterns in the feature
space, and the number in the output layer by the number of classes to which the input data can
belong. The neurons in the hidden and output layers compute the sigmoid function on the sum of
the products of the input values and the weight values of the corresponding connections to each
neuron.
In this work the numbers of neurons in the input and output layers of the perceptron are set to
278 and 138, respectively, since the number of features is 278 and the number of possible
handwritten stroke classes considered in the present case is 138. The number of hidden units was
set to 90, and the back-propagation learning rate and acceleration factor were set to suitable
values based on trial runs.
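To make the layer computation concrete, here is a minimal forward-pass sketch of a 278-90-138 network in plain Python, in which every hidden and output neuron applies the sigmoid to the weighted sum of its inputs. The random initial weights, the constant input vector and all function names are assumptions for illustration; training by back-propagation is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """One weight row per neuron; small random values (an assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def layer_forward(inputs, weights):
    """Sigmoid of the sum of products of inputs and connection weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def mlp_forward(features, hidden_weights, output_weights):
    hidden = layer_forward(features, hidden_weights)
    return layer_forward(hidden, output_weights)


rng = random.Random(0)
hidden_weights = init_layer(278, 90, rng)   # 278 input features -> 90 hidden units
output_weights = init_layer(90, 138, rng)   # 90 hidden units -> 138 stroke classes

features = [0.5] * 278                      # placeholder feature vector
scores = mlp_forward(features, hidden_weights, output_weights)
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

The index of the largest output score is taken as the predicted stroke class, which is a common convention rather than something stated in this paper.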
7. WORD RECOGNITION
Each word is constructed from its recognized strokes. I have taken 42 different Bengali words
covering all the vowels and consonants in Bengali and all the modifiers with which a Bangla word
is written. Those words were collected online using the A4 Take Note. To construct each of those
42 words, we have to send the recognition module the correct combination of basic strokes
(obtained from segmentation) so that it can recognize each character, since each character is
constructed from its recognized strokes. So, if we can recognize every character in a word, as
well as the modifiers (if any), individually, then the entire word is recognized. To do so, all
the probable sequences of strokes that make up a valid character are stored in a tree data
structure in a database. To build this, a database report is generated from the raw data
(characters), from which the stroke sequences that people generally use for the characters are
obtained.
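One way to picture a tree of valid stroke sequences is a prefix tree (trie) keyed on stroke labels. The sketch below is an assumption about how such a structure could look, not the database used in this work; the class name `StrokeTrie`, the stroke labels such as `"s12"` and the character names are all hypothetical.

```python
class StrokeTrie:
    """Prefix tree mapping sequences of stroke labels to characters."""

    def __init__(self):
        self.children = {}
        self.character = None  # set only where a valid sequence ends

    def insert(self, strokes, character):
        node = self
        for label in strokes:
            node = node.children.setdefault(label, StrokeTrie())
        node.character = character

    def lookup(self, strokes):
        """Return the character for a stroke sequence, or None if invalid."""
        node = self
        for label in strokes:
            node = node.children.get(label)
            if node is None:
                return None
        return node.character


trie = StrokeTrie()
# Two hypothetical ways of writing the same character, plus a second character.
trie.insert(["s12", "s40"], "char_A")
trie.insert(["s40", "s12"], "char_A")
trie.insert(["s12", "s40", "s07"], "char_B")

found = trie.lookup(["s12", "s40"])
missing = trie.lookup(["s99"])
```

Storing several orderings for the same character is one way to tolerate the variation in stroke order between writers that the paper mentions.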
8. RESULT AND DISCUSSION
The experimental evaluation of the above techniques was carried out on isolated Bangla words.
The data was collected from people of different backgrounds. A total of 42 different words were
collected for the experiment, covering all vowels and consonants as well as the modifiers of the
Bangla script. For each word, 100 instances were taken from people of different backgrounds. So,
a total of 4200 words have been collected and processed so far.
Data | Segmentation Rate | Segmentation Error
Compound Word | Approx 87% | Approx 13%
Table 1: Segmentation Result
Data | Under Segmentation Error | Over Segmentation Error
Compound Word | Approx 9% | 4%
Table 2: Under and Over Segmentation Result
The experiment shows that the overall accuracy of the proposed scheme, taking segmentation into
account, is satisfactory. With the modified segmentation algorithm, the segmentation accuracy is
around 87%, as shown in Table 1. The over- and under-segmentation problem is reduced by the new
segmentation algorithm, as shown in Table 2.
With the modified segmentation algorithm, the classification accuracy for compound words is
around 73%. The rejection rate is 24%, and about 3% of the data is misclassified. The
classification result is in Table 3.
Total Data | Classified Data | Rejected Data | Misclassified Data
4200 | 3066 | 1008 | 126
Table 3: Classification Result
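The percentages quoted in the text follow directly from the counts in Table 3. A short Python check of that arithmetic (the variable names are chosen here for illustration):

```python
total = 4200
counts = {"classified": 3066, "rejected": 1008, "misclassified": 126}

# Percentage of the 4200 collected words falling in each category.
rates = {name: round(100 * n / total, 1) for name, n in counts.items()}
covered = sum(counts.values())
```

The three categories add up to the full 4200 words, and the resulting rates are 73%, 24% and 3%.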
Figure 4 and Figure 5 show over- and under-segmented results for some compound words that the
algorithm is unable to segment properly.
Figure 4: Over Segmented Figure 5: Under Segmented
Table 4 shows the characters for which the segmentation algorithm suffers from over- and under-segmentation.
Over Segmented Under Segmented
Character Word Character Word
a t t
h bh t t
Table 4: Over and Under Segmented Characters
9. CONCLUSION
This work describes a new approach to Bangla compound handwritten word recognition. The work was
done on 42 different predetermined Bangla compound words based on different vowels, with a
modified segmentation algorithm designed specifically for Bangla compound words in various
writing styles, which made the system more dynamic and gave a higher recognition rate than the
previous one. The system can now work with any Bangla basic or compound word, so it is dynamic
in nature. The over- and under-segmentation error has decreased, but the system still struggles
with a few characters (such as ‘ ’, ‘h’, ‘ ’) because of their complex physical structure.
Not much work has been done on online compound word recognition for Indian scripts in general
and Bangla in particular. I think this work can be helpful for Bangla signature verification or
paragraph verification. So, much scope remains in this field based on the proposed work.
ACKNOWLEDGEMENTS
I thankfully acknowledge my God, colleagues and family members for their all-round support of
the present work.
REFERENCES
[1] C. C. Tappert, C. Y. Suen, and T. Wakahara, "The State of the Art in On-Line Handwriting
Recognition," IEEE PAMI, Vol. 12, No. 8, pp. 179-190, 1990.
[2] E. J. Bellagarda, J. R. Bellagarda, D. Nahamoo and N. S. Nathan, "A Probabilistic Framework
for Online Handwriting Recognition," 3rd IWFHR, pp. 225-234, 1993.
[3] B.B. Chaudhuri and U. Pal, "An OCR System to Read Two Indian Language Scripts: Bangla and
Devnagari," Proc. 4th ICDAR, pp. 1011-1015, 1997.
[4] I. Guyon, M. Schenkel, and J. Denker, "Overview and Synthesis of On-Line Cursive Handwriting
Recognition Techniques", Handbook of Character Recognition and Document Image Analysis, pp.
183-225, 1997.
[5] B.B. Chaudhuri and U. Pal, "A Complete Printed Bangla OCR System," PR, Vol. 31, pp. 531-549, 1998.
[6] R. Plamondon and S. N. Srihari, "On-Line and Off-Line Handwriting Recognition: A Comprehensive
Survey". IEEE PAMI, vol. 22, No. 1, pp. 63-84, 2000.
[7] C. Bahlmann and H. Burkhardt, “The Writer Independent Online Handwriting Recognition System
frog on hand and Cluster Generative Statistical Dynamic Time Warping”, IEEE PAMI, Vol. 26, No.
3, pp. 1-12, 2004.
[8] K. Roy, C. Chaudhuri, M. Kundu, M. Nasipuri and D. K. Basu, "Comparison of the Multilayer
perceptron and the nearest neighbor classifier for handwritten digit Recognition", JISE, 21, 1245
(2005).
[9] K. Roy, N. Sharma, T. Pal, U. Pal “Online Bangla Handwritten Recognition System”- October 13,
2006, WSPC.
[10] Rajib Ghosh, Debnath Bhattacharyya, Samir Kumar Bandyopadhyay, “Segmentation of Online
Bangla Handwritten Word”, 2009 IEEE International Advance Computing Conference (IACC 2009),
Patiala, India.
[11] Sumanta Daw and Rajib Ghosh, “Online Bangla Handwritten Compound Word Recognition”,
International Conference for Computing and System – 2010, November 19-20, 2010, Burdwan
University, pp. 221-226.
AUTHOR
Mr. Sumanta Daw is an Assistant Professor in the Department of CSE at Hooghly
Engineering & Technology College in Hooghly. He was born on 20th April, 1979. He
obtained his M.Sc. in Software Engineering in 2003 and his MTech in CSE in 2010.
He worked in industry for one year and has been an academician for the last 9 years.
He has already published several national and international papers, including one
journal paper, in the fields of Pattern Recognition and Network Security.