Named entity recognition (NER) is an application of Natural Language Processing and is regarded as a subtask of information extraction. NER is the process of detecting named entities (NEs) in a document and categorizing them into named entity classes such as organization, person, location, sport, river, city, country, and quantity. A great deal of NER work has been done for English, but comparable success has not yet been achieved for the Indian languages. This paper discusses NER, the various approaches to it, performance metrics, the challenges of NER in the Indian languages, and some results obtained for Hindi by combining rule-based heuristics with a Hidden Markov Model (HMM).
Parameters Optimization for Improving ASR Performance in Adverse Real World N... - Waqas Tariq
Existing research shows that many techniques and methodologies are available for every step of an Automatic Speech Recognition (ASR) system, but the performance of a method (minimizing Word Error Rate, WER, and maximizing Word Accuracy Rate, WAR) does not depend only on the technique applied. Performance depends mainly on the category and level of the noise and on parameter sizes such as the window, frame, and frame overlap used in existing methods. The main aim of this work is to vary the window size, frame size, and frame overlap percentage, observe how algorithms perform across categories and levels of noise, and train the system on all parameter sizes and categories of real-world noisy environments to improve speech recognition performance. The paper presents Signal-to-Noise Ratio (SNR) and accuracy test results for the varied parameter sizes. Because it is hard to evaluate these test results and choose parameter sizes by inspection, the study further suggests feasible and optimum parameter sizes using a Fuzzy Inference System (FIS) to enhance accuracy in adverse real-world noisy conditions. This work should help in the discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
IRJET - Survey on Named Entity Recognition using Syntactic Parsing for Hindi L... - IRJET Journal
This document summarizes research on named entity recognition for the Hindi language. It discusses various techniques that have been used for NER in Hindi, including rule-based approaches, machine learning approaches like hidden Markov models and conditional random fields, and hybrid approaches. The document also reviews several papers on Hindi NER systems that have used techniques like rule-based methods, list lookup, hidden Markov models, joint parsing with POS tagging, and distant supervision. Most studies found that machine learning approaches like hidden Markov models and conditional random fields produced relatively high accuracy, though performance could likely be improved further with larger annotated corpora.
This document summarizes a proposed framework for sentiment classification using fuzzy logic. The framework aims to detect both implicit and explicit sentiment expressions in text by incorporating multiple datasets and techniques. It involves preprocessing text data, classifying sentiment, applying fuzzy logic to reduce emotions, and using fuzzy c-means clustering to further group similar emotions. The goal is to more accurately extract sentiment from transcripts by identifying both implicit and explicit expressions as well as topics through this combined approach. Evaluation metrics like precision, recall and f-measure will be used to assess performance.
A study on the approaches of developing a named entity recognition tool - eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
DBMS Campus crack Question Prepared by Randhir Kumar - Randhir Chouhan
This document contains a summary of database concepts prepared by Mr. Randhir Kumar. It defines key terms like database, DBMS, database system, data models, ER model, relational model, and normal forms like 1NF, 2NF and 3NF. It also covers transaction management concepts like atomicity and durability, and database architecture topics such as query optimization, indexing, and the system catalog.
This paper presents a new multi-tier holistic approach for recognizing Urdu text written in Nastaliq script. It first identifies special ligatures like dots, tay, hamza and mad from base ligatures. It then associates the special ligatures with neighboring base ligatures. Features are extracted from the ligatures and special ligature-base ligature associations. These features are input to a neural network that recognizes the ligatures in three steps: 1) identifying special ligatures, 2) associating them with base ligatures, and 3) recognizing the base ligatures. The system was tested on 200 ligatures with 100% accuracy for ligatures in its training set and closest match classification for new ligatures.
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network - kevig
In recent years, social media use has increased among people in Myanmar, and writing reviews on social media pages about products, movies, and trips has become popular. Moreover, most people look for review pages about a product they want to buy before deciding whether to buy it. Extracting useful reviews of products of interest is important but time consuming. Sentiment analysis is one of the important processes for extracting useful product reviews. In this paper, a Convolutional LSTM neural network architecture is proposed for sentiment classification of cosmetic reviews written in the Myanmar language. The paper also intends to build a cosmetic reviews dataset for deep learning and a sentiment lexicon in the Myanmar language.
Author Credits - Maaz Anwar Nomani
Semantic Role Labeler (SRL) is a semantic parser that automatically identifies and then classifies the arguments of a verb in a natural language sentence for Hindi and Urdu. For example, in the sentence “Sara won the competition because of her hard work.”, ‘won’ is the main verb and it has 3 arguments: ‘Sara’ (Agent), ‘hard work’ (Reason) and ‘competition’ (Theme). The problem an SRL addresses is how to make a machine identify and then classify the arguments of a verb in a natural language sentence.
Since there are 2 sub-problems here (identification and classification), our SRL has a pipeline architecture in which a binary classifier (Logistic Regression) is first trained to identify whether a word is an argument of a verb in a sentence (yes or no), and a multi-class classifier (SVM with a linear kernel) is then trained to classify the arguments identified by the binary classifier into one of 20 classes. These 20 classes are the various notions present in a natural language sentence (e.g. Agent, Theme, Location, Time, Purpose, Reason, Cause). These ‘notions’ are called PropBank labels or semantic labels, and they come from a Proposition Bank, which is a collection of hand-annotated sentences.
In essence, SRL facilitates semantic parsing, which is the investigation of identifying WHO did WHAT to WHOM, WHERE, HOW, WHY and WHEN in a natural language sentence.
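The two-stage pipeline described above (identify arguments first, then classify only the identified ones) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy lookup rules are hypothetical stand-ins for the trained Logistic Regression and linear-kernel SVM, and the candidates are taken from the example sentence.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in types: in the system described above, stage 1 would be
# a trained Logistic Regression and stage 2 a linear-kernel SVM.
Identifier = Callable[[str, str], bool]  # (candidate, verb) -> is it an argument?
Classifier = Callable[[str, str], str]   # (candidate, verb) -> semantic label


def label_arguments(candidates: List[str], verb: str,
                    is_argument: Identifier, classify: Classifier) -> Dict[str, str]:
    """Stage 1 filters candidates; stage 2 labels only the survivors."""
    identified = [c for c in candidates if is_argument(c, verb)]
    return {c: classify(c, verb) for c in identified}


# Toy rules that only cover the example sentence; not a real model.
TOY_LABELS = {"Sara": "Agent", "competition": "Theme", "hard work": "Reason"}


def toy_is_argument(candidate: str, verb: str) -> bool:
    return candidate in TOY_LABELS


def toy_classify(candidate: str, verb: str) -> str:
    return TOY_LABELS[candidate]


candidates = ["Sara", "competition", "because", "hard work"]
roles = label_arguments(candidates, "won", toy_is_argument, toy_classify)
print(roles)
```

Keeping the two stages as separate callables mirrors the pipeline design: the multi-class classifier never sees words that the binary classifier rejected.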
Chunking means splitting sentences into tokens and then grouping them in a meaningful way. For high-performance chunking systems, transformer models have proved to be the state of the art. Chunking requires a large, high-quality annotated corpus in which each token carries a tag, much as in Named Entity Recognition tasks. These tags are later used in conjunction with pointer frameworks to find the final chunk. For a specific domain, manually annotating a large, high-quality training set is costly in both time and resources. When the domain is both specific and diverse, cold starting becomes even more difficult because a large number of manually annotated queries is needed to cover all aspects. To overcome this problem, we applied a grammar-based text generation mechanism in which, instead of annotating sentences, we annotate grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence, we used these templates along with the rules, with symbol or terminal values chosen from the domain data catalog. This let us create a large number of annotated queries, which were used to train an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation was useful for identifying domain-based chunks in input query sentences without any manual annotation, achieving a classification F1 score of 96.97% in classifying the tokens of out-of-template queries.
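The template-based generation described above can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the catalog entries, the two templates, the slot syntax, and the BIO tagging scheme are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical domain catalog: slot name -> terminal values.
CATALOG: Dict[str, List[str]] = {
    "METRIC": ["revenue", "net profit"],
    "REGION": ["north america", "europe"],
}

# Hypothetical annotated templates: "<SLOT>" marks a chunk, other items are literals.
TEMPLATES: List[List[str]] = [
    ["show", "<METRIC>", "for", "<REGION>"],
    ["compare", "<METRIC>", "across", "<REGION>"],
]


def generate(template: List[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Expand one template into (tokens, tags); tags follow an assumed BIO scheme."""
    tokens: List[str] = []
    tags: List[str] = []
    for item in template:
        if item.startswith("<") and item.endswith(">"):
            label = item[1:-1]
            words = rng.choice(CATALOG[label]).split()
            tokens.extend(words)
            tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(words) - 1))
        else:
            tokens.append(item)
            tags.append("O")
    return tokens, tags


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dataset = [generate(t, rng) for t in TEMPLATES for _ in range(3)]
for tokens, tags in dataset:
    print(list(zip(tokens, tags)))
```

Because the annotation lives in the template, every generated sentence comes with token-level tags at no extra labeling cost, which is the point of the approach.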
This document discusses fuzzy querying of relational databases. It begins by introducing fuzzy relational database management systems (FRDBMS), which allow imprecise queries using fuzzy logic. It then presents the basic concepts of fuzzy logic and membership functions. The architecture of an FRDBMS is described, including how it translates fuzzy queries into equivalent SQL queries. An example student database is used to demonstrate a fuzzy query for "poor performers" and how it returns more graded results than an exact SQL query. The document concludes that FRDBMS improves the expressiveness of queries over traditional databases.
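A fuzzy query for "poor performers" can be sketched as a membership function applied to each row, in contrast to a crisp cutoff. This is a minimal in-memory illustration rather than the FRDBMS translation to SQL that the document describes; the student names, marks, and the 40/60 breakpoints are assumptions chosen for the example.

```python
from typing import Callable, List, Tuple


def poor_performer(marks: float, full: float = 40.0, none: float = 60.0) -> float:
    """Membership in 'poor performer': 1 at or below `full`, 0 at or above `none`,
    and linearly decreasing in between (assumed breakpoints)."""
    if marks <= full:
        return 1.0
    if marks >= none:
        return 0.0
    return (none - marks) / (none - full)


# Hypothetical student table: (name, marks).
students: List[Tuple[str, float]] = [
    ("Asha", 35.0), ("Ravi", 45.0), ("Meena", 55.0), ("John", 72.0),
]


def fuzzy_select(rows: List[Tuple[str, float]],
                 membership: Callable[[float], float]) -> List[Tuple[str, float]]:
    """Return rows with non-zero membership, highest degree first."""
    graded = [(name, membership(marks)) for name, marks in rows]
    return sorted([g for g in graded if g[1] > 0.0], key=lambda g: -g[1])


fuzzy_result = fuzzy_select(students, poor_performer)
crisp_result = [name for name, marks in students if marks < 40.0]
print(fuzzy_result)
print(crisp_result)
```

The crisp query returns only students below the cutoff, while the fuzzy query also returns borderline students with a degree of membership, which is the graded behaviour the document attributes to fuzzy querying.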
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds - idescitation
This paper presents a supervised machine learning approach that uses a decision tree learning algorithm to recognize Bengali noun-noun compounds as multi-word expressions (MWEs) in a Bengali corpus. The proposed approach to MWE recognition has two steps: (1) extracting candidate multi-word expressions using chunk information and various heuristic rules, and (2) training the machine learning algorithm to decide whether a candidate is a multi-word expression or not. A variety of association measures are used as features for identifying MWEs. The proposed system is tested on a Bengali corpus for identifying noun-noun compound MWEs.
Semantic based automatic question generation using artificial immune system - Alexander Decker
The document describes a system that uses artificial immune systems and natural language processing techniques like semantic role labeling and named entity recognition to automatically generate questions from text. It introduces a model that applies these techniques to extract semantic patterns from sentences, trains a classifier using artificial immune systems to classify question types, and then generates questions by matching patterns. The system was tested on sentences from various sources and showed promising results, correctly determining question types 95% of the time and generating matching questions 87% of the time.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION - kevig
The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into pre-defined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, such a collection does not deal with name variants and cannot resolve the ambiguities involved in identifying entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. Using the identified named entities, the word and context features are used to define a pattern. The pattern of each named entity category is used as a seed pattern to identify named entities in the test set. Pattern scoring and tuple value scoring enable the generation of new patterns to identify the named entity categories. We evaluated the proposed system for English with tagged (IEER) and untagged (CoNLL 2003) named entity corpora, and for Tamil with documents from the FIRE corpus, and obtained an average f-measure of 75% for both languages.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE - Journal For Research
Natural Language Processing (NLP) techniques are among the most widely used techniques in computer applications, and the field has become vast and advanced. Language is the means of communication among humans, and now that so much depends on machines and is computerized, communication between computers and humans has become a necessity. To meet this need, NLP has emerged as a means of interaction that narrows the gap between machines (computers) and humans. It evolved from the study of linguistics, which was passed through the Turing test to check the similarity between data, but this was limited to small sets of data. Later, various algorithms were developed along with the concept of AI (Artificial Intelligence) for the successful execution of NLP. This paper focuses on the different NLP techniques developed so far, their applications, and a comparison of those techniques on different parameters.
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ... - IJCSEA Journal
Sign language is the key to communication between deaf people. Its significance is underlined by various research activities, and technical advances should improve how communication needs are met. General view-based sign language recognition systems extract manual parameters from a single camera view because this is user friendly and keeps hardware complexity low; however, it requires a high-accuracy classifier for classification and recognition. The decision making of the system in this work uses Indian sign language datasets, and performance is compared across the K-NN, Naïve Bayes and PNN classifiers. Classification with an instance-based classifier can be a simple matter of locating the nearest known instance in the instance space and giving the unknown instance the same class label as that neighbour. In each hand posture, properties such as area, mean intensity, centroid, perimeter and diameter are measured; the classifier then uses these properties to determine the sign at different angles. The classifiers estimate the probability that a sign belongs to each of a fixed set of target classes. This study may also encourage the use of such algorithms in other similar applications, such as text classification and the development of automated systems.
Review of research on devnagari character recognition - Vikas Dongre
This document summarizes research on Devnagari character recognition. It begins with an abstract discussing the progress of English character recognition and the need for further research on Indian languages like Devnagari. The document then reviews the stages of Devnagari optical character recognition systems, including pre-processing, segmentation, feature extraction, recognition, and post-processing. It discusses challenges in Devnagari recognition due to features of the script like connected characters. The document also reviews common techniques used at each stage of recognition systems and provides directions for future research.
The document describes an OCR system for recognition of Urdu text written in Nastaliq font. It discusses the characteristics of Urdu script, existing approaches for cursive script recognition, and the methodology used in the system. The system employs two holistic approaches - a multi-tier approach using neural networks and a multi-stage classification approach combining multiple classifiers. Results show the 20 most frequent ligatures identified from analyzing BBC Urdu news text, and feature vectors extracted for segmentation.
The document discusses optical character recognition for Urdu handwriting. It introduces OCR and its applications. It then discusses earlier work on OCR systems that were font-specific. The document outlines the steps in OCR including image acquisition, preprocessing, segmentation, feature extraction, classification, and recognition. It provides an overview of the Urdu script and its variations. The document then summarizes research conducted on recognizing offline isolated Urdu characters using moment invariants and support vector machines. Other works discussed include an online and offline OCR system for Urdu using a segmentation-free approach, classifying Urdu ligatures using convolutional neural networks, and a segmentation-based approach for Urdu Nastaliq script recognition.
MODIFIED PAGE RANK ALGORITHM TO SOLVE AMBIGUITY OF POLYSEMOUS WORDS - IJCI JOURNAL
The document proposes a Dynamic Page Rank algorithm to address the problem of polysemy, or multiple meanings, of words in information retrieval systems. It discusses how word sense ambiguity negatively impacts retrieval precision. The Dynamic Page Rank algorithm extends the traditional PageRank algorithm by incorporating word sense disambiguation to provide more accurate results tailored to the user's intended context. An experiment compares the proposed algorithm to PageRank and finds that it achieves a mean reciprocal rank of 1, meaning the top-ranked result was relevant for every query, compared to 0.3167 for PageRank. The algorithm is presented as a way to improve information retrieval performance by resolving lexical ambiguity.
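For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages, over all queries, the reciprocal of the rank at which the first relevant result appears. The sketch below uses made-up relevance judgments purely to show the arithmetic; it does not reproduce the experiment's data.

```python
from typing import List


def mean_reciprocal_rank(rankings: List[List[bool]]) -> float:
    """Each inner list gives relevance flags for one query's results, in rank order.
    A query with no relevant result contributes 0."""
    total = 0.0
    for relevance in rankings:
        for rank, is_relevant in enumerate(relevance, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical judgments: every query has a relevant result at rank 1.
perfect = mean_reciprocal_rank([[True, False], [True, True]])

# Hypothetical judgments: first relevant results at ranks 3, 2 and 1.
mixed = mean_reciprocal_rank([[False, False, True], [False, True], [True]])

print(perfect, mixed)
```

An MRR of 1 is reached only when the first result is relevant for every query; it says nothing about the results ranked below the first.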
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL - ijnlc
This document describes a tool called NERHMM for performing named entity recognition using hidden Markov models. The tool allows users to annotate raw text to create tagged corpora, train hidden Markov models on annotated data to calculate parameters, and test new text data to produce named entity tags. The tool works for multiple languages and can handle diverse tag sets. It provides a simple interface for tasks involved in named entity recognition like corpus development and parameter estimation for hidden Markov models. Evaluation on data shows the tool's performance increases with more training data.
Suitability of naïve bayesian methods for paragraph level text classification... - ijaia
This document discusses using Naive Bayesian methods for paragraph-level text classification in the Kannada language. It evaluates the performance of the Naive Bayesian and Naive Bayesian Multinomial models on a corpus of 1791 paragraphs from four categories (Commerce, Social Sciences, Natural Sciences, Aesthetics). Dimensionality reduction techniques like removing stop words and words with low term frequency are applied before classification. The results show that the Naive Bayesian Multinomial model outperforms the simple Naive Bayesian approach for paragraph classification in Kannada.
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents - IRJET Journal
This document summarizes research on recognizing strike-out or crossed-out text in handwritten documents. It discusses 4 papers that studied this problem in different languages like English, Bengali, and Devanagari. The first paper used an LSTM model to recognize struck-out English text with 11% character error rate. The second used HMMs to recognize French text crossed out with lines or waves, achieving 46-92% accuracy. The third used graph algorithms and SVM to detect and remove Bengali strike-outs, obtaining 95% accuracy. The fourth used decision trees to classify English forensic documents, identifying 48% of crossed texts correctly. Overall the document reviews approaches to strike-out text recognition and removal.
This document proposes a generalized definition language for implementing an object-based fuzzy class model. It begins by reviewing related work on defining fuzzy classes and identifying limitations in existing approaches. It then summarizes the authors' previous work developing a generalized fuzzy class structure and model. The document introduces several new data types for representing different types of fuzzy attributes. Finally, it proposes a formal definition language for the fuzzy class model that utilizes the new data types to define fuzzy class structure and accurately represent fuzzy data types and attribute values. The language is intended to serve as a data definition language for object-based fuzzy database systems.
Resolving the semantics of vietnamese questions in v news qaict system - ijaia
Recently we have built a VNewsQA/ICT system which can read the titles of Vietnamese news in the domain
of information and communication technology, then process and use them to answer the Vietnamese
questions of users. The architecture of VNewsQA/ICT system has two main components: 1) the first
component treats the simple Vietnamese sentences as its natural language textual data which is used to
answer the user’s questions; 2) the second component resolves the semantics of Vietnamese questions
which query the system. This paper introduces a semantic representation model and a processing model to
resolve the Vietnamese questions in VNewsQA/ICT system. These semantic representation and processing
models are able to resolve the semantics of eight Vietnamese question classes which are used in our
system.
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE - IJCSEIT Journal
This document summarizes a research paper that proposes three models for query expansion in a Hindi search engine: 1) Using lexical resources like HindiWordNet to find synonyms and related terms, 2) Using user context information like location, interests and profession, 3) Combining lexical resources and user context. An experiment compares the precision of results from simple Google searches to searches using each model. Precision was highest using the combined Model III at 0.79, showing that integrating lexical and user context information improves search quality in Hindi.
A survey of named entity recognition in assamese and other indian languages - ijnlc
Named Entity Recognition is important in major Natural Language Processing tasks such as information extraction, question answering, machine translation, and document summarization, so in this paper we present a survey of Named Entity Recognition in Indian languages with particular reference to Assamese. Various rule-based and machine learning approaches are available for Named Entity Recognition. At the start of the paper we outline the available approaches and then discuss related research in this field. Assamese, like other Indian languages, is agglutinative and suffers from a lack of appropriate resources, since Named Entity Recognition requires large data sets, gazetteer lists, dictionaries and so on, and some useful features found in English, such as capitalization, are absent in Assamese. We also describe some of the issues faced when doing Named Entity Recognition in Assamese.
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGESijistjournal
Named Entity Recognition is a preliminary task in Natural Language Processing. It is a subtask of information extraction that identifies proper nouns and classifies them into predefined categories such as person, location, organization, time and date. This document focuses on NER approaches and discusses the work done so far to identify named entities in various languages. The authors carried out a comparative study and found that the CRF approach performs best for identifying named entities in Indian languages.
Named Entity Recognition using Hidden Markov Model (HMM)kevig
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is a branch
of artificial intelligence. It has many applications, mainly in machine translation, text-to-speech synthesis,
natural language understanding, information extraction, information retrieval and question answering.
The aim of NER is to classify words into predefined categories such as location name, person name,
organization name, date and time. In this paper we describe in detail a Hidden Markov Model (HMM) based
machine learning approach to identifying named entities. The main idea behind using an
HMM to build an NER system is that it is language independent, so the system can be applied to
any language domain. In our NER system the states are not fixed: they are dynamic, and one can
define them according to one's needs. The corpus used by our NER system is also not domain specific.
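The HMM approach summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train`, `log_prob`, `viterbi`), the add-one smoothing and the toy tag set are assumptions made for illustration. The tag set is read from the training data rather than fixed in code, which mirrors the abstract's point that the states are not fixed. Tagging uses the standard Viterbi algorithm and expects a non-empty list of words.

```python
import math
from collections import defaultdict

def train(tagged_sentences):
    """Count tag starts, tag-to-tag transitions and tag-to-word emissions."""
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)  # the states come from the data, not from the code
    vocab = {w for counts in emit.values() for w in counts}
    return {"start": start, "trans": trans, "emit": emit,
            "tags": tags, "vocab_size": len(vocab)}

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (an assumption of this sketch)."""
    return math.log((counts.get(key, 0) + 1) / (sum(counts.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most likely tag sequence for a non-empty list of words."""
    tags, n_tags = model["tags"], len(model["tags"])
    # best[i][t] = (log score of the best path ending in tag t at word i, previous tag)
    best = [{t: (log_prob(model["start"], t, n_tags)
                 + log_prob(model["emit"][t], words[0], model["vocab_size"]), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(model["emit"][t], words[i], model["vocab_size"])
            column[t] = max(
                (best[i - 1][p][0] + log_prob(model["trans"][p], t, n_tags) + e, p)
                for p in tags)
        best.append(column)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

Because nothing in the sketch depends on a particular language or tag inventory, retraining on a differently annotated corpus changes the states without any code change.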
Chunking means splitting sentences into tokens and then grouping them in a meaningful way. Among high-performance chunking systems, transformer models are the state of the art. Chunking requires a large, high-quality annotated corpus in which each token carries a tag, as in Named Entity Recognition tasks. These tags are later used in conjunction with pointer frameworks to find the final chunk. For a domain-specific problem, manually annotating a large, high-quality training set is costly in time and resources. When the domain is specific and diverse, cold starting becomes even more difficult because a large number of manually annotated queries is needed to cover all aspects. To overcome this problem, we applied a grammar-based text generation mechanism in which, instead of annotating individual sentences, we annotate grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence, we used these templates along with the rules, with symbol or terminal values chosen from the domain data catalog. This let us create a large number of annotated queries, which were used to train an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation was useful for identifying domain-based chunks in input query sentences without any manual annotation, achieving a classification F1 score of 96.97% in classifying tokens for out-of-template queries.
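The template idea described above can be sketched roughly as follows. The abstract does not give the template syntax or tag scheme, so the `{SLOT}` notation, the BIO tags, the function name `expand_template` and the sample catalog are all assumptions for illustration. The point is only that annotating one template yields many token-level annotated queries.

```python
from itertools import product

def expand_template(template, catalog):
    """Yield (tokens, tags) for every combination of catalog values.

    `template` is a list of literal words and "{SLOT}" placeholders;
    `catalog` maps each slot name to its candidate values (hypothetical data).
    """
    slot_names = [part[1:-1] for part in template if part.startswith("{")]
    for values in product(*(catalog[name] for name in slot_names)):
        fill = iter(zip(slot_names, values))
        tokens, tags = [], []
        for part in template:
            if part.startswith("{"):
                name, value = next(fill)
                words = value.split()
                tokens.extend(words)
                # BIO tagging: first word begins the chunk, the rest continue it.
                tags.extend(["B-" + name] + ["I-" + name] * (len(words) - 1))
            else:
                tokens.append(part)
                tags.append("O")
        yield tokens, tags

# Hypothetical template and domain catalog.
template = ["show", "{METRIC}", "for", "{REGION}"]
catalog = {"METRIC": ["net sales", "profit"], "REGION": ["north america"]}
examples = list(expand_template(template, catalog))
```

The number of generated queries grows with the product of the catalog sizes, which is how a handful of templates can stand in for a large manually annotated set.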
This document discusses fuzzy querying of relational databases. It begins by introducing fuzzy relational database management systems (FRDBMS), which allow imprecise queries using fuzzy logic. It then presents the basic concepts of fuzzy logic and membership functions. The architecture of an FRDBMS is described, including how it translates fuzzy queries into equivalent SQL queries. An example student database is used to demonstrate a fuzzy query for "poor performers" and how it returns more graded results than an exact SQL query. The document concludes that FRDBMS improves the expressiveness of queries over traditional databases.
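As a rough illustration of how a fuzzy "poor performers" query returns graded rather than exact results, the sketch below uses a simple linear membership function. The breakpoints (40 and 60 marks), the function names and the sample rows are assumptions, not values from the document, and the translation to SQL that the document describes is not reproduced here.

```python
def poor_membership(marks, full=40.0, zero=60.0):
    """Degree (0..1) to which a mark counts as 'poor'.

    Membership is 1 at or below `full`, 0 at or above `zero`,
    and falls linearly in between (assumed breakpoints).
    """
    if marks <= full:
        return 1.0
    if marks >= zero:
        return 0.0
    return (zero - marks) / (zero - full)

def fuzzy_select(rows, threshold=0.0):
    """Return (name, degree) pairs with degree above `threshold`, best match first."""
    graded = [(name, poor_membership(marks)) for name, marks in rows]
    matches = [(name, degree) for name, degree in graded if degree > threshold]
    return sorted(matches, key=lambda pair: -pair[1])

# Hypothetical student rows: (name, marks).
students = [("A", 35), ("B", 50), ("C", 70)]
result = fuzzy_select(students)
```

An exact query such as "marks below 40" would return only student A, whereas the fuzzy version also returns B with a partial degree of membership.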
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compoundsidescitation
This paper presents a supervised machine learning
approach that uses a decision tree learning algorithm for
recognition of Bengali noun-noun compounds as multiword
expression (MWE) from Bengali corpus. Our proposed
approach to MWE recognition has two steps: (1) extraction of
candidate multi-word expressions using chunk information
and various heuristic rules and (2) training the machine
learning algorithm to recognize a candidate multi-word
expression as Multi-word expression or not. A variety of
association measures have been used as features for
identifying MWEs. The proposed system is tested on a Bengali
corpus for identifying noun-noun compound MWEs from the
corpus.
Semantic based automatic question generation using artificial immune systemAlexander Decker
The document describes a system that uses artificial immune systems and natural language processing techniques like semantic role labeling and named entity recognition to automatically generate questions from text. It introduces a model that applies these techniques to extract semantic patterns from sentences, trains a classifier using artificial immune systems to classify question types, and then generates questions by matching patterns. The system was tested on sentences from various sources and showed promising results, correctly determining question types 95% of the time and generating matching questions 87% of the time.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITIONkevig
The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into predefined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, using such a collection does not handle name variants and cannot resolve the ambiguities that arise when identifying entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. Using the identified named entities, the word and context features are used to define a pattern. The pattern for each named entity category is used as a seed pattern to identify named entities in the test set. Pattern scoring and tuple value scores enable the generation of new patterns to identify the named entity categories. We evaluated the proposed system for English with tagged (IEER) and untagged (CoNLL 2003) named entity corpora and for Tamil with documents from the FIRE corpus, obtaining an average F-measure of 75% for both languages.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
Natural Language Processing (NLP) techniques are among the most widely used in the field of computer applications, and NLP has become a broad and advanced field. Language is the means of communication among humans, and now that so much is computerized, communication between computers and humans has become a necessity. NLP has emerged as the means of interaction that narrows the gap between machines (computers) and humans. It evolved from the study of linguistics, which was passed through the Turing test to check the similarity between data, but it was limited to small sets of data. Later, various algorithms were developed along with the concept of Artificial Intelligence (AI) for the successful execution of NLP. In this paper, the main emphasis is on the different NLP techniques developed so far, their applications, and a comparison of these techniques on different parameters.
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...IJCSEA Journal
Sign language is the key to communication between deaf people. The significance of sign language is underlined by various research activities, and technical advances will improve how communication needs are met. General view-based sign language recognition systems extract manual parameters from a single camera view because this is user friendly and keeps hardware complexity low; however, such a system needs a high-accuracy classifier for classification and recognition. The decision making of the system in this work employs Indian sign language datasets, and the performance of the system is compared by deploying K-NN, Naïve Bayes and PNN classifiers. Classification using an instance-based classifier can be a simple matter of locating the nearest neighbour in instance space and labelling the unknown instance with the same class label as that of the located (known) neighbour. A classifier always tries to improve the classification rate by pushing classifiers into an optimised structure. For each hand posture, properties such as area, mean intensity, centroid, perimeter and diameter are measured; the classifier then uses these properties to determine the sign at different angles. The classifiers estimate the probability that a sign belongs to each of a fixed set of target classes. This study may also encourage exploring such algorithms in other similar applications such as text classification and the development of automated systems.
Review of research on devnagari character recognitionVikas Dongre
This document summarizes research on Devnagari character recognition. It begins with an abstract discussing the progress of English character recognition and the need for further research on Indian languages like Devnagari. The document then reviews the stages of Devnagari optical character recognition systems, including pre-processing, segmentation, feature extraction, recognition, and post-processing. It discusses challenges in Devnagari recognition due to features of the script like connected characters. The document also reviews common techniques used at each stage of recognition systems and provides directions for future research.
The document describes an OCR system for recognition of Urdu text written in Nastaliq font. It discusses the characteristics of Urdu script, existing approaches for cursive script recognition, and the methodology used in the system. The system employs two holistic approaches - a multi-tier approach using neural networks and a multi-stage classification approach combining multiple classifiers. Results show the 20 most frequent ligatures identified from analyzing BBC Urdu news text, and feature vectors extracted for segmentation.
The document discusses optical character recognition for Urdu handwriting. It introduces OCR and its applications. It then discusses earlier work on OCR systems that were font-specific. The document outlines the steps in OCR including image acquisition, preprocessing, segmentation, feature extraction, classification, and recognition. It provides an overview of the Urdu script and its variations. The document then summarizes research conducted on recognizing offline isolated Urdu characters using moment invariants and support vector machines. Other works discussed include an online and offline OCR system for Urdu using a segmentation-free approach, classifying Urdu ligatures using convolutional neural networks, and a segmentation-based approach for Urdu Nastaliq script recognition.
MODIFIED PAGE RANK ALGORITHM TO SOLVE AMBIGUITY OF POLYSEMOUS WORDSIJCI JOURNAL
The document proposes a Dynamic Page Rank algorithm to address the problem of polysemy, or multiple meanings, of words in information retrieval systems. It discusses how word sense ambiguity negatively impacts retrieval precision. The Dynamic Page Rank algorithm extends the traditional PageRank algorithm by incorporating word sense disambiguation to provide more accurate results tailored to the user's intended context. An experiment compares the proposed algorithm to PageRank and finds that it achieves a mean reciprocal rank of 1, indicating that the top-ranked result was relevant for every query, compared to 0.3167 for PageRank. The algorithm is presented as a way to improve information retrieval performance by resolving lexical ambiguity.
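For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages, over all queries, the reciprocal of the rank at which the first relevant result appears. The sketch below uses the standard definition; the function name and the sample relevance lists are illustrative assumptions, not data from the experiment.

```python
def mean_reciprocal_rank(ranked_relevance):
    """Compute MRR from one list of relevance flags per query.

    Each inner list gives, in ranked order, whether a result was relevant.
    A query with no relevant result contributes 0.
    """
    total = 0.0
    for flags in ranked_relevance:
        reciprocal = 0.0
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(ranked_relevance)

# Hypothetical judgments for two systems on the same two queries.
always_first = [[True, False], [True]]
later_hits = [[False, True], [False, False, False, True]]
```

An MRR of 1 therefore means the first relevant result sat at rank 1 for every query, while lower values mean users had to look further down the list.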
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELijnlc
This document describes a tool called NERHMM for performing named entity recognition using hidden Markov models. The tool allows users to annotate raw text to create tagged corpora, train hidden Markov models on annotated data to calculate parameters, and test new text data to produce named entity tags. The tool works for multiple languages and can handle diverse tag sets. It provides a simple interface for tasks involved in named entity recognition like corpus development and parameter estimation for hidden Markov models. Evaluation on data shows the tool's performance increases with more training data.
Suitability of naïve bayesian methods for paragraph level text classification...ijaia
This document discusses using Naive Bayesian methods for paragraph-level text classification in the Kannada language. It evaluates the performance of the Naive Bayesian and Naive Bayesian Multinomial models on a corpus of 1791 paragraphs from four categories (Commerce, Social Sciences, Natural Sciences, Aesthetics). Dimensionality reduction techniques like removing stop words and words with low term frequency are applied before classification. The results show that the Naive Bayesian Multinomial model outperforms the simple Naive Bayesian approach for paragraph classification in Kannada.
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten DocumentsIRJET Journal
This document summarizes research on recognizing strike-out or crossed-out text in handwritten documents. It discusses 4 papers that studied this problem in different languages like English, Bengali, and Devanagari. The first paper used an LSTM model to recognize struck-out English text with 11% character error rate. The second used HMMs to recognize French text crossed out with lines or waves, achieving 46-92% accuracy. The third used graph algorithms and SVM to detect and remove Bengali strike-outs, obtaining 95% accuracy. The fourth used decision trees to classify English forensic documents, identifying 48% of crossed texts correctly. Overall the document reviews approaches to strike-out text recognition and removal.
This document describes a system for extracting named entities and their relationships from unstructured text data using n-gram features. It uses a hidden Markov model to extract and classify entities into types like person, location, organization. It then uses a conditional random field with kernel approach to detect relationships between the extracted entities. The system takes unstructured text as input, performs preprocessing like tokenization and stop word removal, extracts n-gram, part-of-speech and lexicon features which are then combined and used to train the HMM model to classify entities and CRF model to detect relationships between entities.
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemIRJET Journal
This document proposes a knowledge graph and question answering system to extract and analyze information from large volumes of unstructured data like annual reports. It discusses using natural language processing techniques like named entity recognition with spaCy and dependency parsing to extract entity-relation pairs from text and construct a knowledge graph. For question answering, it analyzes user queries with similar NLP approaches and then matches query triplets to the knowledge graph to retrieve answers, combining information retrieval and trained classifiers. The proposed system aims to provide faster understanding and analysis of complex, unstructured data for professionals.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...kevig
This study investigates the effectiveness of Knowledge Named Entity Recognition in Online Judges (OJs). OJs lack topic classification and identify problems by ID only, so a lot of time is spent finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied to recognize the knowledge named entities in solution reports. For the test run, more than 2000 solution reports were crawled from the Online Judges and processed for the model output. The stability of the model is also assessed through its high F1 value. The results obtained with the proposed BiLSTM-CRF model are effective (F1: 98.96%) and efficient in lead time.
IRJET- Survey for Amazon Fine Food ReviewsIRJET Journal
This document discusses sentiment analysis and summarizes several papers on related topics. It begins with an abstract describing sentiment analysis and its importance. The introduction defines sentiment classification and analysis. The literature survey section summarizes 5 papers on natural language processing and machine learning algorithms for sentiment analysis, including K-means clustering, bag-of-words models, TF-IDF vectorization for document clustering, hierarchical clustering methods, and using naive bayes and SVM for sentiment analysis and text summarization. The conclusion discusses techniques for data processing, natural language processing, and machine learning algorithms covered.
This document presents a system for extracting named entities and their relationships from unstructured text data using n-gram features with hidden Markov models and conditional random fields. The system first extracts n-gram, part-of-speech, and lexicon features from documents, then trains a hidden Markov model to classify entities and a conditional random field with kernel approach to detect relationships between entities. Evaluation shows the proposed system achieves 98.03% accuracy, 88.80% precision, and 87.50% recall for entity detection, outperforming a support vector machine baseline. For relationship extraction, it achieves 87.46% accuracy, 84.46% precision, and 82.46% recall, again outperforming the SVM baseline.
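The gap between accuracy and precision or recall in results like these is typical of entity tagging, where most tokens are non-entities. The following sketch computes all three at token level. The document does not state how its figures were computed, so the token-level definitions, the `"O"` label for non-entity tokens and the sample tag sequences are assumptions for illustration.

```python
def entity_metrics(gold, predicted, outside="O"):
    """Token-level accuracy, precision and recall for entity tags.

    `gold` and `predicted` are equal-length tag lists; `outside` marks
    non-entity tokens (assumed convention).
    """
    pairs = list(zip(gold, predicted))
    correct = sum(1 for g, p in pairs if g == p)
    true_positive = sum(1 for g, p in pairs if p != outside and g == p)
    predicted_entities = sum(1 for p in predicted if p != outside)
    gold_entities = sum(1 for g in gold if g != outside)
    accuracy = correct / len(gold)
    precision = true_positive / predicted_entities if predicted_entities else 0.0
    recall = true_positive / gold_entities if gold_entities else 0.0
    return accuracy, precision, recall

# Hypothetical gold and predicted tags for one eight-token sentence.
gold = ["PER", "O", "O", "LOC", "O", "O", "O", "O"]
pred = ["PER", "O", "O", "O", "O", "ORG", "O", "O"]
metrics = entity_metrics(gold, pred)
```

In this toy case the many correctly tagged non-entity tokens lift accuracy well above precision and recall, which count only entity decisions.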
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Modelkevig
Named Entity Recognition (NER) is considered one of the key tasks in the field of Information Retrieval.
NER is the method of recognizing Named Entities (NEs) in a corpus and then organizing these NEs into
diverse classes, e.g. Name of Location, Person, Organization, Quantity, Time, Percentage etc.
Today there is a great need to develop a tool for NER, since the existing tools are of limited scope. In this
paper, we discuss the functionality and features of our NER tool along with some experimental results.
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate languages that humans use naturally to address computers.
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...home
Named entities are the most informative elements of a textual document, and identifying the names is very
important for extracting further information from text. We have developed a conditional random field based system to
identify the named entities from homeopathic diagnosis discussion forum text. We have manually annotated a training
corpus for the task. As manual creation of a sufficiently large annotated corpus is costly and time consuming, we use an
active learning based semi-supervised framework to increase the efficiency of the system with the help of un-annotated
data. Our system achieves the highest f-value of
The job market has expanded exponentially in the past few years. With many recruiters and candidates, it
is not an easy task to match a perfect candidate with a perfect job. The recruiter targets candidates with
the required skill sets mentioned in the job descriptions, while candidates target their dream jobs. The
search frictions and skills mismatch are persistent problems. In this paper, we build a model that would
match companies with candidates with the right skills and workers with the right company. We have
further developed an algorithm to investigate people’s hiring history for better results.
International Journal on Soft Computing, Artificial Intelligence and Applicat...ijscai
The document describes a study that developed a model to match jobs to candidates using artificial intelligence. It used job descriptions and titles classified according to Thailand's job classification systems (TSCO and TSIC) to train a stochastic gradient descent classifier. The classifier aims to take in a new job description and output the appropriate TSCO classification code. It also developed an algorithm called "Hiring History" to incorporate a person's previous hiring details into the matching.
Job matching
Data mining is the knowledge discovery in databases and the gaol is to extract patterns and knowledge from
large amounts of data. The important term in data mining is text mining. Text mining extracts the quality
information highly from text. Statistical pattern learning is used to high quality information. High –quality in
text mining defines the combinations of relevance, novelty and interestingness. Tasks in text mining are text
categorization, text clustering, entity extraction and sentiment analysis. Applications of natural language
processing and analytical methods are highly preferred to turn
The document summarizes text mining techniques in data mining. It discusses common text mining tasks like text categorization, clustering, and entity extraction. It also reviews several text mining algorithms and techniques, including information extraction, clustering, classification, and information visualization. Several literature papers applying these techniques to domains like movie reviews, research proposals, and e-commerce are also summarized. The document concludes that text mining can extract useful patterns from unstructured text through techniques like clustering, classification, and information extraction.
Similar to HINDI NAMED ENTITY RECOGNITION BY AGGREGATING RULE BASED HEURISTICS AND HIDDEN MARKOV MODEL (20)
Call for Papers - 5th International Conference on Cloud, Big Data and IoT (CB...ijistjournal
5th International Conference on Cloud, Big Data and IoT (CBIoT 2024) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the areas of Cloud, Big Data and IoT. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the area of Cloud, Big Data and IoT.
Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Cloud, Big Data and IoT.
PERFORMANCE ANALYSIS OF PARALLEL IMPLEMENTATION OF ADVANCED ENCRYPTION STANDA...ijistjournal
Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication, and data origin authentication. Most cryptographic algorithms function more efficiently when implemented in hardware than in software running on single processor. However, systems that use hardware implementations have significant drawbacks: they are unable to respond to flaws discovered in the implemented algorithm or to changes in standards. As an alternative, it is possible to implement cryptographic algorithms in software running on multiple processors. However, most of the cryptographic algorithms like DES (Data Encryption Standard) or 3DES have some drawbacks when implemented in software: DES is no longer secure as computers get more powerful while 3DES is relatively sluggish in software. AES (Advanced Encryption Standard), which is rapidly being adopted worldwide, provides a better combination of performance and enhanced network security than DES or 3DES by being computationally more efficient than these earlier standards. Furthermore, by supporting large key sizes of 128, 192, and 256 bits, AES offers higher security against brute-force attacks.
In this paper, AES has been implemented with single processor. Then the result has been compared with parallel implementations of AES with 2 varying different parameters such as key size, number of rounds and extended key size, and show how parallel implementation of the AES offers better performance yet flexible enough for cryptographic algorithms.
Submit Your Research Articles - International Journal of Information Sciences...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
INFORMATION THEORY BASED ANALYSIS FOR UNDERSTANDING THE REGULATION OF HLA GEN...ijistjournal
Considering information entropy (IE), HLA surface expression (SE) regulation phenomenon is considered as information propagation channel with an amount of distortion. HLA gene SE is considered as sink regulated by the inducible transcription factors (TFs) (source). Previous work with a certain number of bin size, IEs for source and receiver is computed and computation of mutual information characterizes the dependencies of HLA gene SE on some certain TFs in different cells types of hematopoietic system under the condition of leukemia. Though in recent time information theory is utilized for different biological knowledge generation and different rules are available in those specific domains of biomedical areas; however, no such attempt is made regarding gene expression regulation, hence no such rule is available. In this work, IE calculation with varying bin size considering the number of bins is approximately half of the sample size of an attribute also confirms the previous inferences.
Call for Research Articles - 5th International Conference on Artificial Intel...ijistjournal
5th International Conference on Artificial Intelligence and Machine Learning (CAIML 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and Machine Learning. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Machine Learning in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of Computer Science, Engineering and Applications.
Online Paper Submission - International Journal of Information Sciences and T...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
SYSTEM IDENTIFICATION AND MODELING FOR INTERACTING AND NON-INTERACTING TANK S...ijistjournal
System identification from the experimental data plays a vital role for model based controller design. Derivation of process model from first principles is often difficult due to its complexity. The first stage in the development of any control and monitoring system is the identification and modeling of the system. Each model is developed within the context of a specific control problem. Thus, the need for a general system identification framework is warranted. The proposed framework should be able to adapt and emphasize different properties based on the control objective and the nature of the behavior of the system. Therefore, system identification has been a valuable tool in identifying the model of the system based on the input and output data for the design of the controller. The present work is concerned with the identification of transfer function models using statistical model identification, process reaction curve method, ARX model, genetic algorithm and modeling using neural network and fuzzy logic for interacting and non interacting tank process. The identification technique and modeling used is prone to parameter change & disturbance. The proposed methods are used for identifying the mathematical model and intelligent model of interacting and non interacting process from the real time experimental data.
Call for Research Articles - 4th International Conference on NLP & Data Minin...ijistjournal
4th International Conference on NLP & Data Mining (NLDM 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Natural Language Computing and Data Mining.
Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the following areas, but are not limited to.
Research Article Submission - International Journal of Information Sciences a...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
Call for Papers - International Journal of Information Sciences and Technique...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
Implementation of Radon Transformation for Electrical Impedance Tomography (EIT)ijistjournal
Radon Transformation is generally used to construct optical image (like CT image) from the projection data in biomedical imaging. In this paper, the concept of Radon Transformation is implemented to reconstruct Electrical Impedance Topographic Image (conductivity or resistivity distribution) of a circular subject. A parallel resistance model of a subject is proposed for Electrical Impedance Topography(EIT) or Magnetic Induction Tomography(MIT). A circular subject with embedded circular objects is segmented into equal width slices from different angles. For each angle, Conductance and Conductivity of each slice is calculated and stored in an array. A back projection method is used to generate a two-dimensional image from one-dimensional projections. As a back projection method, Inverse Radon Transformation is applied on the calculated conductance and conductivity to reconstruct two dimensional images. These images are compared to the target image. In the time of image reconstruction, different filters are used and these images are compared with each other and target image.
Online Paper Submission - 6th International Conference on Machine Learning & ...ijistjournal
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the following areas, but are not limited to.
Submit Your Research Articles - International Journal of Information Sciences...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
BER Performance of MPSK and MQAM in 2x2 Almouti MIMO Systemsijistjournal
Almouti published the error performance of the 2x2 space-time transmit diversity scheme using BPSK. One of the key techniques employed for correcting such errors is the Quadrature amplitude modulation (QAM) because of its efficiency in power and bandwidth.. In this paper we explore the error performance of the 2x2 MIMO system using the Almouti space-time codes for higher order PSK and M-ary QAM. MATLAB was used to simulate the system; assuming slow fading Rayleigh channel and additive white Gaussian noise. The simulated performance curves were compared and evaluated with theoretical curves obtained using BER tool on the MATLAB by setting parameters for random generators. The results shows that the technique used do find a place in correcting error rates of QAM system of higher modulation schemes. The model can equally be used not only for the criteria of adaptive modulation but for a platform to design other modulation systems as well.
Online Paper Submission - International Journal of Information Sciences and T...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
Call for Papers - International Journal of Information Sciences and Technique...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
International Journal of Information Sciences and Techniques (IJIST)ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...ijistjournal
Feature extraction is a method of capturing visual content of an image. The feature extraction is the process to represent raw image in its reduced form to facilitate decision making such as pattern classification. We have tried to address the problem of classification MRI brain images by creating a robust and more accurate classifier which can act as an expert assistant to medical practitioners. The objective of this paper is to present a novel method of feature selection and extraction. This approach combines the Intensity, Texture, shape based features and classifies the tumor as white matter, Gray matter, CSF, abnormal and normal area. The experiment is performed on 140 tumor contained brain MR images from the Internet Brain Segmentation Repository. The proposed technique has been carried out over a larger database as compare to any previous work and is more robust and effective. PCA and Linear Discriminant Analysis (LDA) were applied on the training sets. The Support Vector Machine (SVM) classifier served as a comparison of nonlinear techniques Vs linear ones. PCA and LDA methods are used to reduce the number of features used. The feature selection using the proposed technique is more beneficial as it analyses the data according to grouping class variable and gives reduced feature set with high classification accuracy.
Research Article Submission - International Journal of Information Sciences a...ijistjournal
The International Journal of Information Science & Techniques (IJIST) focuses on information systems science and technology coercing multitude applications of information systems in business administration, social science, biosciences, and humanities education, library sciences management, depiction of data and structural illustration, big data analytics, information economics in real engineering and scientific problems.
This journal provides a forum that impacts the development of engineering, education, technology management, information theories and application validation. It also acts as a path to exchange novel and innovative ideas about Information systems science and technology.
A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVINijistjournal
In this paper A Median Based Directional Cascaded with Mask (MBDCM) filter has been proposed, which is based on three different sized cascaded filtering windows. The differences between the current pixel and its neighbors aligned with four main directions are considered for impulse detection. A direction index is used for each edge aligned with a given direction. Minimum of these four direction indexes is used for impulse detection under each masking window. Depending on the minimum direction indexes among these three windows new value to substitute the noisy pixel is calculated. Extensive simulations showed that the MBDCM filter provides good performances of suppressing impulses from both gray level and colored benchmarked images corrupted with low noise level as well as for highly dense impulses. MBDCM filter gives better results than MDWCMM filter in suppressing impulses from highly corrupted digital images.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.6, November 2012
DOI: 10.5121/ijist.2012.2604
HINDI NAMED ENTITY RECOGNITION BY AGGREGATING RULE BASED HEURISTICS AND HIDDEN MARKOV MODEL
Deepti Chopra, Nusrat Jahan, Sudha Morwal
Department of Computer Engineering, Banasthali Vidyapith Jaipur (Raj.), INDIA
deeptichopra11@yahoo.co.in
nusratkota@gmail.com
sudha_morwal@yahoo.co.in
ABSTRACT
Named entity recognition (NER) is one of the applications of Natural Language Processing and is regarded
as a subtask of information extraction. NER is the process of detecting Named Entities (NEs) in a document
and categorizing them into Named Entity classes such as the names of organizations, persons,
locations, sports, rivers, cities, countries, quantities etc. A lot of work on NER has been accomplished
for English, but comparable success has not yet been achieved for the Indian languages. This paper
discusses NER, the various approaches to NER, performance metrics, the challenges of NER in the
Indian languages and, finally, some of the results achieved by performing NER in Hindi by
aggregating Rule based heuristics and the Hidden Markov Model (HMM).
KEYWORDS
HMM, Accuracy, NER, Performance Metrics, Named Entities
1. INTRODUCTION
There are numerous applications of Named Entity Recognition (NER). Some of these include
Information Extraction, Question Answering, Information Retrieval, Automatic Summarization,
Machine Translation etc. Named Entities become known to us when we perform computations on
natural language text. The task of extracting necessary details and retrieving important information
becomes easier and faster if the Named Entities are already known. NER is the process
in which Named Entities are detected in a document and classified into their respective
Named Entity classes using any of the NER approaches. According to the 8th Schedule of the
Constitution, India has 22 official languages. NER in Indian languages is still considered
a budding topic of research in the field of NLP, and much work remains to be done
in this regard.
Consider an example of NER in Hindi as follows:
“Mohit/PER ne/O mi road/LOC se/O kitab/O khareedi/O |/O”
In the above sentence, the task of an NER based system is to extract the named entities and then
classify them into certain classes. Here, ‘Mohit’ is the name of a person, so it is
given a PER tag. ‘mi road’ is the name of a location, so we have allotted a LOC tag to it. The
set of named entity tags is not fixed: it depends on individual choice and on
the contents considered for Named Entity Recognition. Table 1 lists some of
the Named Entity tags. Named Entity tags may be of a general type or may be further
divided into sub tags of specific types. E.g. the location tag (LOC) may further be
classified into continent tag, country tag, city tag, state tag, town tag, street tag etc.
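The annotation format and the tag hierarchy described above can be sketched in a few lines of Python. This is only an illustration: the function names `parse_tagged` and `general_tag` and the `SUB_TAGS` dictionary are hypothetical helpers introduced here, not part of the system described in this paper.

```python
import re

# Sub tags of the general LOC tag, as listed in the text above.
SUB_TAGS = {"LOC": ["CONTINENT", "COUNTRY", "STATE", "CITY", "TOWN", "STREET"]}

def parse_tagged(sentence):
    """Split 'text/TAG' annotations into (text, tag) pairs.

    The text part may span several words, e.g. 'mi road/LOC'.
    """
    return re.findall(r"(.+?)/([A-Z]+)(?:\s+|$)", sentence)

def general_tag(tag):
    """Map a specific sub tag back to its general tag; other tags are returned unchanged."""
    for parent, children in SUB_TAGS.items():
        if tag in children:
            return parent
    return tag

pairs = parse_tagged("Mohit/PER ne/O mi road/LOC se/O kitab/O khareedi/O |/O")
# Keep only the tokens that carry a Named Entity tag (everything except O).
entities = [(text, tag) for text, tag in pairs if tag != "O"]
print(entities)
print(general_tag("CITY"))
```

For the example sentence, only ‘Mohit’ and ‘mi road’ survive the filter, and a CITY tag maps back to the general LOC tag.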
Figure 1. A single Named Entity tag split into more specific Named Entity tags
Table 1. Various Named Entity Tags
NE Tags: Named Entity Tags. PER: Name of Person, CO: Country, ORG: Organization, VEH: Vehicle and QTY: Quantity

NE TAG     EXAMPLE
PER        Deepti, Sudha, Rohit
CITY       Jaipur, Mumbai, Kolkata
CO         India, China, Pakistan
STATE      Rajasthan, Maharashtra
SPORT      Hockey, Badminton
ORG        TCS, Infosys, Accenture
RIVER      Ganga, Krishna, Kaveri
DATE       27-04-2012, 31/01/1989
TIME       10:10
PERCENT    100%
2. METHODOLOGIES FOR NER
Two basic approaches are employed in Named Entity Recognition [5] [1] [18]:
the Rule Based Approach and the Machine Learning Based Approach [11] [6] [16].
2.1. Rule based Approach
It is also known as the handcrafted approach. It is of two types:
2.1.1 List Lookup Approach
In this approach, Gazetteers are used, which consist of lists for the different Named Entity
classes, and a simple lookup operation is performed to decide whether a word is a
Named Entity or not. If a particular word is found in the list of a Named Entity class, then
the Named Entity tag of that class is allotted to the word. Indian languages, however,
lack such resources.
We can prepare Gazetteers in Indian languages using transliteration, which converts
English Named Entities into Indian languages. Some seed values from a domain specific
corpus can also be used to learn context patterns, and Named Entities are then
produced by bootstrapping. [17] This methodology is easy and fast. Its
disadvantage is that it cannot overcome the problem of ambiguity.
E.g. in the sentence “Ganga/PER Ne/O Ganga/RIVER nadi/O mein/O dupki/O Lagayi/O
|/O”, Ganga is a Named Entity, but it can be a person name or a river
name. This ambiguity cannot be resolved by this methodology.
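The lookup operation and its ambiguity problem can be illustrated with a minimal Python sketch. The gazetteer contents and the names `GAZETTEERS`, `lookup` and `tag_sentence` are hypothetical and chosen only for this illustration; a real gazetteer would hold far more entries per class.

```python
# Hypothetical toy gazetteers: one set of known names per Named Entity class.
GAZETTEERS = {
    "PER": {"Ganga", "Mohit", "Deepti"},
    "RIVER": {"Ganga", "Krishna", "Kaveri"},
    "CITY": {"Jaipur", "Mumbai", "Kolkata"},
}

def lookup(word):
    """Return, in sorted order, every Named Entity class whose list contains the word."""
    return sorted(tag for tag, names in GAZETTEERS.items() if word in names)

def tag_sentence(words):
    """Tag each word by list lookup.

    No match gives O, a single match gives that class, and several matches
    are reported as ambiguous because lookup alone cannot choose between them.
    """
    tagged = []
    for word in words:
        classes = lookup(word)
        if not classes:
            tagged.append((word, "O"))
        elif len(classes) == 1:
            tagged.append((word, classes[0]))
        else:
            tagged.append((word, "AMBIGUOUS:" + "|".join(classes)))
    return tagged

result = tag_sentence("Ganga Ne Ganga nadi mein dupki Lagayi |".split())
for word, tag in result:
    print(word, tag)
```

Both occurrences of Ganga are found in the PER and the RIVER lists, so the lookup reports the same pair of candidate classes for each and has no way to prefer one reading over the other.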
2.1.2. Linguistic Approach
In this approach, a linguist who has in-depth knowledge of the grammar of a specific
language constructs rules so that the Named Entities can be recognized and classified
easily. [3][20][19] The rules that are constructed are language dependent and cannot be used to
identify Named Entities in any other language. [11]
2.2. Machine Learning Based Approach
This approach is also known as the automated approach or the statistical approach. The Machine
learning based approach is more efficient and more frequently used than the Rule based approach.
2.2.1. Hidden Markov Model (HMM)
HMM is a statistical approach in which the states are hidden or unobserved. Given a sequence of
tokens, the HMM produces the optimal state sequence, i.e. the most probable sequence of tags.
It is based on the Markov Chain property, i.e. the probability of occurrence of the next state
depends only on the immediately preceding state. HMM is easy to implement. The disadvantage of this
approach is that it requires a lot of training data in order to get good results and it cannot
capture long-range dependencies. [12]
Figure 2: Diagrammatic description of HMM
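The optimal state sequence of an HMM is usually found with the Viterbi algorithm [12]. The following is a minimal sketch under stated assumptions: the two states, the probabilities and the small constant used for unseen words are invented for illustration and are not the model trained in this paper.

```python
def viterbi(obs, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable state sequence for the observed tokens."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s].get(obs[0], unseen) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Markov property: only the immediately preceding state matters
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(obs[t], unseen)
            back[t][s] = prev
    # Follow the back-pointers from the best final state
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

# Hypothetical toy parameters (not estimated from any corpus)
states = ["RIVER", "O"]
start_p = {"RIVER": 0.3, "O": 0.7}
trans_p = {"RIVER": {"RIVER": 0.1, "O": 0.9}, "O": {"RIVER": 0.2, "O": 0.8}}
emit_p = {"RIVER": {"Ganga": 0.8}, "O": {"Ganga": 0.05, "nadi": 0.4, "mein": 0.4}}

best_path = viterbi(["Ganga", "nadi", "mein"], states, start_p, trans_p, emit_p)
```

In practice the start, transition and emission probabilities are estimated from an annotated corpus, which is why the approach needs a large amount of training data.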
2.2.2. Maximum Entropy Markov Model (MEMM)
It combines the concepts of the Hidden Markov Model and the Maximum Entropy Model. While training,
this model makes sure that the unknown values in a Markov chain are connected and are not
conditionally independent of each other.
The long-range dependency problem of HMM is addressed by this model. Also, it has higher recall and
precision than HMM. The disadvantage of this approach is the label bias problem: the
probabilities of the transitions leaving any particular state must sum to one, so MEMM favours those
states that have fewer outgoing transitions. [16]
Figure 3: Diagrammatic description of MEMM
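The label bias problem follows directly from this per-state normalisation. The sketch below is an illustration only: the scores are made-up numbers standing in for the weighted feature sums of a maximum entropy model, and the function name is hypothetical.

```python
import math

def local_transition_probs(scores):
    """Normalise raw scores over the successors of ONE state so they sum to 1."""
    z = sum(math.exp(v) for v in scores.values())
    return {state: math.exp(v) / z for state, v in scores.items()}

# A state with a single successor: even a very poor score for the current
# observation still yields probability 1.0 after local normalisation.
one_successor = local_transition_probs({"O": -5.0})

# A state with two successors has to share its probability mass.
two_successors = local_transition_probs({"PER": 2.0, "O": 0.0})
```

Because the first state passes on all of its probability mass whatever the observation is, paths through states with few outgoing transitions are favoured.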
2.2.3. Conditional Random Field (CRF)
It is an undirected graphical model. Unlike classifiers that label each sample in isolation, it also
takes into consideration the context information, i.e. the neighbouring samples. It is known as a
conditional random field since it computes the conditional probability of the values on the output
nodes given the values on the input nodes.
This methodology has the same advantages as MEMM. It also resolves the label bias problem
faced by MEMM. [3]
Figure 4: Diagrammatic description of CRF
[Diagrams not reproduced. Blocks in Figure 3: Untagged text, POS tagged text, Gazetteer, MEMM,
Beam search, Handling unknown entities, Final output. Blocks in Figure 4: Untagged text, POS tagged
text, Gazetteer, CRF, Forward Viterbi and backward A* search, Handling unknown entities, Final output.]
2.2.4. Support Vector Machine (SVM)
This methodology was introduced by Vapnik. SVM is a supervised statistical approach. The main
objective of this approach is to decide whether a specific vector belongs to a particular target class
or not. [2] In this approach, the training and the testing data are represented as vectors in the
same vector space.
Figure 5: Diagrammatic description of SVM
During training, we generate a hyperplane that separates the members
into two classes (positive and negative), which lie on opposite sides of the hyperplane.
The approach also computes the distance of the vectors from the hyperplane; the distance to the
nearest vectors is known as the margin.
The main advantage of this approach is that it gives high accuracy for the text categorization
problem. [4]
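The role of the hyperplane can be illustrated with a short sketch. The weight vector w, the bias b and the sample points are hypothetical values chosen so that the arithmetic is easy to follow; a real SVM learns w and b from the training data.

```python
import math

def signed_distance(w, b, x):
    """Signed distance of vector x from the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

def classify(w, b, x):
    """Vectors on opposite sides of the hyperplane get opposite classes."""
    return "positive" if signed_distance(w, b, x) >= 0 else "negative"

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0
w, b = [3.0, 4.0], -5.0
d_far = signed_distance(w, b, [3.0, 4.0])    # (9 + 16 - 5) / 5
d_near = signed_distance(w, b, [0.0, 0.0])   # -5 / 5
```

The sign of the distance gives the class, and the smallest absolute distance over the training vectors corresponds to the margin that training tries to maximise.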
2.2.5. Decision Tree
It is a well known methodology that is used to extract and categorize Named Entities in a given
corpus. In this approach, some recognition rules are applied to the untagged training corpus so
that Named Entities are retrieved. We then match the Named Entities obtained against the
answer key provided by humans. If a Named Entity is the same as in the answer key, it is
referred to as a positive example; otherwise it is a negative example. [7] A decision tree is
then built that classifies the Named Entities in the testing document. [9] A leaf node of the decision tree
gives the resultant value of the test.
3. PERFORMANCE METRICS
Performance metrics are very important since they reveal the performance of a Named Entity
Recognition based system in terms of Precision, Recall and F-Measure. The output of a NER
system may be termed the "response" and the interpretation of a human the "answer key". We
consider the following terms:
1. Correct: the response is the same as the answer key.
2. Incorrect: the response is not the same as the answer key.
3. Missing: the answer key is tagged but the response is not tagged.
4. Spurious: the response is tagged but the answer key is not tagged. [6]
Hence, we define Precision, Recall and F-Measure as follows:
Precision (P): Correct / (Correct + Incorrect + Spurious)
Recall (R): Correct / (Correct + Incorrect + Missing)
F-Measure: (2*P*R)/(P+R) [5][8]
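As a worked example, the three measures can be computed from the four counts as follows. The sketch uses the usual convention in which spurious responses lower precision and missing responses lower recall; the function name and the counts in the example are hypothetical.

```python
def ner_scores(correct, incorrect, missing, spurious):
    """Return (precision, recall, f_measure) from the four response counts."""
    responses = correct + incorrect + spurious   # everything the system tagged
    answers = correct + incorrect + missing      # everything in the answer key
    precision = correct / responses if responses else 0.0
    recall = correct / answers if answers else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical counts: 8 correct, 1 incorrect, 1 missing, 3 spurious
p, r, f = ner_scores(correct=8, incorrect=1, missing=1, spurious=3)
```

With these counts the system tagged 12 items and the answer key contains 10, so precision is 8/12, recall is 8/10 and the F-Measure is 8/11.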
4. ISSUES IN NER IN INDIAN LANGUAGES
Little work has so far been done on NER in the Indian languages. This is mainly due
to the fact that Indian languages lack resources such as annotated corpora and lexical resources.
There are many challenges related to Named Entity Recognition in the Indian languages.
Some of them are the following: [6]
1. Lack of capitalization: In Indian languages, the concept of capitalization is absent, whereas in
English and in many European languages a word whose first letter is capitalized is usually a
proper noun. NER systems developed for English and the European
languages therefore cannot be used directly to perform Named Entity Recognition in the Indian languages.
Thus there is a need to develop an efficient NER system for the Indian languages. [15]
2. Indian languages are inflectional, morphologically rich and have free word order.
3. Indian languages lack resources. This is because the web mostly has lists of
Named Entities in English and not in the Indian languages. [17]
4. In the dictionaries of the Indian languages, many common nouns also exist as proper nouns. E.g.
Lata, Suraj, Aakash, Tara etc. are names of persons and common nouns as well. So, we need
to resolve such ambiguities, which is also one of the issues in NER in the Indian languages.
5. RESULTS
We prepared a general corpus from Hindi newspapers on the web and annotated it
manually. The Named Entity tags that we used are: PER (Name of Person), LOC (Name of
Location), TIME, MONTH, SPORT, ORG (Name of Organization), VEH (Name of Vehicle),
RIVER and QTY (Quantity). In the first phase, we applied Rule based heuristics, or the
shallow parsing technique, to the corpus: helping words that occur just before or after
a Named Entity are used to detect it. In the
second phase, we applied a Hidden Markov Model (HMM) to detect the rest of the Named Entities.
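The first phase can be pictured with a small sketch. It is not the rule set used in our experiments: the single cue word "nadi" (river), taken from the example sentence in Section 2.1.1, and the function name are assumptions made purely for illustration.

```python
# Hypothetical helping words: cue word -> tag given to the token just before it.
CUES_AFTER_ENTITY = {"nadi": "RIVER"}   # e.g. "Ganga nadi": Ganga is a river

def apply_cue_rules(tokens, cues=CUES_AFTER_ENTITY):
    """Tag a token when the word that follows it is a known helping word."""
    tags = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        tag = cues.get(tokens[i])
        if tag is not None:
            tags[i - 1] = tag
    return list(zip(tokens, tags))

phase_one = apply_cue_rules("Ganga Ne Ganga nadi mein dupki Lagayi".split())
# Tokens still tagged "O" after this phase are the ones passed on to the HMM.
```

Only the second Ganga is resolved by the rule, because only that occurrence is followed by a helping word; the first occurrence is left for the second phase.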
Table 2 Results of Rule based heuristics or shallow parsing technique
NAMED ENTITIES   TOTAL NAMED ENTITIES (NEs)   NAMED ENTITIES (NEs) IDENTIFIED   ACCURACY
LOC              247                          125                               50.60%
PER              56                           29                                51.79%
QTY              79                           40                                50.63%
TIME             67                           34                                50.75%
ORG              135                          68                                50.37%
SPORT            45                           23                                51.11%
RIVER            11                           6                                 54.54%
VEH              25                           0                                 0%
MONTH            22                           0                                 0%
TOTAL            687                          325                               47.5%
Table 3 Results of Hidden Markov Model (HMM)
NAMED ENTITIES   TOTAL NAMED ENTITIES (NEs) UNDETECTED   NAMED ENTITIES (NEs) IDENTIFIED   ACCURACY
LOC              122                                     107                               87.70%
PER              27                                      24                                88.89%
QTY              39                                      34                                87.18%
TIME             33                                      29                                87.88%
ORG              67                                      59                                88.06%
SPORT            22                                      20                                90.90%
RIVER            5                                       5                                 100%
VEH              25                                      25                                100%
MONTH            22                                      22                                100%
TOTAL            362                                     325                               89.78%
Table 4 Results of Combination of Approaches or Hybrid Approach
TOTAL NAMED ENTITIES (NEs)   NAMED ENTITIES (NEs) IDENTIFIED   ACCURACY
687                          650                               94.61%
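The figures in Tables 3 and 4 follow from the totals by simple arithmetic, which the short sketch below reproduces. The variable names are illustrative; the counts are the totals reported in Tables 2 and 3.

```python
total_nes = 687          # all Named Entities in the corpus (Table 2)
found_by_rules = 325     # detected in the first phase (Table 2)
found_by_hmm = 325       # detected in the second phase (Table 3)

# The HMM only sees what the rules left undetected
left_for_hmm = total_nes - found_by_rules
hmm_accuracy = round(100 * found_by_hmm / left_for_hmm, 2)

# Hybrid result: entities found in either phase over all entities
found_total = found_by_rules + found_by_hmm
hybrid_accuracy = round(100 * found_total / total_nes, 2)
```

Here left_for_hmm is 362 and found_total is 650, which gives the 89.78% and 94.61% reported in Tables 3 and 4.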
Figure 6: Results of using Combined Approach
[Chart not reproduced. Surviving labels: axis ticks from 90 to 101; legend "NAMED ENTITIES":
LOC, PER, QTY, TIME, ORG, SPORT, RIVER, VEH, MONTH.]
6. CONCLUSIONS
We have obtained an accuracy of about 94.61% by aggregating the rule based heuristics and the
HMM, as shown in Table 4. Table 2 shows that when we applied only Rule Based Heuristics, the system
performed very poorly, and the accuracy obtained by this approach was 47.5%. Similarly, Table 3
shows that when we applied only HMM, its performance was average, and the accuracy
obtained by this approach was 89.78%. This shows that the hybrid, or
combined, approach gives very good results in a Named Entity Recognition based system.
ACKNOWLEDGEMENT
I would like to thank all those who helped me in accomplishing this task.
REFERENCES
[1] Animesh Nayan, B. Ravi Kiran Rao, Pawandeep Singh, Sudip Sanyal and Ratna Sanyal. "Named
Entity Recognition for Indian Languages". In Proceedings of the IJCNLP-08 Workshop on NER for
South and South East Asian Languages, Hyderabad (India), pp. 97–104, 2008.
[2] Asif Ekbal and Sivaji Bandyopadhyay. "Named Entity Recognition using Support Vector Machine: A
Language Independent Approach". International Journal of Electrical and Electronics Engineering 4:2,
2010.
[3] Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka and Sivaji Bandyopadhyay.
"Language Independent Named Entity Recognition in Indian Languages". In Proceedings of the
IJCNLP-08 Workshop on NER for South and South East Asian Languages, pages 33–40, Hyderabad,
India, January 2008.
[4] Asif Ekbal and Sivaji Bandyopadhyay. "Bengali Named Entity Recognition using Support
Vector Machine". In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian
Languages, pages 51–58, Hyderabad, India, January 2008.
[5] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu, Dr. A. Govardhan. "A Survey on Named Entity
Recognition in Indian Languages with particular reference to Telugu". IJCSI International Journal of
Computer Science Issues, Vol. 8, Issue 2, March 2011.
[6] Darvinder Kaur, Vishal Gupta. "A survey of Named Entity Recognition in English and other Indian
Languages". IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November
2010.
[7] Georgios Paliouras, Vangelis Karkaletsis, Georgios Petasis and Constantine D.
Spyropoulos. "Learning Decision Trees for Named-Entity Recognition and Classification".
[8] G.V.S. Raju, B. Srinivasu, Dr. S. Viswanadha Raju, K.S.M.V. Kumar. "Named Entity
Recognition for Telugu Using Maximum Entropy Model".
[9] Hideki Isozaki. "Japanese Named Entity Recognition based on a Simple Rule Generator and Decision
Tree Learning". Available at: http://acl.ldc.upenn.edu/acl2001/MAIN/ISOZAKI.PDF
[10] James Mayfield, Paul McNamee and Christine Piatko. "Named Entity Recognition using Hundreds
of Thousands of Features". Available at: http://acl.ldc.upenn.edu/W/W03/W03-0429.pdf
[11] Kamaldeep Kaur, Vishal Gupta. "Name Entity Recognition for Punjabi Language". IRACST -
International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN:
2249-9555, Vol. 2, No. 3, June 2012.
[12] Lawrence R. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition". In Proceedings of the IEEE, 77 (2), p. 257-286, February 1989. Available at:
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[13] Padmaja Sharma, Utpal Sharma, Jugal Kalita. "Named Entity Recognition: A Survey for the Indian
Languages". Language in India: Strength for Today and Bright Hope for Tomorrow, Volume
11: 5, May 2011, ISSN 1930-2940. Available at:
http://www.languageinindia.com/may2011/v11i5may2011.pdf
[14] Praveen Kumar P and Ravi Kiran V. "A Hybrid Named Entity Recognition System for South Asian
Languages". Available at: http://www.aclweb.org/anthology-new/I/I08/I08-5012.pdf
[15] S. Pandian, K. A. Pavithra, and T. Geetha. "Hybrid Three-stage Named Entity Recognizer for Tamil".
INFOS2008, March, Cairo-Egypt. Available
at: http://infos2008.fci.cu.edu.eg/infos/NLP_08_P045-052.pdf
[16] Shilpi Srivastava, Mukund Sanglikar and D.C. Kothari. "Named Entity Recognition System for Hindi
Language: A Hybrid Approach". International Journal of Computational Linguistics (IJCL), Volume
(2) : Issue (1) : 2011. Available at:
http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
[17] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra. "Gazetteer Preparation for Named Entity
Recognition in Indian Languages".
[18] Sujan Kumar Saha, Sanjay Chatterji, Sandipan Dandapat. "A Hybrid Approach for Named Entity
Recognition in Indian Languages".
[19] S. Biswas, M. K. Mishra, Sitanath_biswas, S. Acharya, S. Mohanty. "A Two Stage Language
Independent Named Entity Recognition for Indian Languages". (IJCSIT) International Journal of
Computer Science and Information Technologies, Vol. 1 (4), 2010, 285-289.
[20] Vishal Gupta, Gurpreet Singh Lehal. "Named Entity Recognition for Punjabi Language Text
Summarization". International Journal of Computer Applications (0975 – 8887), Vol. 33, No. 3, Nov.
2011.
Authors
Deepti Chopra received her B.Tech degree in Computer Science and Engineering from
Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011. Currently she is
pursuing her M.Tech degree in Computer Science and Engineering at Banasthali
University, Rajasthan. Her research interests include Artificial Intelligence, Natural
Language Processing, and Information Retrieval.
Nusrat Jahan received her B.Tech degree in Computer Science and Engineering from R.N.
Modi Engineering College, Kota, Rajasthan in 2010. Currently she is pursuing her
M.Tech degree in Computer Science and Engineering at Banasthali University,
Rajasthan. Her research interests include Artificial Intelligence, Natural Language
Processing, and Information Retrieval.
Sudha Morwal is an active researcher in the field of Natural Language Processing.
She is currently working as Associate Professor in the Department of Computer Science at
Banasthali University (Rajasthan), India. She has completed M.Tech (Computer Science),
NET and M.Sc (Computer Science), and her PhD is in progress at Banasthali University
(Rajasthan), India.