Abstract The use of regular expressions to search text is a well-known and well-understood technique. Regular expressions are generic representations of a string or a collection of strings, and they are among the most useful tools in computer science. NLP, as an area of computer science, has benefitted greatly from them: they are used in phonology, morphology, text analysis, information extraction, and speech recognition. This paper gives the reader a general review of the use of regular expressions, illustrated with examples from natural language processing, and discusses different approaches to regular expressions in NLP. Keywords— Regular Expression, Natural Language Processing, Tokenization, Longest common subsequence alignment, POS tagging
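As a minimal illustration of the kind of regex-based search the paper surveys, the following Python sketch (the pattern and sample text are our own, not taken from the paper) matches all inflected forms of a word using a character class and an optional-suffix quantifier:

```python
import re

# Match "woodchuck" or "Woodchuck", optionally pluralized: a character
# class handles the case disjunction, "?" makes the plural -s optional.
pattern = re.compile(r"\b[Ww]oodchucks?\b")

text = "How much wood would a woodchuck chuck? Woodchucks chuck wood."
print(pattern.findall(text))  # ['woodchuck', 'Woodchucks']
```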
----------------------------
Machine translation from English to Hindi - Rajat Jain
Machine translation is a part of natural language processing. The algorithm suggested is a word-based algorithm. We have implemented translation from English to Hindi.
submitted by
Garvita Sharma, 10103467, B3
Rajat Jain, 10103571, B6
This document discusses natural language processing and language models. It begins by explaining that natural language processing aims to give computers the ability to process human language in order to perform tasks like dialogue systems, machine translation, and question answering. It then discusses how language models assign probabilities to strings of text to determine if they are valid sentences. Specifically, it covers n-gram models which use the previous n words to predict the next, and how smoothing techniques are used to handle uncommon words. The document provides an overview of key concepts in natural language processing and language modeling.
The document discusses natural language and natural language processing (NLP). It defines natural language as languages used for everyday communication like English, Japanese, and Swahili. NLP is concerned with enabling computers to understand and interpret natural languages. The summary explains that NLP involves morphological, syntactic, semantic, and pragmatic analysis of text to extract meaning and understand context. The goal of NLP is to allow humans to communicate with computers using their own language.
Natural Language Processing: parts of speech tagging, its classes, and how to ... - Rajnish Raj
Part of speech (POS) tagging is the process of assigning a part of speech tag like noun, verb, adjective to each word in a sentence. It involves determining the most likely tag sequence given the probabilities of tags occurring before or after other tags, and words occurring with certain tags. POS tagging is the first step in many NLP applications and helps determine the grammatical role of words. It involves calculating bigram and lexical probabilities from annotated corpora to find the tag sequence with the highest joint probability.
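To make the joint-probability computation concrete, here is a toy Python sketch; all probabilities are invented for illustration and not taken from any annotated corpus:

```python
# Toy bigram-HMM scoring: P(tags, words) ~ product of
# P(tag_i | tag_{i-1}) * P(word_i | tag_i)
trans = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VBZ"): 0.4}
lex = {("the", "DT"): 0.5, ("dog", "NN"): 0.01, ("barks", "VBZ"): 0.002}

def score(words, tags):
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        # Unseen events get a tiny floor probability in this sketch.
        p *= trans.get((prev, t), 1e-8) * lex.get((w, t), 1e-8)
        prev = t
    return p

print(score(["the", "dog", "barks"], ["DT", "NN", "VBZ"]))
```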
General background and conceptual explanation of word embeddings (word2vec in particular). Mostly aimed at linguists, but also understandable for non-linguists.
Leiden University, 23 March 2018
Finite state automata (deterministic and nondeterministic finite automata) provide decisions regarding the acceptance and rejection of a string while transducers provide some output for a given input. Thus, the two machines are quite useful in language processing tasks.
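Such an accept/reject decision is easy to simulate; the sketch below encodes a toy deterministic automaton (our own, in the spirit of the classic "sheep language" baa+! example):

```python
# DFA for the "sheep language" baa+! : states 0..4, state 4 accepting.
delta = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}

def accepts(s):
    state = 0
    for ch in s:
        if (state, ch) not in delta:   # no transition: reject
            return False
        state = delta[(state, ch)]
    return state == 4                  # accept only in the final state

print(accepts("baaa!"))  # True
print(accepts("ba!"))    # False
```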
This document discusses various techniques for text normalization and pattern matching using regular expressions. It covers tokenization, lemmatization, edit distance, and basic regular expression patterns including character classes, quantifiers, anchors, precedence, and disjunction. The key goals of text normalization are to standardize text into a convenient form for processing, while regular expressions provide a language for specifying text search patterns.
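Each construct listed above has a direct Python counterpart; the sample strings here are ours:

```python
import re

print(re.findall(r"[0-9]+", "call 911 or 112"))    # class + quantifier: ['911', '112']
print(re.findall(r"^The", "The end. The start."))  # anchor, start only: ['The']
print(re.findall(r"cat|dog", "cat, dog, catalog")) # disjunction: ['cat', 'dog', 'cat']
print(re.findall(r"colou?r", "color or colour"))   # optional char: ['color', 'colour']
```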
The document discusses finite state automata (FSA) and regular expressions. It provides examples of deterministic and non-deterministic FSA that recognize strings containing combinations of words. Non-deterministic FSA can have multiple possible state transitions for a given input, requiring strategies like backing up, look-ahead, or parallel processing to determine if a string is accepted. FSA and regular expressions can define regular languages by generating all matching strings.
Natural language processing (NLP) aims to help computers understand human language. Ambiguity is a major challenge for NLP as words and sentences can have multiple meanings depending on context. There are different types of ambiguity including lexical ambiguity where a word has multiple meanings, syntactic ambiguity where sentence structure is unclear, and semantic ambiguity where meaning depends on broader context. NLP techniques like part-of-speech tagging and word sense disambiguation aim to resolve ambiguity by analyzing context.
This document discusses methods for evaluating language models, including intrinsic and extrinsic evaluation. Intrinsic evaluation involves measuring a model's performance on a test set using metrics like perplexity, which is based on how well the model predicts the test set. Extrinsic evaluation embeds the model in an application and measures the application's performance. The document also covers techniques for dealing with unknown words like replacing low-frequency words with <UNK> and estimating its probability from training data.
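As a hedged sketch of the intrinsic metric: perplexity is the inverse probability of the test set, normalized by the number of words. The per-word probabilities below are invented:

```python
import math

# Per-word probabilities some model assigns to a 4-word test string.
logprobs = [math.log(0.2), math.log(0.1), math.log(0.05), math.log(0.3)]

# PP(W) = exp(-(1/N) * sum_i log P(w_i | history)); lower is better.
perplexity = math.exp(-sum(logprobs) / len(logprobs))
print(round(perplexity, 2))
```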
This document discusses natural language processing (NLP) from a developer's perspective. It provides an overview of common NLP tasks like spam detection, machine translation, question answering, and summarization. It then discusses some of the challenges in NLP like ambiguity and new forms of written language. The document goes on to explain probabilistic models and language models that are used to complete sentences and rearrange phrases based on probabilities. It also covers text processing techniques like tokenization, regular expressions, and more. Finally, it discusses spelling correction techniques using noisy channel models and confusion matrices.
The document discusses minimum edit distance and how it can be used to quantify the similarity between two strings. Minimum edit distance is defined as the minimum number of editing operations like insertion, deletion, substitution needed to transform one string into another. Levenshtein distance assigns a cost of 1 to each insertion, deletion, or substitution, and calculates the minimum edits between two strings using dynamic programming to build up solutions from sub-problems. The algorithm can also be modified to produce an alignment between the strings by storing back pointers and doing a backtrace.
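The dynamic program fits in a short function; this is the standard textbook Levenshtein implementation with unit costs, not code taken from the document:

```python
def levenshtein(s, t):
    # D[i][j] = minimum edits to turn s[:i] into t[:j]
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                               # i deletions
    for j in range(n + 1):
        D[0][j] = j                               # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution or match
    return D[m][n]

print(levenshtein("intention", "execution"))  # 5 with unit substitution cost
```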
The document discusses parts-of-speech (POS) tagging. It defines POS tagging as labeling each word in a sentence with its appropriate part of speech. It provides an example tagged sentence and discusses the challenges of POS tagging, including ambiguity and open/closed word classes. It also discusses common tag sets and stochastic POS tagging using hidden Markov models.
Parts-of-speech can be divided into closed classes and open classes. Closed classes have a fixed set of members like prepositions, while open classes like nouns and verbs are continually changing with new words being created. Parts-of-speech tagging is the process of assigning a part-of-speech tag to each word using statistical models trained on tagged corpora. Hidden Markov Models are commonly used, where the goal is to find the most probable tag sequence given an input word sequence.
Introduction to the theory of computation - prasadmvreddy
This document provides an introduction and overview of topics in the theory of computation, including automata, computability, and complexity. It discusses the following key points:
Automata theory, computability theory, and complexity theory examine the fundamental capabilities and limitations of computers. Different models of computation are introduced including finite automata, context-free grammars, and Turing machines. The document then provides definitions and examples of regular languages and context-free grammars, the basics of finite automata and regular expressions, properties of regular languages, and limitations of finite state machines.
These slides cover an introduction to machine translation, some techniques used in MT such as example-based MT and statistical MT, the main challenges facing machine translation, and some examples of applications of MT.
The document discusses N-gram language models, which assign probabilities to sequences of words. An N-gram is a sequence of N words, such as a bigram (two words) or trigram (three words). The N-gram model approximates the probability of a word given its history as the probability given the previous N-1 words. This is called the Markov assumption. Maximum likelihood estimation is used to estimate N-gram probabilities from word counts in a corpus.
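A minimal sketch of that maximum likelihood estimation, using a tiny invented corpus: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}).

```python
from collections import Counter

corpus = "<s> I am Sam </s> <s> Sam I am </s> <s> I do not like green eggs </s>".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(w, prev):
    # MLE: count(prev, w) / count(prev)
    return bigrams[(prev, w)] / unigrams[prev]

print(p("I", "<s>"))  # 2/3: "I" opens 2 of the 3 sentences
```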
The document discusses English morphology, including singular and plural forms, morphological parsing and stemming, and the different types of morphology in English such as inflectional, derivational, compounding, and cliticization morphology. It provides examples to illustrate singular and plural forms, morphological rules for changing word endings, and how prefixes, suffixes, and other morphemes can be added to word stems to change word classes or derive new words.
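A crude regex can implement one such morphological rule; the sketch below (our own, deliberately naive) strips a regular plural suffix and ignores irregular forms:

```python
import re

def naive_singular(word):
    # ...ies -> ...y (ponies -> pony), then strip a trailing -es or -s.
    word = re.sub(r"ies$", "y", word)
    word = re.sub(r"(es|s)$", "", word)
    return word

print([naive_singular(w) for w in ["cats", "ponies", "boxes"]])
# ['cat', 'pony', 'box'] -- fails on irregulars like 'mice'
```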
Recursive transition networks (RTNs) are used to define languages by representing them as graphs with nodes and labeled edges. Strings in the language are produced by paths from a start node to a final node, where the labels of the edges along the path combine to form the string. RTNs allow for defining infinite languages more efficiently than listing all strings, as adding edges increases the number of possible paths and strings. The power of RTNs increases dramatically when cycles are added to the graph, allowing for infinitely many possible paths and strings.
Introduction to natural language processing - Minh Pham
This document provides an introduction to natural language processing (NLP). It discusses what NLP is, why NLP is a difficult problem, the history of NLP, fundamental NLP tasks like word segmentation, part-of-speech tagging, syntactic analysis and semantic analysis, and applications of NLP like information retrieval, question answering, text summarization and machine translation. The document aims to give readers an overview of the key concepts and challenges in the field of natural language processing.
Natural language processing involves parsing text using a lexicon, categorization of parts of speech, and grammar rules. The parsing process involves determining the syntactic tree and label bracketing that represents the grammatical structure of sentences. Evaluation measures for parsing include precision, recall, and F1-score. Ambiguities from multiple word senses, anaphora, indexicality, metonymy, and metaphor make parsing challenging.
This is a presentation on syntactic analysis in NLP. It includes topics like introduction to parsing, basic parsing strategies, top-down parsing, bottom-up parsing, dynamic programming (the CYK parser), issues in basic parsing methods, the Earley algorithm, and parsing using probabilistic context-free grammars.
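As a hedged illustration of the dynamic-programming strategy named above, here is a minimal CYK recognizer for a toy grammar in Chomsky normal form (grammar and sentence are ours, not from the presentation):

```python
# Toy CNF grammar: S -> NP VP, VP -> V NP, NP -> 'she' | 'fish', V -> 'eats'
binary = {("NP", "VP"): "S", ("V", "NP"): "VP"}
lexical = {"she": {"NP"}, "fish": {"NP"}, "eats": {"V"}}

def cyk(words):
    n = len(words)
    # table[i][j] = set of nonterminals deriving words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):               # split point
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        if (B, C) in binary:
                            table[i][j].add(binary[(B, C)])
    return "S" in table[0][n - 1]

print(cyk("she eats fish".split()))  # True
```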
The document outlines the 5 phases of natural language processing (NLP):
1. Morphological analysis breaks text into paragraphs, sentences, words and assigns parts of speech.
2. Syntactic analysis checks grammar and parses sentences.
3. Semantic analysis focuses on literal word and phrase meanings.
4. Discourse integration considers the effect of previous sentences on current ones.
5. Pragmatic analysis discovers intended effects by applying cooperative dialogue rules.
Introduction to Named Entity Recognition - Tomer Lieber
Named Entity Recognition (NER) is a common task in Natural Language Processing that aims to find and classify named entities in text, such as person names, organizations, and locations, into predefined categories. NER can be used for applications like machine translation, information retrieval, and question answering. Traditional approaches to NER involve feature extraction and training statistical or machine learning models on features, while current state-of-the-art methods use deep learning models like LSTMs combined with word embeddings. NER performance is typically evaluated using the F1 score, which balances precision and recall of named entity detection.
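For concreteness, F1 is the harmonic mean of precision and recall; the entity counts below are invented:

```python
# Suppose a NER system predicted 8 entities, 6 of which are correct,
# and the gold standard contains 10 entities.
tp, predicted, gold = 6, 8, 10

precision = tp / predicted                         # 0.75
recall = tp / gold                                 # 0.60
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))                                # 0.667
```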
The document discusses dependency parsing in natural language processing. It begins by defining dependency as a syntactic or semantic relation between tokens. It then contrasts constituent structure, which groups tokens into phrases bottom-up, with dependency structure, which builds a graph connecting tokens with edges. The document goes on to describe the components of a dependency graph, including vertices, arcs, and relations. It also discusses projectivity, head rules to convert constituent trees to dependency trees, and different approaches to dependency parsing like transition-based and graph-based parsing.
Natural Language Processing (NLP) - Introduction - Aritra Mukherjee
This presentation provides a beginner-friendly introduction to Natural Language Processing in a way that arouses interest in the field. I have made an effort to include as many easy-to-understand examples as possible.
Statistical analysis to identify the main parameters affecting WWQI of sew... - eSAT Journals
Abstract The present study was conducted to determine the wastewater quality index and to study statistical interrelationships among different parameters. Equations were developed to predict BOD and WWQI. A number of physicochemical water quality parameters were estimated quantitatively in wastewater samples following methods and procedures as per governing authority guidelines. The Wastewater Quality Index (WWQI) is regarded as one of the most effective ways to communicate wastewater quality collectively across the individual quality parameters. The WWQI of the wastewater samples was calculated with a fuzzy MCDM methodology. The wastewater quality index for treated wastewater was evaluated considering eight parameters prescribed by the Gujarat Pollution Control Board (GPCB), the governing authority for environmental monitoring in Gujarat State, India. Considerable uncertainties are involved in the process of defining treated wastewater quality for specific usage, like irrigation, reuse, etc.
The paper presents modeling of cognitive uncertainty in the field data, with recourse to fuzzy logic when dealing with such systems. A statistical study is also done to identify the main variables affecting the WWQI. Statistical regression analysis has been found to be a highly useful tool for correlating different parameters. Correlation analysis of the data suggests that TDS, SS, BOD, COD, O&G and Cl are significantly correlated with the WWQI and DO of wastewater. The estimated BOD from the independent variable DO for maximum, minimum and average is 25.35 mg/L, 2.65 mg/L and 13.56 mg/L respectively, while the estimated WWQI from the independent variable DO for maximum, minimum and average is 0.6212, 0.3074 and 0.4581 respectively. Out of eight parameters, TDS-BOD, TDS-COD, TDS-Cl, SS-BOD, SS-COD, and BOD-COD are significantly correlated. The present study shows that WWQI is influenced by BOD, COD, SS and TDS.
A combined approach using Triple DES and Blowfish (research area) - eSAT Journals
Abstract Payment card fraud is causing billions of dollars in losses for the card payment industry. Besides direct losses, the brand name can be affected by loss of consumer confidence due to the fraud. As a result of these growing losses, financial institutions and card issuers are continually seeking new techniques and innovation in payment card fraud detection and prevention. Credit card fraud falls broadly into two categories: behavioral fraud and application fraud. Credit card transactions continue to grow in number, taking an ever-larger share of the US payment system and leading to a higher rate of stolen account numbers and subsequent losses by banks. Improved fraud detection has thus become essential to maintain the viability of the US payment system. Increasingly, the card-not-present scenario, such as shopping on the internet, poses a greater threat, as the merchant (the web site) is no longer protected by the advantages of physical verification such as a signature check, photo identification, etc. In fact, it is almost impossible to perform any of the ‘physical world’ checks necessary to detect who is at the other end of the transaction. This makes the internet extremely attractive to fraud perpetrators. According to a recent survey, the rate at which internet fraud occurs is 20 to 25 times higher than ‘physical world’ fraud. However, recent technical developments are showing some promise to check fraud in the card-not-present scenario. This paper provides an overview of payment card fraud, beginning with payment card statistics and the definition of payment card fraud. It also describes various methods used by identity thieves to obtain personal and financial information for the purpose of payment card fraud, and discusses the relationship between payment card fraud and its detection. Finally, some solutions for detecting payment card fraud are given. Index Terms: Online Frauds, Fraudsters, card fraud, CNP, CVV, AVS
A clonal-based algorithm for the reconstruction of genetic network using s sy... - eSAT Journals
Abstract Motivation: A gene regulatory network is a network-based approach to representing the interactions between genes. DNA microarray is the most widely used technology for extracting the relationships between thousands of genes simultaneously. A gene microarray experiment provides gene expression data for a particular condition over varying time periods. The expression of a particular gene depends upon the biological conditions and other genes. In this paper, we propose a new method for the analysis of microarray data. The proposed method makes use of the S-system, a well-accepted model for gene regulatory network reconstruction. Since the problem has multiple solutions, we have to identify an optimized solution; evolutionary algorithms have been used to solve such problems. Though a number of attempts have already been carried out by various researchers, the solutions are still not satisfactory with respect to the time taken and the degree of accuracy achieved. Therefore, a considerable amount of further work on this topic is needed to achieve solutions with improved performance. Results: In this work, we propose a clonal selection algorithm for identifying an optimal gene regulatory network. The approach is tested on real-life data: the SOS E. coli DNA repair gene expression data. It is observed that the proposed algorithm converges much faster and provides better results than existing algorithms. Index Terms: Microarray analysis, Evolutionary Algorithm, Artificial Immune System, S-system, Gene Regulatory Network, SOS E. coli DNA repair, Clonal Selection Algorithm.
Reliability assessment for a four cylinder diesel engine by employing algebra... - eSAT Journals
Abstract
In this paper, the authors have evaluated the reliability of a four-cylinder diesel engine by employing the algebra of logics, which is easier in comparison with older techniques. Here, a multi-component fuel system in a diesel engine, comprising four subsystems in series, has been considered. The authors, in this model, have considered a parallel redundant fuel injective device to improve the system's performance. The whole system can fail due to the failure of at least one component on all the routes of flow. The Boolean function technique has been used to formulate and solve the mathematical model. The reliability and M.T.T.F. of the considered diesel engine have been obtained to connect the model with physical situations. A numerical example and its graphical representation have been appended at the end to highlight important results.
Keywords: Boolean function, Algebra of logics, Parallel redundancy, Four cylinder diesel engine, Weibull time distribution, Exponential time distribution, Reliability, MTTF
Assessment of composting, energy and gas generation potential for MSW at Alla... - eSAT Journals
The document analyzes the potential for composting, energy generation, and gas generation from municipal solid waste (MSW) in Allahabad City, India. Key findings include:
- The C/N ratio of MSW was found to be less than 30:1, indicating the waste is not suitable for composting.
- The energy content of MSW was estimated to be between 2495-2972 kcal/kg, below the minimum recommended value for incineration.
- Modeling showed a bioreactor landfill with leachate recirculation would generate more methane gas than a controlled sanitary landfill, making it the best disposal method for Allahabad's MSW.
Numerical parametric study on interval shift variation in SIMO SSTD technique... - eSAT Journals
This document presents a parametric study on the time shift interval variation in the SIMO-SSTD technique for experimental modal analysis. The SSTD (Single Station Time Domain) technique extracts modal parameters from free decay responses without using Fourier transforms. The study investigates the accuracy of natural frequency and damping ratio results from the SSTD algorithm when using different time shift intervals between data matrices. Simulated data with known modal properties is used to calculate percentage errors for different shift intervals and noise levels. The goal is to determine the effect of time shift interval on the accuracy of the SSTD technique.
Design and manufacturing of drive center mandrel - eSAT Journals
Abstract In the manufacturing of sleeve yoke 1650, many problems were faced during assembly and disassembly of the sleeve yoke and mandrel. The assembly, being too heavy, requires two operators. There is also a potential danger of the assembly falling down, leading to damage to life and property. Moreover, the idle time per unit production is more than expected. Therefore, the aim of our project is to design and manufacture a new mandrel which will solve all these problems and increase productivity. The splined live centre technique is used to design the new drive centre along with the mandrel block. Keywords—Sleeve yoke and mandrel assembly, heavy, more idle time, splined live centre technique, mandrel design, mandrel manufacturing.
Assessment of electromagnetic radiations from communication transmission towe... - eSAT Journals
Abstract The effects of exposure to electromagnetic radiation from wireless cellular transmission towers on human health have attracted the attention of many researchers. Different works have revealed the harmful effects of electromagnetic radiation exposure on human health, depending on distance from the source and period of exposure. As one stays closer to the transmission sites for a prolonged period, the possibility of being affected by the radiation source becomes higher. In this work, we review some of the works on assessment of electromagnetic radiation exposure and propose measures for determining safety zones, based on the cases of cellular transmission towers in the Tanzanian environment, to avoid extended exposure to electromagnetic radiation. Keywords: Cellular transmission towers; Electromagnetic radiations; Health effects; Exposure limits
Performance evaluation of TCP Sack1 in WiMAX network asymmetry - eSAT Journals
Abstract The WiMAX technology supports different channel bandwidths, cyclic prefixes, modulation coding schemes, frame durations, simultaneous two-way data transfer and propagation models. WiMAX network asymmetry largely depends on the DL:UL ratio. This paper evaluates the performance of TCP Sack1 by considering channel bandwidth, cyclic prefix, modulation coding scheme, frame duration, two-way transfer and propagation model in a WiMAX network with network asymmetry. The performance of TCP Sack1 is evaluated by varying MAC layer parameters such as channel bandwidth, cyclic prefix, modulation coding scheme, frame duration and DL:UL ratio, physical layer parameters such as propagation model and full duplex mode of data transfer, and other operating parameters such as downloading traffic; these parameters strongly affect the performance of TCP Sack1 in a WiMAX network. The performance of the WiMAX network is measured in terms of throughput, goodput and number of packets dropped. Keywords: Worldwide interoperability for microwave access (WiMAX), Subscriber Stations (SSs), Downlink (DL), Uplink (UL), Medium access control (MAC), Transmission Control Protocol (TCP), OFDM, IEEE 802.16, Throughput, Goodput and Packets drop
An experimental study on mud concrete using soil as a fine aggregate and LD sl... - eSAT Journals
Abstract Aggregates are important ingredients of concrete. Sand is the most abundantly used natural resource after air and water, and the extensive use of these natural resources exploits the environment every day. Many alternative materials are being used as fine aggregates, viz. slag sand, manufactured sand, quarry dust, etc.; materials such as steel slag and blast furnace slag are being used as replacements for coarse aggregates. This paper reports the results of different mixes obtained by partial replacement of natural coarse aggregates (NCA) and complete replacement of fine aggregates (FA) with alternative materials, namely LD slag and natural soil respectively. The wet compressive strength ranged from 16 MPa to 20 MPa for cubes made of natural sand and natural coarse aggregates (MIX-D), and from 18-26 MPa for MIX-A; the value obtained for MIX-A was found to be 20% more compared to MIX-D. The split tensile strength ranged from 1.16-1.51 MPa for MIX-A; it was concluded that the mud concrete mix prepared with soil and LD slag gave the satisfactory result intended to be achieved by the normal conventional concrete mix MIX-D. The flexural strength ranged from 3.04-3.41 MPa for MIX-A and 2.84-3.45 MPa for M4, leading to the same conclusion. The mud concrete with soil and LD slag cut down the cost of the mix by up to 43% when compared with normal conventional concrete of an equivalent grade. Keywords: MUD Concrete, LD Slag, NCA, Alternative Materials, Wet Compressive Strength.
Survey on securing outsourced storages in cloud - eSAT Journals
Abstract Cloud computing is one of the buzzwords of technological development in the IT industry and service sectors. Widening the social capabilities of servicing a user on the internet while narrowing the need to store information and provide facilities locally, computing interests are shifting towards cloud services. Although cloud services contribute major advantages for servicing, they also incur major security issues. The issues, and the approaches that can be taken to minimise or even eliminate their effects, are discussed in this paper to progress toward more secure storage services on the cloud. Keywords: Cloud computing, Cloud Security, Outsourced Storages, Storage as a Service
Security threats and detection technique in cognitive radio network with sens... - eSAT Journals
This document discusses security threats and detection techniques in cognitive radio networks. It begins by providing background on cognitive radio networks and how secondary users can utilize unused spectrum bands of primary users. However, this can introduce security issues if malicious users emulate primary users. The document then discusses two main threats: primary user emulation attacks, where fake secondary users pretend to be primary users and disrupt communications, and jamming attacks. It proposes using energy spectrum sensing techniques for secondary users to detect unused spectrum bands while avoiding interfering with primary users. The document concludes that cognitive radio networks introduce practical wireless communication challenges, and future work is needed to improve security against identified threats.
Software as a service for efficient cloud computing - eSAT Journals
Abstract This research paper explores the importance of Software as a Service (SaaS) for efficient cloud computing in organizations and its implications. Enterprises nowadays are betting big on SaaS and integrating this service delivery model of the cloud computing architecture into their IT services. SaaS applications are a service-centric cloud computing delivery model used as IT infrastructure, with a multi-tenant architecture that provides a rich user experience with the set of features requested by the cloud user. This research paper also discusses SaaS application architecture, functionality, efficiency, advantages and disadvantages. Keywords: Cloud Computing, Service Delivery Models, Software as a Service, SaaS Architecture.
Comparative analysis of dynamic programming algorithms to find similarity in ... - eSAT Journals
Abstract There exist many computational methods for finding similarity in gene sequences; finding suitable methods that give optimal similarity is a difficult task. The objective of this project is to find an appropriate method to compute similarity in gene/protein sequences, both within families and across families. Many dynamic programming algorithms, like Levenshtein edit distance, Longest Common Subsequence and Smith-Waterman, have used the dynamic programming approach to find similarities between two sequences. But none of the methods mentioned above have used real benchmark data sets; they have only applied dynamic programming algorithms to synthetic data. We propose a new method to compute similarity. The performance of the proposed algorithm is evaluated using a number of data sets from various families, and the similarity value is calculated both within a family and across families. A comparative analysis and the time complexity of the proposed method reveal that the Smith-Waterman approach is the appropriate method when gene/protein sequences belong to the same family, and Longest Common Subsequence is best suited when the sequences belong to two different families. Keywords - Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity.
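Of the algorithms named in this abstract, the longest common subsequence is the simplest to sketch; the following is a standard textbook implementation, not the authors' code:

```python
def lcs_length(a, b):
    # L[i][j] = length of the LCS of a[:i] and b[:j]
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1     # extend a common subsequence
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]

print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 ("GTAB")
```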
Design and analysis of bodyworks of a formula style racecar - eSAT Journals
Abstract: The aim was to develop the body of a racecar with proper studies and analyses, taking into account several factors, to present an optimum structure as the final result. These factors include, but are not limited to, weight, cost, drag resistance, functionality and aesthetics. The expected product is to not just be appealing to the eye but also to increase the performance of the vehicle. Additional objectives include being able to accommodate the budget while maintaining a highly competitive level to perform well on the race track. The new design will reduce the weight of the prototype as well as the air drag, taking into consideration the ground effects desired to be implemented in the vehicle as a crucial factor. Moreover, the new body will be easier to dismantle, reducing service time. Keywords: drag resistance, aesthetics, performance, weight, cost.
History of gasoline direct compression ignition (GDCI) engine: a review - eSAT Journals
Abstract The first single-cylinder gasoline direct compression ignition (GDCI) engine was designed and built in 2010 by the Delphi Company for testing performance, emissions and brake specific fuel consumption (BSFC). After achieving good results in performance, emissions and BSFC from the single-cylinder engine, a multi-cylinder GDCI engine was built in 2013. The compression ignition engine has limitations such as high noise, weight, PM and NOX emissions compared to the gasoline engine, but its high efficiency, torque and better fuel economy are the reasons the Delphi Company chose a compression ignition strategy for building a new combustion system. The present review covers the reasons for building the GDCI engine in detail. Keywords: Delphi Company, Emissions, Multi-cylinder GDCI engine and Single-cylinder GDCI engine.
The automatic license plate recognition (ALPR) - eSAT Journals
Abstract Every country uses its own way of designing and allocating number plates to its vehicles. The license number plate is then used by various government offices for their regular administrative tasks, like traffic police tracking people who violate traffic rules, identifying stolen cars, toll collection, and parking allocation management. In India all motorized vehicles are assigned unique numbers, allocated by the district-level Regional Transport Office (RTO), and the license plates must be kept on both the front and back of the vehicle. These plates are in general easily readable by humans due to their high level of intelligence; on the contrary, it is an extremely difficult task for computers to do the same. Many attributes like illumination, blur, background color and foreground color pose problems. Index Terms: Automatic license plate recognition (ALPR) system, proposed methodology, reference
Earthquake response of reinforced concrete multi storey building with base iso... - eSAT Journals
This document summarizes a study on the earthquake response of base isolated reinforced concrete buildings. It describes the basic concept of base isolation, which aims to protect structures from seismic forces by isolating the building from ground movement using devices called isolators. It then discusses different types of isolators and presents the results of analyzing a 3D 8-story building model using different isolators, finding that base isolation substantially reduces base shear, displacements, and member forces compared to a fixed-base building.
Discrete wavelet transform based analysis of transformer differential currenteSAT Journals
This document analyzes transformer differential currents using discrete wavelet transform (DWT) during various operating conditions. It simulates a 2 kVA transformer system in MATLAB and analyzes the DWT of differential currents during magnetizing inrush, internal faults, switching, and over-fluxing. Statistical features are extracted from the DWT coefficients, which could be used as inputs for a classifier to distinguish between fault and non-fault conditions for improved differential protection. The analysis found that no single statistical feature was sufficient and that using multiple features from different decomposition levels may help classification.
This document discusses various topics related to programming languages including:
- It defines different types of programming languages like procedural, functional, scripting, logic, and object-oriented.
- It explains why studying programming languages is important for expressing ideas, choosing appropriate languages, learning new languages, and better understanding implementation.
- It describes programming paradigms like imperative, object-oriented, functional, and logic programming.
- It covers concepts like syntax, semantics, pragmatics, grammars, and translation models in programming languages.
Statistically-Enhanced New Word Identification - Andi Wu
This document discusses a method for identifying new words in Chinese text using a combination of rule-based and statistical approaches. Candidate character strings are selected as potential new words based on their independent word probability being below a threshold. Parts of speech are then assigned to candidate strings by examining the part of speech patterns of their component characters and comparing them to existing words in a dictionary to determine the most likely part of speech based on word formation patterns in Chinese. This hybrid approach avoids the overgeneration of rule-based systems and data sparsity issues of purely statistical approaches.
Author Credits - Maaz Anwar Nomani
Semantic Role Labeler (SRL) is a semantic parser which can automatically identify and then classify the arguments of a verb in a natural language sentence, for Hindi and Urdu. For example, in the natural language sentence “Sara won the competition because of her hard work.”, ‘won’ is the main verb and there are 3 arguments for this verb: ‘Sara’ (Agent), ‘hard work’ (Reason) and ‘competition’ (Theme). The problem statement of an SRL revolves around how to make a machine identify and then classify the arguments of a verb in a natural language sentence.
Since there are 2 sub problem statements here (Identification and Classification), our SRL has a pipeline architecture in which a binary classifier (Logistic Regression) is first trained to identify whether a word is an argument to a verb in a sentence or not (Yes or No) and subsequently a multi-class classifier (SVM with Linear kernel) is trained to classify the identified arguments by above binary classifier into one of the 20 classes. These 20 classes are the various notions present in a natural language sentence (for e.g. Agent, Theme, Location, Time, Purpose, Reason, Cause etc.). These ‘notions’ are called Propbank labels or semantic labels present in a Proposition Bank which is a collection of hand-annotated sentences.
In essence, SRL facilitates semantic parsing, which essentially is the research investigation of identifying WHO did WHAT to WHOM, WHERE, HOW, WHY and WHEN in a natural language sentence.
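A schematic of such a two-stage pipeline in scikit-learn; the feature vectors and labels below are stand-ins (real SRL features would encode the parse path, voice, position relative to the verb, and so on):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Stage 1: binary argument identification (is this word an argument of the verb?)
X_id, y_id = [[0.2, 1.0], [0.9, 0.1], [0.4, 0.8]], [1, 0, 1]  # toy features/labels
identifier = LogisticRegression().fit(X_id, y_id)

# Stage 2: multi-class argument classification (Agent, Theme, Reason, ...;
# the real system distinguishes 20 Propbank labels).
X_cls, y_cls = [[0.2, 1.0], [0.4, 0.8]], ["Agent", "Theme"]
classifier = LinearSVC().fit(X_cls, y_cls)

def label(word_features):
    if identifier.predict([word_features])[0] == 1:     # identified as argument
        return classifier.predict([word_features])[0]   # assign a semantic role
    return None

print(label([0.3, 0.9]))
```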
Phonetic Recognition In Words For Persian Text To Speech Systems - paperpublications3
Abstract: Interest in text-to-speech synthesis has increased worldwide. Text-to-speech systems have been developed for many popular languages such as English, Spanish and French, and much research and development has been applied to those languages. Persian, on the other hand, has been given little attention compared to other languages of similar importance, and research in Persian is still in its infancy. The Persian language possesses many difficulties and exceptions that increase the complexity of text-to-speech systems; for example, short vowels are absent in written text, and homograph words exist. In this paper we propose a new method for Persian text-to-phonetic conversion based on pronunciation by analogy in words, semantic relations and grammatical rules for finding the proper phonetics. Keywords: PbA, text to speech, Persian language, phonetic recognition.
Title: Phonetic Recognition In Words For Persian Text To Speech Systems
Author: Ahmad Musavi Nasab, Ali Joharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
Implementation Of Syntax Parser For English Language Using Grammar Rules - IJERA Editor
For many years we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. Syntactic parsing mainly works with the syntactic structure of a sentence. 'Syntax' refers to the grammatical and syntactical arrangement of words in a sentence and their relationship with other words. The main focus of syntactic analysis is to find the syntactic structure of a sentence, which is usually represented as a tree; identifying the syntactic structure is useful in determining the meaning of a sentence. Natural language processing processes data through lexical analysis, syntax analysis, semantic analysis, discourse processing and pragmatic analysis. This paper gives various parsing methods. The algorithm in this paper splits English sentences into parts using a POS (parts of speech) tagger, identifies the type of sentence (simple, complex, interrogative, fact, active, passive, etc.) and then parses these sentences using the grammar rules of natural language. As natural language processing becomes increasingly relevant, there is a need for treebanks catered to the specific needs of more individualized systems. Here, we present an open-source technique to check and correct grammar. The methodology gives appropriate grammatical suggestions.
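As a hedged sketch of CFG-driven parsing, here is a toy grammar and parse using NLTK (the grammar is ours; the paper's actual rule set is not reproduced):

```python
import nltk

# A tiny CFG; real grammars for English are far larger.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> DT NN | NNP
    VP -> VBZ NP
    DT -> 'the'
    NN -> 'ball'
    NNP -> 'John'
    VBZ -> 'kicks'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John kicks the ball".split()):
    print(tree)  # (S (NP (NNP John)) (VP (VBZ kicks) (NP (DT the) (NN ball))))
```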
Part of speech tagging is one of the basic steps in natural language processing. Although it has been
investigated for many languages around the world, very little has been done for Setswana language.
Setswana language is written disjunctively and some words play multiple functions in a sentence. These
features make part of speech tagging more challenging. This paper presents a finite state method for
identifying one of the compound parts of speech, the relative. Results show an 82% identification rate, which is lower than for other languages. The results also show that the model can identify the start of a relative 97% of the time but fails to identify where it stops 13% of the time. The model fails due to the limitations of the morphological analyser and due to more complex sentences not accounted for in the model.
The document describes a machine learning approach for language identification, named entity recognition, and transliteration on query words. It discusses:
1) Using supervised machine learning classifiers like random forest, decision trees, and SVMs along with contextual, character n-gram, and gazetteer features for language identification of Hindi-English and Bangla-English words.
2) Applying an IOB tagging scheme and features like character n-grams, context words, and typographic properties for named entity recognition and classification.
3) A statistical machine transliteration model that segments, aligns, and maps source and target language transliteration units based on context and probabilities learned from parallel training data.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISONIJCSEA Journal
The massive growth of modern information retrieval systems (IRS), especially for natural languages, makes retrieval more difficult, and search in Arabic, as a natural language, is not yet good enough. This paper builds a similarity thesaurus based on the Arabic language using two mechanisms, one on full words and the other on stemmed words, and then compares them. The comparison made by this study shows that the similarity thesaurus using the stemmed mechanism obtains better results than the full-word mechanism, and that the similarity thesaurus improves recall and precision over a traditional information retrieval system.
Paraphrasing refers to writing that either differs in its textual content or is dissimilar in rearrangement of words
but conveys the same meaning. Identifying a paraphrase is exceptionally important in various real life applications
such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount
of work in Paraphrase Detection has been done in English and many Indian Languages. However, there is no
existing system to identify paraphrases in Marathi. This is the first such endeavor in the Marathi Language.
A paraphrase has differently structured sentences, and since Marathi is a semantically strong language, this
system is designed for checking both statistical and semantic similarities of Marathi sentences. Statistical similarity
measure does not need any prior knowledge as it is only based on the factual data of sentences. The factual data
is calculated on the basis of the degree of closeness between the word-set, word-order, word-vector and word-distance. Universal Networking Language (UNL) captures the semantic content of a sentence without any syntactic details. Hence, the semantic similarity calculated on the basis of UNL graphs generated for two Marathi sentences renders their semantic equality. The total paraphrase score is calculated by combining the statistical and semantic similarity scores, which yields a judgment on whether the Marathi sentences in question are paraphrases or not.
Abstract
Part-of-speech tagging plays an important role in developing natural language processing software. Part-of-speech tagging means assigning a part-of-speech tag to each word of a sentence: the tagger takes a sentence as input and assigns the appropriate part-of-speech tag to each word of that sentence. This article surveys the different works that have been done on Odia POS tagging.
________________________________________________
This document summarizes research that has been done on computational morphology for the Odia language. It begins with an abstract that outlines how morphological analysis, generation, and parsing are important tools for natural language processing. The document then reviews different works that have developed morphological analyzers and generators for Odia. It describes various methods that have been used, including suffix stripping, finite state transducers, two-level morphology, corpus-based approaches, and paradigm-based approaches. Finally, it outlines several applications of morphology like machine translation, spelling checking, and part-of-speech tagging.
Dependency Analysis of Abstract Universal Structures in Korean and EnglishJinho Choi
This thesis makes two contributions in the form of lexical resources to (1) dependency parsing in Korean and (2) semantic parsing in English. First, we describe our methodology for building three dependency treebanks in Korean derived from existing treebanks and pseudo-annotated according to the latest guidelines from the Universal Dependencies (UD). The original Google Korean UD Treebank is re-tokenized to ensure morpheme-level annotation consistency with other corpora while maintaining the linguistic validity of the revised tokens. Phrase structure trees in the Penn Korean Treebank and the Kaist Treebank are automatically converted into UD dependency trees by applying head-percolation rules and linguistically motivated heuristics. A total of 38K+ dependency trees are generated. To the best of our knowledge, this is the first time that the three Korean treebanks have been converted into UD dependency treebanks following the latest annotation guidelines. Second, we introduce an on-going project for augmenting the OntoNotes phrase structure treebank with semantic features found in the Abstract Meaning Representation (AMR), as part of an effort to build an accurate AMR parser. We propose a novel technique for AMR parsing that first trains a dependency parser on the OntoNotes corpus augmented with numbered arguments in the Proposition Bank (PropBank), and then performs transfer learning of the trained dependency parser for the AMR parsing task. A preliminary step is to prepare dependency data by automatically replacing dependencies that define predicate-argument structure with their corresponding PropBank argument labels during constituent-to-dependency conversion. To the best of our knowledge, this is the first time that PropBank labels have been directly inserted into dependency structure, producing a new dependency corpus with rich syntactic information as well as the semantic role information provided by PropBank that fully describes the predicate-argument structure, making it an ideal resource for AMR parsing and, broadly, semantic parsing.
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)ijnlc
Extracting key phrases from documents is a common task in many applications. In general, a noun phrase extractor consists of three modules: tokenization, part-of-speech tagging and noun phrase identification. These are used as the three main steps in building the new system, ANPE. This paper aims at extracting Arabic noun phrases from a corpus of documents, with the relevant criteria (recall and precision) used as evaluation measures. On the one hand, when using NPs rather than single terms, the system yields more relevant documents among those retrieved; on the other hand, it gives lower precision because the number of retrieved documents decreases. At the end, the researchers conclude and recommend improvements for more effective and efficient research in the future.
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
The paper was accepted at ICSE'17 and TSE'19. https://se-thesaurus.appspot.com/ https://pypi.org/project/DomainThesaurus/ Informal discussions on social platforms (e.g., Stack Overflow) accumulate a large body of programming knowledge in natural language text. Natural language processing (NLP) techniques can be exploited to harvest this knowledge base for software engineering tasks. To make effective use of NLP techniques, a consistent vocabulary is essential. Unfortunately, the same concepts are often intentionally or accidentally mentioned in many different morphological forms in informal discussions, such as abbreviations, synonyms and misspellings. Existing techniques to deal with such morphological forms are either designed for general English or rely predominantly on domain-specific lexical rules. A thesaurus of software-specific terms and commonly used morphological forms is desirable for normalizing software engineering text, but very difficult to build manually. In this work, we propose an automatic approach to build such a thesaurus. Our approach identifies software-specific terms by contrasting software-specific and general corpora, and infers morphological forms of software-specific terms by combining distributed word semantics, domain-specific lexical rules and transformations, and graph analysis of morphological relations. We evaluate the coverage and accuracy of the resulting thesaurus against community-curated lists of software-specific terms, abbreviations and synonyms. We also manually examine the correctness of the identified abbreviations and synonyms in our thesaurus. We demonstrate the usefulness of our thesaurus in a case study of normalizing questions from Stack Overflow and CodeProject.
Regular expressions, most often known as "RegEx", are one of the most popular and widely accepted technologies for parsing specific data contents out of large texts.
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function
tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation,
POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply
context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with the parsing of simple sentences and three types of complex sentences.
Usage of regular expressions in nlp
IJRET: International Journal of Research in Engineering and Technology | eISSN: 2319-1163 | pISSN: 2321-7308 | Volume: 03 Issue: 01 | Jan-2014 | Available @ http://www.ijret.org
USAGE OF REGULAR EXPRESSIONS IN NLP
Gaganpreet Kaur
Computer Science Department, Guru Nanak Dev University, Amritsar, gagansehdev204@gmail.com
Abstract
A usage of regular expressions to search text is well known and understood as a useful technique. Regular Expressions are generic
representations for a string or a collection of strings. Regular expressions (regexps) are one of the most useful tools in computer
science. NLP, as an area of computer science, has greatly benefitted from regexps: they are used in phonology, morphology, text
analysis, information extraction, & speech recognition. This paper helps a reader to give a general review on usage of regular
expressions illustrated with examples from natural language processing. In addition, there is a discussion on different approaches of
regular expression in NLP.
Keywords— Regular Expression, Natural Language Processing, Tokenization, Longest common subsequence alignment,
POS tagging
------------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
Natural language processing is a large and multidisciplinary field, not least because a natural language contains infinitely many sentences. NLP began in the 1950s as the intersection of artificial intelligence and linguistics. There is also much ambiguity in natural language: many words have several meanings, such as can, bear, fly and orange, and sentences have different meanings in different contexts. This makes the creation of programs that understand a natural language a challenging task [1] [5] [8].
The steps in NLP are [8]:
1. Morphology: Morphology concerns the way words are built up from smaller meaning-bearing units.
2. Syntax: Syntax concerns how words are put together
to form correct sentences and what structural role each
word has.
3. Semantics: Semantics concerns what words mean and
how these meanings combine in sentences to form
sentence meanings.
4. Pragmatics: Pragmatics concerns how sentences are
used in different situations and how use affects the
interpretation of the sentence.
5. Discourse: Discourse concerns how the immediately
preceding sentences affect the interpretation of the
next sentence.
Fig 1: Steps in Natural Language Processing [8]
Figure 1 illustrates the stages followed in natural language processing: the input surface text is converted into tokens by the parsing or tokenisation phase, and its syntax and semantics are then checked.
2. REGULAR EXPRESSION
Regular expressions (regexps) are one of the most useful tools in computer science. An RE is a formal language for specifying a string or a collection of strings, most commonly called a search expression. NLP, as an area of computer science, has greatly benefitted from regexps: they are used in phonology, morphology, text analysis, information extraction, and speech recognition. A regular expression is conventionally written inside a pair of matching delimiters, such as slashes. A regular expression, or RE, describes strings of characters (words or phrases or any arbitrary text): it is a pattern that matches certain strings and does not match others. In other words, a regular expression is a set of characters that specifies a pattern. Regular expressions are
case-sensitive. Regular expressions also power several pattern-matching functions in SQL:
REGEXP_LIKE: determine whether the pattern matches
REGEXP_SUBSTR: determine what string matches the pattern
REGEXP_INSTR: determine where the match occurred in the string
REGEXP_REPLACE: search for and replace a pattern
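For readers who work in Python rather than SQL, roughly the same four operations map onto Python's standard re module. The sketch below is an illustrative analogy, not an official equivalence; the sample text and the simplified email pattern are invented.

import re

text = "Contact: user@example.com for details"
pattern = r"[\w.+-]+@[\w.-]+\.\w+"   # a simplified email pattern

m = re.search(pattern, text)             # REGEXP_LIKE: does the pattern match?
print(m is not None)                     # True
print(m.group(0))                        # REGEXP_SUBSTR: the matching substring
print(m.start())                         # REGEXP_INSTR: where the match occurred
print(re.sub(pattern, "<email>", text))  # REGEXP_REPLACE: search and replace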
How can RE be used in NLP?
1) Validate data fields (e.g., dates, email address, URLs,
abbreviations)
2) Filter text (e.g., spam, disallowed web sites)
3) Identify particular strings in a text (e.g., token
boundaries)
4) Convert the output of one processing component into
the format required for a second component [4].
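As a quick illustration of these four uses in Python (the patterns are deliberately simplified sketches, not production-grade validators):

import re

# 1) Validate a data field (a simplified ISO-style date pattern)
print(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2014-01-15")))   # True

# 2) Filter text (drop strings that look like spam)
msgs = ["WIN $$$ NOW", "meeting at 3"]
print([s for s in msgs if not re.search(r"\$\$\$", s)])         # ['meeting at 3']

# 3) Identify particular strings, e.g., token boundaries
print(re.findall(r"\w+|[^\w\s]", "Leave him behind!"))          # ['Leave', 'him', 'behind', '!']

# 4) Convert one component's output format into another's
print(re.sub(r"(\w+)/(\w+)", r"\1_\2", "word/NN"))              # 'word_NN'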
3. RELATED WORK
(A. N. Arslan and Dan, 2006) proposed the constrained sequence alignment process, for which the regular expression constrained sequence alignment problem was introduced. An alignment satisfies the constraint if part of it matches a given regular expression in each dimension (i.e., in each sequence aligned). There is a method that rewards alignments that include a region matching the given regular expression, but this method does not always guarantee satisfaction of the constraint.
(M. Shahbaz, P. McMinn and M. Stevenson, 2012) presented a
novel approach of finding valid values by collating suitable
regular expressions dynamically that validate the format of the
string values, such as an email address. The regular expressions
are found using web searches that are driven by the identifiers
appearing in the program, for example a string parameter called
emailAddress. The identifier names are processed through
natural language processing techniques to tailor the web queries.
Once a regular expression has been found, a secondary web
search is performed for strings matching the regular expression.
(Xiaofei Wang and Yang Xu, 2013) discussed deep packet inspection, which has become a key component in network intrusion detection systems (NIDSes), where every packet in the incoming data stream needs to be compared with patterns in an attack database, byte by byte, using either string matching or regular expression matching. Regular expression matching, despite its flexibility and efficiency in attack identification, brings significantly high computation and storage complexities to NIDSes, making line-rate packet processing a challenging task. In this paper, the authors present stride finite automata (StriFA), a novel finite automata family, to accelerate both string matching and regular expression matching.
4. BASIC RE PATTERNS IN NLP [8]
Here is a discussion of regular expression syntax, i.e., patterns and their meanings in various contexts.

Table 1: Regular expression meanings [8]
Character   Meaning
.           any character, including whitespace or numeric
?           zero or one of the preceding character
*           zero or more of the preceding character
+           one or more of the preceding character
^           negation or complement

In Table 1, each of the characters has its own meaning.
Table 2: Regular expression string matching [8]
RE             String matched
/woodchucks/   "interesting links to woodchucks and lemurs"
/a/            "Sammy Ali stopped by Neha's"
/Ali says,/    "My gift please," Ali says,
/book/         "all our pretty books"
/!/            "Leave him behind!" said Sammy to Neha.
Table 3: Regular expression matching [8]
RE               Match
/[wW]oodchuck/   Woodchuck or woodchuck
/[abc]/          "a", "b", or "c"
/[0123456789]/   any digit
In Table 2 and Table 3, regular expression string matching is illustrated; for example, /woodchucks/ matches the string "interesting links to woodchucks and lemurs".
Table 4: Regular expression descriptions [8]
RE          Description
/a*/        zero or more a's
/a+/        one or more a's
/a?/        zero or one a
/cat|dog/   'cat' or 'dog'

In Table 4, various regular expressions used in natural language processing are described.
Table 5: Regular expression Kleene operators [8]
Pattern    Description
colou?r    optional previous character
oo*h!      zero or more of the previous character
o+h!       one or more of the previous character

In Table 5, the three Kleene operators are discussed along with their meanings.
Regular expressions: Kleene *, Kleene +, wildcard
The special character + (a.k.a. Kleene +) specifies one or more occurrences of the regular expression that comes right before it. The special character * (a.k.a. Kleene *) specifies zero or more occurrences of the regular expression that comes right before it. The special character . (wildcard) matches any single character.
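These operators can be sketched directly in Python's re syntax, reusing the Table 5 patterns (the test strings are invented for illustration):

import re

print(re.findall(r"colou?r", "color colour"))     # ['color', 'colour']: optional 'u'
print(re.findall(r"oo*h!", "oh! ooh! oooh!"))     # ['oh!', 'ooh!', 'oooh!']: zero or more o's
print(re.findall(r"o+h!", "oh! ooh! h!"))         # ['oh!', 'ooh!']: one or more o's
print(re.findall(r"beg.n", "begin began begun"))  # ['begin', 'began', 'begun']: wildcard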
Regular expressions: Anchors
Anchors are special characters that anchor a regular expression to a specific position in the text it is matched against. The anchors ^ and $ tie regular expressions to the beginning and end of the text, respectively [8].
Regular expressions: Disjunction
Disjunction of characters inside a regular expression is expressed with matching square brackets [ ]. All characters inside [ ] are part of the disjunction.
Regular expressions: Negation [^] in disjunction
The caret means negation only when it is the first character inside [ ].
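Anchors, disjunction and negation can likewise be sketched in Python (the sample strings are invented):

import re

print(bool(re.search(r"^The", "The dog barked")))          # ^ anchors at the beginning
print(bool(re.search(r"dog$", "There goes the dog")))      # $ anchors at the end
print(re.findall(r"[wW]oodchuck", "Woodchuck woodchuck"))  # disjunction: both cases match
print(re.findall(r"[^0-9]", "a1b2"))                       # negation: ['a', 'b']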
5. GENERATION OF RE FROM SHEEP LANGUAGE
In NLP, the concept of a regular expression (RE) is often illustrated with the Sheep Language [8]: a sheep talks to another sheep with utterances such as "baaaaa!". Here is a discussion of how a regular expression can be generated for the Sheep Language.
The Sheep Language consists of the strings
"baa!", "baaa!", "baaaa!", ...
Finite state automaton: the language is first captured as a finite state automaton, in which a double circle indicates the accept state.
Regular expression:
(baaa*!) or, equivalently, (baa+!)
In this way a regular expression can be generated from a language by first creating a finite state automaton, from which the RE is derived.
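A minimal sketch in Python confirms that the pattern accepts exactly the strings of the Sheep Language:

import re

sheep = re.compile(r"baa+!")   # equivalent to baaa*!
for s in ["baa!", "baaaaa!", "ba!", "moo"]:
    print(s, bool(sheep.fullmatch(s)))
# baa! True, baaaaa! True, ba! False, moo False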
6. DIFFERENT APPROACHES OF RE IN NLP [2] [3] [10]
6.1 An Improved Algorithm for the Regular Expression Constrained Multiple Sequence Alignment Problem [2]
In this first approach, two synthetic sequences S1 = TGFPSVGKTKDDA and S2 = TFSVAKDDDGKSA are aligned in a way that maximizes the number of matches (this is the longest common subsequence problem). An optimal alignment with 8 matches is shown in Figure 2(a).
Fig 2(a): An optimal alignment with 8 matches [2]
For the regular expression constrained sequence alignment problem in NLP with regular expression R = (G + A)ΣΣΣΣGK(S + T), where Σ denotes any symbol of the fixed alphabet over which the sequences are defined, the alignments sought change. The alignment in Figure 2(a) does not satisfy the regular expression constraint; Figure 2(b) shows an alignment with which the constraint is satisfied.
Fig 2(b): An alignment with the constraint [2]
The alignment includes a region (shown with a rectangle drawn in dashed lines in Figure 2(b)) where the substring GFPSVGKT of S1 is aligned with the substring AKDDDGKS of S2, and both substrings match R. In this case, the optimal number of matches achievable under the constraint decreases to 4 [2].
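For the unconstrained case, the 8-match figure can be reproduced with the standard dynamic program for the longest common subsequence. The sketch below handles only the plain LCS problem; the constrained variant of [2], which additionally requires an aligned region matching R, is not implemented here.

def lcs_length(s1, s2):
    # Classic O(len(s1) * len(s2)) dynamic program for the LCS length.
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("TGFPSVGKTKDDA", "TFSVAKDDDGKSA"))  # 8, as in Figure 2(a)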
6.2 Automated Discovery of Valid Test Strings from the Web using Dynamic Regular Expressions Collation and Natural Language Processing [3]
In this second approach, Natural Language Processing (NLP) techniques are combined with dynamic regular expression collation to find valid string values on the Internet. The rest of the section provides details of each part of the approach [3].
Fig 3: Overview of the approach [3]
6.2.1 Extracting Identifiers
The key idea behind the approach is to extract important
information from program identifiers and use them to generate
web queries. An identifier is a name that identifies either a
unique object or a unique class of objects. An identifier may be
a word, number, letter, symbol, or any combination of those.
For example, an identifier name including the string “email” is
a strong indicator that its value is expected to be an email
address. A web query containing “email” can be used to
retrieve example email addresses from the Internet [3].
6.2.2 Processing Identifier Names
Once the identifiers are extracted, their names are processed
using the following NLP techniques.
1) Tokenisation: Identifier names are often formed from
concatenations of words and need to be split into separate
words (or tokens) before they can be used in web queries.
Identifiers are split into tokens by replacing underscores with
whitespace and adding a whitespace before each sequence of
upper case letters.
For example, “an_Email_Address_Str” becomes “an email
address str” and “parseEmailAddressStr” becomes “parse email
address str”. Finally, all characters are converted to lowercase
[3].
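A minimal sketch of this tokenisation step, assuming the splitting rules are exactly the two stated above (underscores become whitespace, and a space is added before each run of upper-case letters):

import re

def tokenize_identifier(name):
    name = name.replace("_", " ")                    # underscores -> whitespace
    name = re.sub(r"(?<!\s)([A-Z]+)", r" \1", name)  # space before upper-case runs
    return name.lower().split()                      # lowercase and split

print(tokenize_identifier("an_Email_Address_Str"))   # ['an', 'email', 'address', 'str']
print(tokenize_identifier("parseEmailAddressStr"))   # ['parse', 'email', 'address', 'str']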
2) PoS Tagging: Identifier names often contain words such as articles ("a", "an", "the") and prepositions ("to", "at", etc.) that are not useful when included in web queries. In addition, method names often contain verbs as a prefix to describe the action they are intended to perform.
For example, "parseEmailAddressStr" is supposed to parse an email address. The key information for the input value is contained in the noun phrase "email address", rather than the verb "parse". The part-of-speech category of the tokens in identifier names can be identified using an NLP tool called a part-of-speech (PoS) tagger, thereby removing any non-noun tokens. Thus, "an email address str" and "parse email address str" both become "email address str" [3].
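A hedged sketch of this filtering step using NLTK's off-the-shelf tagger; the paper does not say which PoS tagger was used, so NLTK here is an assumption, and the package and its default English tagger model must be installed:

import nltk  # assumes: pip install nltk; nltk.download('averaged_perceptron_tagger')

def keep_nouns(tokens):
    tagged = nltk.pos_tag(tokens)  # e.g. [('parse', 'VB'), ('email', 'NN'), ...]
    return [word for word, tag in tagged if tag.startswith("NN")]

print(keep_nouns(["parse", "email", "address", "str"]))
# expected (tagger-dependent): ['email', 'address', 'str']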
3) Removal of Non-Words: Identifier names may include non-words, which can reduce the quality of search results. Therefore, names are filtered so that the web query consists entirely of meaningful words. This is done by removing any word in the processed identifier name that is not a dictionary word.
For example, "email address str" becomes "email address", since "str" is not a dictionary word [3].
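This step reduces to a dictionary-membership filter. In the sketch below, the word set is a stand-in assumption, since the paper does not name the dictionary it used:

DICTIONARY = {"an", "email", "address", "parse"}  # stand-in for a real word list

def remove_non_words(tokens):
    return [t for t in tokens if t in DICTIONARY]

print(remove_non_words(["email", "address", "str"]))  # ['email', 'address']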
6.2.3 Obtaining Regular Expressions [3]
The regular expressions for the identifiers are obtained dynamically in two ways:
1) RegExLib search, and
2) web search.
RegExLib: RegExLib [9] is an online regular expression
library that is currently indexing around 3300 expressions for
different types (e.g., email, URL, postcode) and scientific
notations. It provides an interface to search for a regular
expression.
Web Search: When RegExLib is unable to produce regular
expressions, the approach performs a simple web search. The
search query is formulated by prefixing the processed identifier
names with the string “regular expression”.
For example, the regular expressions for “email address” are
searched by applying the query “regular expression” email
address. The regular expressions are collected by identifying
any string that starts with ^ and ends with $ symbols.
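That harvesting rule can be sketched as collecting any whitespace-delimited token that starts with ^ and ends with $ (the sample page text, and the email pattern inside it, are invented):

import re

page_text = r"Try ^[\w.+-]+@[\w.-]+\.\w+$ to validate an email address."
candidates = re.findall(r"\^\S*\$", page_text)
print(candidates)  # ['^[\\w.+-]+@[\\w.-]+\\.\\w+$']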
6.2.4 Generating Web Queries
Once the regular expressions are obtained, valid values can be generated automatically, e.g., using an automaton. However, the objective here is to generate not only valid values but also realistic and meaningful values. Therefore, a secondary web search is performed to identify values on the Internet matching the regular expressions. This section explains the generation of web queries for the secondary search to identify valid values.
The web queries include different pluralization and quoting
styles, explained in the following.
6.2.4.1 Pluralization
The approach generates pluralized versions of the processed
identifier names by pluralizing the last word according to the
grammar rules.
For example, “email address” becomes “email addresses”.
6.2.4.2 Quoting
The approach generates queries with and without quotes. The
former style forces the search engine to target web pages that
contain all words in the query as a complete phrase. The latter
style is a general search to target web pages that contain the
words in the query. In total, 4 queries are generated for each
identifier name that represents all combinations of pluralisation
and quoting styles. For a processed identifier name “email
address”, the generated web queries are: email address, email
addresses; “email address”, “email addresses”.
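A sketch of the query generation follows; the pluralizer below handles only regular English endings, whereas the paper appeals to general grammar rules:

```python
def pluralize(word):
    """Naive pluralization covering regular English endings only."""
    if word.endswith(("s", "sh", "ch", "x", "z")):
        return word + "es"
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

def web_queries(name):
    """All four combinations of pluralization and quoting for one identifier name."""
    tokens = name.split()
    plural = " ".join(tokens[:-1] + [pluralize(tokens[-1])])
    return [name, plural, f'"{name}"', f'"{plural}"']

print(web_queries("email address"))
# ['email address', 'email addresses', '"email address"', '"email addresses"']
```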
6.2.4.3 Identifying Values
Finally, the regular expressions and the downloaded web pages
are used to identify valid values.
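A sketch of this final matching step, treating whitespace-separated tokens on each downloaded page as candidate values (a simplification of whatever page parsing the tool actually performs):

```python
import re

def identify_values(regexes, pages):
    """Keep every page token that matches one of the collected (anchored) regexes."""
    values = set()
    for pattern in regexes:
        rx = re.compile(pattern)  # collected patterns start with ^ and end with $
        for page in pages:
            values.update(t for t in page.split() if rx.match(t))
    return values

pages = ["Contact us at alice@example.com or bob@example.org today."]
print(identify_values([r"^[\w.+-]+@[\w-]+\.[\w.]+$"], pages))
# {'alice@example.com', 'bob@example.org'}
```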
6.3 StriDFA for Multistring Matching [10]
In this method, multistring matching is performed. Of the
approaches discussed above, it is one of the better ones for
pattern matching, but it still has limitations.
Deterministic finite automaton (DFA) and nondeterministic
finite automaton (NFA) are two typical finite automata used to
implement regex matching. DFA is fast and has deterministic
matching performance, but suffers from the memory explosion
problem. NFA, on the other hand, requires less memory, but
suffers from slow and nondeterministic matching performance.
Therefore, neither of them is suitable for implementing high-speed
regex matching in environments where fast memory is limited.
Fig 4: Traditional DFA for patterns “reference” and
“replacement” (some transitions are omitted for
simplicity) [10]
Suppose two patterns are to be matched, “reference” (P1)
and “replacement” (P2). The matching process is performed by
sending the input stream to the automaton byte by byte. If the
DFA reaches any of its accept states (the states with double
circles), a match is reported.
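A DFA of the kind shown in Fig 4 can be built with the classical Aho-Corasick construction (a trie plus failure links). The sketch below is illustrative, not the paper's implementation:

```python
from collections import deque

def build_dfa(patterns):
    """Build an Aho-Corasick automaton: one DFA matching several literal patterns."""
    goto, fail, out = [{}], [0], [set()]        # state 0 is the root
    for p in patterns:                          # step 1: build the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)                           # accept state for pattern p
    q = deque(goto[0].values())                 # step 2: failure links, breadth first
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]              # inherit matches that end here
    return goto, fail, out

def feed(stream, goto, fail, out):
    """Send the input to the automaton byte by byte; report (position, pattern) matches."""
    s, hits = 0, []
    for i, ch in enumerate(stream):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits += [(i - len(p) + 1, p) for p in out[s]]
    return hits

dfa = build_dfa(["reference", "replacement"])
print(feed("referenceabcdreplacement", *dfa))   # [(0, 'reference'), (13, 'replacement')]
```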
Fig 5: Tag “e” and a sliding window used to convert an input
stream into an SL stream with tag “e.” [10]
Fig 6: Sample StriDFA of patterns “reference” and
“replacement” with character “e” set as the tag (some
transitions are omitted) [10]
The construction of the StriDFA in this example is simple.
First, the patterns are converted to SL streams. As shown
in Figure 6, the SL streams of patterns P1 and P2 are Fe(P1) = 2 2 3
and Fe(P2) = 5 2, respectively. The classical DFA construction
algorithm is then applied to these SL streams to build the StriDFA.
It is easy to see that the number of states visited during
processing equals the length of the input stream (in units of
bytes), and this number determines the time required to finish
the matching process (each state visit requires a memory access,
which is a major bottleneck in today's computer systems). This
scheme is designed to reduce the number of states visited during
the matching process. If this objective is achieved, the number
of memory accesses required for detecting a match is reduced and,
consequently, the pattern matching speed is improved. One way to
achieve this is to reduce the number of characters sent to the
DFA. For example, if “e” is selected as the tag and the input
stream is “referenceabcdreplacement”, as shown in Figure 5, then
the corresponding stride-length (SL) stream is Fe(S) = 2 2 3 6 5 2,
where Fe(S) denotes the SL stream of the input stream S and “e”
denotes the tag character in use.
An underscore is used to indicate an SL value, distinguishing it
from an ordinary character. As shown in Figure 6, the SL stream of
the input is fed into the StriDFA and compared with the SL
streams extracted from the rule set.
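For a single-character tag, the SL values in the running example can be reproduced by taking the distance between consecutive occurrences of the tag character; the paper performs this conversion with a sliding window (Fig 5), but the following minimal sketch yields the same numbers here:

```python
def sl_stream(text, tag):
    """Stride-length (SL) stream: distances between consecutive occurrences of the tag."""
    positions = [i for i, ch in enumerate(text) if ch == tag]
    return [b - a for a, b in zip(positions, positions[1:])]

print(sl_stream("reference", "e"))                  # [2, 2, 3]
print(sl_stream("replacement", "e"))                # [5, 2]
print(sl_stream("referenceabcdreplacement", "e"))   # [2, 2, 3, 6, 5, 2]
```

Since the build_dfa sketch above only iterates over its patterns, the same construction can be applied to SL streams, e.g. build_dfa([(2, 2, 3), (5, 2)]), with the input's SL stream fed in the same way; matches found in the SL domain are candidates that a full matcher must still verify against the original patterns.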
7. SUMMARY OF APPROACHES
Table 6: Summary of approaches

Approach 1: An improved algorithm for the regular expression
constrained multiple sequence alignment problem.
Description: A particular RE constraint is followed.
Problem: Follows constraints for string matching, is used only in
limited areas, and the results produced are arbitrary.

Approach 2: Automated discovery of valid test strings from the Web
using dynamic regular expressions collation and natural language
processing.
Description: Gives valid values; a further benefit is that the
generated values are realistic rather than arbitrary.
Problem: Multiple strings cannot be processed or matched at the
same time.

Approach 3: StriDFA for multistring matching.
Description: Multistring matching is performed; the main scheme is
to reduce the number of states visited during the matching process.
Problem: Does not handle large alphabet sets.
The approaches in Table 6 differ, and each has its own importance.
The first approach follows constraints on the RE and is used where
a particular RE pattern is given, whereas the second and third
approaches have no such concept of constraints. The second approach
extracts and matches patterns from web queries, and these queries
can be of any pattern. We suggest the second approach as better
than the first and third, since it generally gives valid values
and, as a further benefit, the generated values are realistic
rather than arbitrary. This is because the values are obtained from
the Internet, which is a rich source of human-recognizable data.
The third approach achieves an ultrahigh matching speed with
relatively low memory usage, but it still does not work for large
alphabet sets.
8. APPLICATIONS OF RE IN NLP [8]
1. Web search
2. Word processing: find and substitute (e.g., MS Word)
3. Validating fields in a database (dates, email addresses,
URLs), as sketched below
4. Information extraction (e.g., people and company names)
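As a small illustration of the field-validation use, the patterns below are simplified for readability; production validators are stricter:

```python
import re

FIELDS = {
    "date":  re.compile(r"^\d{4}-\d{2}-\d{2}$"),        # ISO-style YYYY-MM-DD
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "url":   re.compile(r"^https?://[\w.-]+(/\S*)?$"),
}

def validate(field, value):
    """Return True when the value matches the field's pattern."""
    return bool(FIELDS[field].match(value))

print(validate("email", "alice@example.com"))  # True
print(validate("date", "2014-01-31"))          # True
print(validate("url", "not a url"))            # False
```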
CONCLUSIONS & FUTURE WORK
A regular expression, or RE, describes strings of characters
(words, phrases, or any arbitrary text). An attempt has been made
to present the theory of regular expressions in natural language
processing in a unified way, combining the results of several
authors and showing three different approaches: the first is the
regular expression constrained multiple sequence alignment problem;
the second combines natural language processing (NLP) techniques
with dynamic regular expressions collation to find valid string
values on the Internet; and the third is multistring regex
matching. All these approaches have their own importance, and
practical applications have been discussed.
Future work is to develop advanced algorithms for obtaining and
filtering regular expressions, which shall be investigated to
improve the precision of valid values.
REFERENCES
[1]. K. R. Chowdhary, “Introduction to Parsing - Natural Language
Processing”, http://krchowdhary.com/me-nlp/nlp-01.pdf, April 2012.
[2]. A. N. Arslan and D. He, “An improved algorithm for the
regular expression constrained multiple sequence alignment
problem”, Proc. Sixth IEEE Symposium on BioInformatics and
BioEngineering (BIBE'06), 2006.
[3]. M. Shahbaz, P. McMinn and M. Stevenson, "Automated
Discovery of Valid Test Strings from the Web using Dynamic
Regular Expressions Collation and Natural Language
Processing". Proceedings of the International Conference on
Quality Software (QSIC 2012), IEEE, pp. 79–88.
[4]. S. Qian, “Applications for NLP - Lecture 6: Regular
Expression”,
http://talc.loria.fr/IMG/pdf/Lecture_6_Regular_Expression-5.pdf,
October 2011.
[5]. P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, “Natural
Language Processing: An Introduction”, Journal of the American
Medical Informatics Association, 2011.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168328/
[6]. H. A. Muhtaseb, “Lecture 3: REGULAR EXPRESSIONS
AND AUTOMATA”
[7]. A. McCallum, “Introduction to Natural Language Processing,
Lecture 2”, CMPSCI 585, University of Massachusetts Amherst, 2007.
[8]. D. Jurafsky and J. H. Martin, “Speech and Language
Processing”, Prentice Hall, 2nd edition, 2008.
[9]. RegExLib: Regular Expression Library.
http://regexlib.com/.
[10]. Xiaofei Wang and Yang Xu, “StriFA: Stride Finite
Automata for High-Speed Regular Expression Matching in
Network Intrusion Detection Systems” IEEE Systems Journal,
Vol. 7, No. 3, September 2013.