The document proposes a new approach that compares stock market patterns to DNA sequences using compression techniques. Stock market data is converted to binary sequences representing increases and decreases, which are then encoded as DNA nucleotides. These nucleotide sequences are divided and matched against human genome sequences using BLAST. The analysis found that certain sub-sequences of the stock market patterns matched the human genome 100%, suggesting the approach could potentially be used to predict stock market behavior.
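The pipeline described above can be sketched in a few lines: price changes become bits, and bit pairs become nucleotides. The 2-bits-per-base mapping below is an assumed encoding for illustration, not necessarily the document's exact scheme.

```python
# Sketch: encode daily price changes as a nucleotide sequence.
# The 2-bits-per-base mapping is an assumption, not the paper's exact scheme.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}

def prices_to_bits(prices):
    """1 for an increase over the previous close, 0 otherwise."""
    return "".join("1" if b > a else "0" for a, b in zip(prices, prices[1:]))

def bits_to_nucleotides(bits):
    """Pack consecutive bit pairs into one base each (pad odd length with 0)."""
    if len(bits) % 2:
        bits += "0"
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

moves = prices_to_bits([10.0, 10.5, 10.2, 10.8, 11.0])  # "1011"
print(bits_to_nucleotides(moves))  # → GT
```

The resulting string could then be written to FASTA and submitted to BLAST for matching against genome databases.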
Prediction of Answer Keywords using Char-RNN (IJECEIAES)
Generating sequences of characters using a Recurrent Neural Network (RNN) is a tried and tested method for creating unique, context-aware words, and is fundamental to many Natural Language Processing tasks. These types of neural networks can also be used in question-answering systems. The main drawback of most such systems is that they work from a factoid database, so when queried about new and current information the responses are usually sparse. In this paper, the author proposes a novel approach to finding answer keywords from a given body of news text or a headline, based on the query that was asked, where the query concerns current affairs or recent news, using the Gated Recurrent Unit (GRU) variant of RNNs. This ensures that the answers provided are relevant to the content of the query that was put forth.
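The GRU update underlying such a model can be sketched for the scalar case as follows; the weights here are illustrative placeholders, not trained values from the paper.

```python
import math

# Minimal single GRU cell in pure Python for scalar input/state;
# weights are illustrative placeholders, not trained values.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, w):
    """One GRU update; w = (wz, uz, wr, ur, wh, uh)."""
    wz, uz, wr, ur, wh, uh = w
    z = sigmoid(wz * x + uz * h)                 # update gate
    r = sigmoid(wr * x + ur * h)                 # reset gate
    h_tilde = math.tanh(wh * x + uh * (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

# Feed a short character sequence through the cell.
h = 0.0
for ch in "news":
    h = gru_step(ord(ch) / 128.0, h, (0.5, -0.3, 0.4, 0.2, 0.9, 0.1))
print(round(h, 4))
```

A real character-level model stacks vector-valued GRU layers and a softmax over the character vocabulary; this scalar version only shows the gating arithmetic.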
Experimental Result Analysis of Text Categorization using Clustering and Clas... (ijtsrd)
In a world that routinely produces ever more textual data, managing that data is a critical task. Many text analysis methods are available for managing and visualizing such data, but many techniques give low accuracy because of the ambiguity of natural language. To provide fine-grained analysis, this paper introduces efficient machine learning algorithms for categorizing text data. To improve accuracy, the proposed system uses the Natural Language Toolkit (NLTK) Python library to perform natural language processing. The main aim of the proposed system is to generalize the model for real-time text categorization applications by using efficient text classification and clustering machine learning algorithms, and to find the most efficient and accurate model for the input dataset using performance measures. Patil Kiran Sanajy | Prof. Kurhade N. V. "Experimental Result Analysis of Text Categorization using Clustering and Classification Algorithms", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 3, Issue 4, June 2019. URL: https://www.ijtsrd.com/papers/ijtsrd25077.pdf
Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/25077/experimental-result-analysis-of-text-categorization-using-clustering-and-classification-algorithms/patil-kiran-sanajy
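A minimal stand-in for the classification side of such a pipeline is a bag-of-words naive Bayes classifier with add-one smoothing; the toy corpus and labels below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny bag-of-words naive Bayes, a sketch of the text-categorization step;
# the toy corpus is invented for illustration.
def train(docs):  # docs: list of (text, label)
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        n = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # log prior
        for w in text.lower().split():
            # add-one (Laplace) smoothed log likelihood
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

model = train([("stocks fell sharply", "finance"),
               ("market rally continues", "finance"),
               ("team wins the match", "sport"),
               ("player scores a goal", "sport")])
print(classify("market stocks rally", model))  # → finance
```

In practice NLTK's tokenizers and stopword lists would replace the bare `split()`, and accuracy would be compared across classifiers as the paper describes.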
Case-Based Reasoning for Explaining Probabilistic Machine Learning (ijcsit)
This paper describes a generic framework for explaining the predictions of probabilistic machine learning
algorithms using cases. The framework consists of two components: a similarity metric between cases that
is defined relative to a probability model, and a novel case-based approach to justifying a probabilistic
prediction by estimating the prediction error using case-based reasoning. As a basis for deriving similarity
metrics, we define similarity in terms of the principle of interchangeability: two cases are considered
similar or identical if the two probability distributions, derived from excluding one or the other case from the
case base, are identical. Lastly, we show the applicability of the proposed approach by deriving a metric for
linear regression, and apply it to explaining predictions of the energy performance of
households.
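The interchangeability principle can be sketched for one-dimensional least squares: two cases count as similar when leaving either one out of the case base yields (near-)identical fitted parameters. The data and tolerance below are illustrative, not the paper's experimental setup.

```python
# Sketch of the interchangeability idea for 1-D linear regression:
# two cases are similar when leave-one-out fits agree. Illustrative data.
def fit(xs, ys):
    """Ordinary least squares for y = b*x + a; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return b, my - b * mx

def loo_params(xs, ys, i):
    """Parameters fitted with case i excluded from the case base."""
    return fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])

def interchangeable(xs, ys, i, j, tol=1e-9):
    """True if excluding case i or case j yields the same fitted model."""
    pi, pj = loo_params(xs, ys, i), loo_params(xs, ys, j)
    return all(abs(a - b) <= tol for a, b in zip(pi, pj))

xs, ys = [0.0, 1.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.5]
print(interchangeable(xs, ys, 1, 2))  # duplicate cases → True
print(interchangeable(xs, ys, 0, 3))  # distinct cases → False
```

The paper's full metric compares probability distributions rather than point estimates; this sketch uses the fitted parameters as a proxy.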
Performance analysis of neural network models for oxazolines and oxazoles der... (ijistjournal)
Neural networks have been applied successfully to a broad range of areas such as business, data mining, drug discovery and biology. In medicine, neural networks have been applied widely in medical diagnosis, detection and evaluation of new drugs, and treatment cost estimation. In addition, neural networks have come into use in data mining strategies aimed at prediction and knowledge discovery. This paper presents the application of neural networks to the prediction and analysis of the antitubercular activity of oxazoline and oxazole derivatives. The study presents techniques based on the development of five artificial neural network (ANN) models: a single hidden layer feed-forward neural network (SHLFFNN), a gradient descent back-propagation neural network (GDBPNN), a gradient descent back-propagation with momentum neural network (GDBPMNN), a back-propagation with weight decay neural network (BPWDNN), and a quantile regression neural network (QRNN). We comparatively evaluate the performance of these five techniques. Evaluating the efficiency of each model by way of benchmark experiments is accepted practice: cross-validation and resampling techniques are commonly used to derive point estimates of performance, which are compared to identify methods with good properties. Predictive accuracy was evaluated using the root mean squared error (RMSE), coefficient of determination (R²), mean absolute error (MAE), mean percentage error (MPE) and relative squared error (RSE). We found that all five neural network models were able to produce feasible models, and that the QRNN model outperforms the other four on all statistical tests.
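The error measures named above can be sketched for plain Python lists; the formula conventions (for instance the sign of MPE) follow common definitions and may differ in detail from the paper's.

```python
import math

# Common regression error measures over parallel lists of
# actual (y) and predicted (p) values; conventions are the usual ones.
def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def mpe(y, p):
    # mean percentage error: positive when predictions undershoot on average
    return 100.0 * sum((a - b) / a for a, b in zip(y, p)) / len(y)

def rse(y, p):
    mean_y = sum(y) / len(y)
    return (sum((a - b) ** 2 for a, b in zip(y, p))
            / sum((a - mean_y) ** 2 for a in y))

def r2(y, p):
    # coefficient of determination, 1 minus the relative squared error
    return 1.0 - rse(y, p)

y, p = [2.0, 4.0, 6.0], [2.5, 4.0, 5.5]
print(rmse(y, p), mae(y, p), r2(y, p))
```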
Trajectory Data Fuzzy Modeling: Ambulances Management Use Case (ijdms)
Data captured through mobile devices and sensors represent valuable information for organizations. This
collected information comes in huge volumes and usually carries uncertain data. Because of this quality issue,
difficulties arise in analyzing the trajectory data warehouse. Moreover, the interpretation of the analysis
can vary depending on the background of the user, which makes it difficult to fulfill the analytical
needs of an enterprise. In this paper, we show the benefits of fuzzy logic in solving the challenges
related to mobility data by integrating fuzzy concepts into the conceptual and logical models. We use an
ambulance management use case to illustrate our contributions.
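Fuzzifying a trajectory attribute such as vehicle speed can be sketched with trapezoidal membership functions; the linguistic terms and break points below are invented for illustration, not taken from the paper.

```python
# Illustrative trapezoidal membership functions for an ambulance-speed
# attribute ("slow"/"fast"); the break points are invented, not the paper's.
def trapezoid(x, a, b, c, d):
    """Membership rises on [a,b], is 1 on [b,c], falls on [c,d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def fuzzify_speed(kmh):
    return {
        "slow": trapezoid(kmh, -1, 0, 20, 40),
        "fast": trapezoid(kmh, 30, 60, 200, 201),
    }

print(fuzzify_speed(35))  # partially slow AND partially fast
```

Storing such membership degrees instead of crisp values is what lets the conceptual and logical models of the warehouse tolerate uncertain trajectory data.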
Blended intelligence of FCA with FLC for knowledge representation from cluste... (IJECEIAES)
Formal concept analysis (FCA) is a data analysis mechanism of growing attractiveness across fields such as data mining, robotics, medicine and big data. FCA is helpful for generating new ontology-based learning techniques. In medicine, some growing children face the problem of representing knowledge from previously gathered data that is unordered and insufficiently clustered, which prevents them from making the right decision at the right time when answering uncertainty-based questionnaires. In decision theory, many mathematical models, such as probability allocation, crisp sets and fuzzy set theory, have been designed to deal with knowledge representation difficulties and their characteristics. This paper proposes a new blended approach of FCA with a fuzzy logic controller (FLC), described through two major objectives. First, FCA analyzes the data through the relationships between sets of objects and sets of attributes in the prior data, framed as formal statements of human thinking converted into significant, intelligible explanations; suitable rules are generated to explore the relationships among the attributes, and formal concept analysis is applied to these rules to extract better knowledge and the most important factors affecting decision making. Second, the FLC applies fuzzification, rule construction and defuzzification to represent accurate knowledge for uncertainty-based questionnaires. Here FCA is extended with objective-based itemset notions, treated as targets with expanded cardinalities and weights associated through fuzzy inference decision rules.
This approach is helpful for medical experts in assessing the range of a patient's memory deficiency, and for people who face difficulties in exploring knowledge.
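The FCA core that such an approach builds on is the pair of derivation operators over an object-attribute context; the toy medical context below is invented for illustration.

```python
# Minimal formal-concept derivation operators over a toy object/attribute
# context; the context itself is invented for illustration.
CONTEXT = {
    "patient1": {"fever", "cough"},
    "patient2": {"fever"},
    "patient3": {"cough", "rash"},
}

def common_attributes(objects):
    """A' : attributes shared by every object in the set."""
    sets = [CONTEXT[o] for o in objects]
    return set.intersection(*sets) if sets else set()

def matching_objects(attributes):
    """B' : objects that have every attribute in the set."""
    return {o for o, attrs in CONTEXT.items() if attributes <= attrs}

# A formal concept is a pair (A, B) with A' = B and B' = A.
A = matching_objects({"fever"})
B = common_attributes(A)
print(A, B)  # ({'patient1', 'patient2'}, {'fever'}) is a formal concept
```

The fuzzy extension discussed in the paper would replace crisp membership in `CONTEXT` with degrees in [0, 1] and threshold them through inference rules.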
Analysis of Opinionated Text for Opinion Mining (mlaij)
In sentiment analysis, the polarities of the opinions expressed on an object or feature are determined to assess whether the sentiment of a sentence or document is positive, negative or neutral. Naturally, the object/feature is a noun that refers to a product or a component of a product, say the "lens" of a camera, and opinions on it are captured in adjectives, verbs, adverbs and nouns themselves. Apart from such words, other meta-information and diverse effective features also play an important role in influencing sentiment polarity and contribute significantly to the performance of the system. In this paper, some of this associated information/meta-data is explored and investigated in sentiment text. Based on the analysis results presented here, there is scope for further assessment and utilization of the meta-information as features in text categorization, ranking of text documents, identification of spam documents and polarity classification problems.
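A minimal sketch of feature-level polarity is a lexicon lookup in a window around the feature word; the lexicon and window size below are illustrative, not the paper's method.

```python
# Lexicon-based polarity for a feature word ("lens"), counting opinion
# words near it; lexicon and window size are illustrative choices.
POSITIVE = {"sharp", "excellent", "great", "crisp"}
NEGATIVE = {"blurry", "poor", "bad", "soft"}

def polarity_near(feature, text, window=3):
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        if tok == feature:
            nearby = tokens[max(0, i - window):i + window + 1]
            score += sum(w in POSITIVE for w in nearby)
            score -= sum(w in NEGATIVE for w in nearby)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity_near("lens", "the lens is sharp and crisp"))  # → positive
```

The meta-information the paper studies would enter as extra features alongside this word-level score in a trained classifier.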
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW... (ijcseit)
Mining relevant facts from the web at the time of need is an arduous task. Research in diverse fields
is fine-tuning methodologies toward this goal, extracting the information most relevant to the user's
search query. The methodology proposed in this paper finds ways to ease search complexity,
tackling the severe issues that hinder the performance of traditional approaches. The proposed
methodology finds all possible semantically relatable frequent sets with the FP-Growth
algorithm. The outcome is then the fuel for a bio-inspired fuzzy PSO that finds the optimal
attractor points around which the web documents are clustered, meeting the requirements of the search query
without losing relevance. On the whole, the proposed system optimizes an objective function that
minimizes intra-cluster differences and maximizes inter-cluster distances, while retaining all
possible relationships with the search context intact. The major contribution is that the system finds all
possible combinations matching the user's search transaction, thereby making the system more
meaningful. These relatable sets form the set of particles for fuzzy clustering as well as PSO, remaining
unbiased and maintaining innate herd behaviour for any number of new additions. Evaluations reveal
that the proposed methodology fares well as an optimized and effective
enhancement over conventional approaches.
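As a simplified stand-in for the FP-Growth stage, frequent itemsets over tokenized "search transactions" can be sketched with a plain counting pass (FP-Growth itself avoids candidate enumeration via a prefix tree; this sketch only reproduces its output on small data):

```python
from itertools import combinations
from collections import Counter

# Simplified frequent-itemset mining over tokenized "search transactions";
# a plain counting pass stands in for the FP-Growth algorithm itself.
def frequent_itemsets(transactions, min_support, max_size=2):
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so itemsets compare equal
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {s: c for s, c in counts.items() if c >= min_support}

docs = [["fuzzy", "pso", "cluster"],
        ["fuzzy", "cluster"],
        ["pso", "swarm"],
        ["fuzzy", "pso"]]
result = frequent_itemsets(docs, min_support=2)
print(result)
```

In the paper's pipeline, itemsets like these would seed the particles that the fuzzy PSO moves toward optimal cluster attractors.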
International Journal of Engineering Research and Development (IJERD), IJERD Editor
Text mining aims to discover new, previously unknown or hidden information by automatically extracting
it from various written resources. Applying knowledge discovery methods to
unstructured text is known as Knowledge Discovery in Text, text data mining, or simply text mining.
Most techniques used in text mining are founded on the statistical study of a term, either a word or a
phrase. Several algorithms have been used in previous text mining work. For example, the
single-link algorithm and self-organizing maps (SOM) offer an approach for visualizing
high-dimensional data, and SOM is a very useful projection-based tool for processing textual data.
Genetic and sequential algorithms provide the capability for multiscale representation of datasets and
are fast to compute with little CPU time, based on Isolet-reduced subsets in unsupervised feature
selection. We propose a vector space model and a concept-based analysis algorithm that will
improve text clustering quality, so that a better text clustering result may be achieved. The proposed
algorithm also behaves well in terms of robustness and stability with respect to the formation of the
neural network.
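The vector space model at the heart of the proposed clustering reduces to bag-of-words vectors compared by cosine similarity, which can be sketched as:

```python
import math
from collections import Counter

# Vector Space Model sketch: bag-of-words term vectors compared by
# cosine similarity, the core operation behind document clustering.
def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine("text mining finds hidden data", "mining text data"))
```

A concept-based variant would weight terms by their role in sentence-level concepts rather than raw counts; this sketch uses raw term frequencies only.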
Performance analysis of linkage learning techniques in genetic algorithms (eSAT Journals)
Abstract: One variant of genetic algorithms, the Linkage Learning Genetic Algorithm (LLGA), enhances the efficiency of the Simple Genetic Algorithm (SGA) when solving NP-hard problems. Discovering linkage-learning techniques is an important task in GA research. Almost all existing linkage-learning techniques follow either random or probabilistic approaches, making repeated passes over the population to determine the relationships between individuals. An SGA with a random linkage technique is simple but may take a long time to converge to optimal solutions. This paper uses a linkage-learning operator called Gene Silencing, a mechanism inspired by biological systems. Gene Silencing improves linkages by preserving the building blocks in an individual from disruption by recombination processes such as crossover and mutation. It converges quickly to the optimal solution without compromising diversification of the search space. To demonstrate this, the Travelling Salesperson Problem (TSP) was chosen, where the order of cities in a tour must be retained. Experiments were carried out on different TSP benchmark instances taken from TSPLIB, a standard library of TSP problems. Various linkage-learning techniques were also applied to these benchmark instances, and their performance was analysed against the Gene Silencing (GS) mechanism with respect to solution optimality and convergence speed. Index Terms: Linkage Learning, Gene Silencing, Building Blocks, Genetic Algorithm, TSPLIB, Performance Analysis
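The Gene Silencing operator itself is not reproduced here; as a sketch of the general idea of preserving building blocks in a tour, the standard order crossover (OX) keeps a parent segment intact and fills the remaining cities in the other parent's relative order:

```python
# Order crossover (OX) for TSP tours: the segment [i, j) is copied from
# parent1 and the remaining cities keep parent2's relative order. This is
# a standard order-preserving operator, not the paper's Gene Silencing
# mechanism itself.
def order_crossover(p1, p2, i, j):
    child = [None] * len(p1)
    child[i:j] = p1[i:j]                       # preserved building block
    fill = [c for c in p2 if c not in child]   # p2's order for the rest
    k = 0
    for idx in range(len(child)):
        if child[idx] is None:
            child[idx] = fill[k]
            k += 1
    return child

p1 = ["A", "B", "C", "D", "E"]
p2 = ["E", "D", "C", "B", "A"]
print(order_crossover(p1, p2, 1, 3))  # → ['E', 'B', 'C', 'D', 'A']
```

The child is always a valid permutation, so the city-order constraint of TSP survives recombination, which is exactly the disruption problem linkage learning targets.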
A classical information system provides an answer only after a user submits a complete query. Indeed,
almost all present relational database systems rely on queries whose syntax and semantics are
completely defined before data can be accessed. But we often wish to use vague terms in a query. The main
objective of a database management system is to provide an environment that is both convenient and efficient
for storing and retrieving information. The recent trend of supporting autocomplete is a first step toward coping with
this problem. We can design both classical and fuzzy databases and effectively use fuzzy queries on these
databases. Fuzzy databases are developed to manipulate incomplete, unclear and vague data such as "low", "fast", "very
high", "about", etc. The primary focus of fuzzy logic is natural language. This paper gives users the flexibility
to query a database using natural language by implementing "interactive fuzzy search". This
framework for interactive fuzzy search permits users to explore the data as they type, even in the presence of
minor errors. The paper applies fuzzy queries to a relational database so that precise results can be obtained, as
well as output for the uncertain terms we commonly use, based on membership functions.
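Typo-tolerant "interactive fuzzy search" can be sketched as prefix matching under a small edit-distance budget; the dataset and error threshold below are illustrative.

```python
# "Interactive fuzzy search" sketch: match a typed prefix against stored
# values, tolerating small typos via edit distance; threshold illustrative.
def edit_distance(a, b):
    """Levenshtein distance via a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_search(query, values, max_errors=1):
    q = query.lower()
    return [v for v in values
            if edit_distance(q, v.lower()[:len(q)]) <= max_errors]

names = ["Chennai", "Chicago", "Mumbai", "Munich"]
print(fuzzy_search("Chemn", names))  # typo still finds → ['Chennai']
```

Vague terms like "low" or "fast" would instead be resolved by membership functions over numeric columns; this sketch covers only the typo-tolerant string side.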
Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for L... (Amit Sheth)
Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, influencing innovations in diagnosis, treatment, prevention and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ...
While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...
This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, is an advancement of the state-of-the-art in LBD research.
Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer,
Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)
Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)
D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)
D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010
International Journal of Computer Science, Engineering and Information Techno... (IJCSEIT Journal)
In the field of proteomics, as more data is added, computational methods need to become more
efficient. The parts of molecular sequences that are functionally most important to the molecule are the
most resistant to change. To ensure the reliability of sequence alignment, comparative approaches are used.
Multiple sequence alignment amounts to a proposition about evolutionary history: for each column in the
alignment, the explicit homologous correspondence of each individual sequence position is established. The
different pairwise sequence alignment methods are elaborated in the present work, but these methods are
only suitable for aligning a limited number of sequences of small length. A new method is introduced for
aligning sequences based on local alignment with consensus sequences. Triticum wheat varieties are
loaded from the NCBI databank. Phylogenetic trees are constructed for partitions of the
dataset, and a single new tree is constructed from the previously generated trees using an advanced pruning technique.
Then the closely related sequences are extracted by applying threshold conditions, and by using shift
operations in both directions an optimal sequence alignment is obtained.
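A pairwise global alignment score (Needleman-Wunsch), which consensus-based methods like the one above build on, can be sketched as follows; the +1/-1 scoring scheme is an illustrative choice.

```python
# Global pairwise alignment score (Needleman-Wunsch) with a rolling row;
# scoring scheme (+1 match, -1 mismatch, -1 gap) is illustrative.
def nw_score(s, t, match=1, mismatch=-1, gap=-1):
    rows, cols = len(s) + 1, len(t) + 1
    prev = [j * gap for j in range(cols)]  # aligning against all-gaps
    for i in range(1, rows):
        cur = [i * gap]
        for j in range(1, cols):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag,               # (mis)match
                           prev[j] + gap,      # gap in t
                           cur[j - 1] + gap))  # gap in s
        prev = cur
    return prev[-1]

print(nw_score("GATTACA", "GCATGCU"))  # → 0
```

Keeping only the previous row gives O(min(m, n)) memory for the score; recovering the actual alignment would require the full matrix or a divide-and-conquer traceback.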
Ontologies are being used to organize information in many domains such as artificial intelligence,
information science, the semantic web and library science. Ontologies of an entity holding different information
can be merged to create more knowledge about that particular entity. Ontologies today power more
accurate search and retrieval on websites like Wikipedia. As we move toward Web 3.0,
also termed the semantic web, ontologies will play an even more important role.
Ontologies are represented in various forms such as RDF, RDFS, XML and OWL. Querying ontologies can
yield basic information about an entity. This paper proposes an automated method for ontology creation,
using concepts from NLP (Natural Language Processing), Information Retrieval and Machine Learning.
Concepts drawn from these domains help in designing more accurate ontologies represented in the
XML format. The paper uses document classification algorithms to assign labels
to documents, document similarity to cluster documents similar to the input document together, and
summarization to shorten the text while keeping the important terms essential to building the ontology. The module
is constructed using the Python programming language and NLTK (the Natural Language Toolkit). The
ontologies created in XML convey to a lay person the definition of the important terms and their
lexical relationships.
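Emitting a small XML ontology for a term, as the module above does with Python, can be sketched with the standard library; the element names here are invented for illustration, not the paper's schema.

```python
import xml.etree.ElementTree as ET

# Sketch of serializing a small XML ontology for one concept;
# the element and attribute names are invented for illustration.
def build_ontology(term, definition, related):
    root = ET.Element("ontology")
    concept = ET.SubElement(root, "concept", name=term)
    ET.SubElement(concept, "definition").text = definition
    rel = ET.SubElement(concept, "related")
    for r in related:
        ET.SubElement(rel, "term").text = r
    return ET.tostring(root, encoding="unicode")

xml_doc = build_ontology("ontology", "a formal naming of concepts",
                         ["taxonomy", "semantic web"])
print(xml_doc)
```

In the full pipeline, the definition would come from summarization and the related terms from document-similarity clustering rather than being passed in by hand.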
This paper proposes a natural-language discourse analysis method for extracting
information from news articles across different domains. The discourse analysis uses Rhetorical Structure
Theory (RST), which finds the coherent groups of text that are most prominent for extracting information
from a document. RST uses the nucleus-satellite concept to find the most prominent text in the
document. After discourse analysis, text analysis is performed to extract domain-related objects
and relate them. For extracting the information, a knowledge-based system is used, consisting
of a domain dictionary that holds a bag of words for the domain. The system is
evaluated against gold-standard analysis and human judgment of the extracted information.
First Do No Harm: Ethics and Online Representation (Bryan Nunez)
Where does one find the balance between openness and privacy when dealing with online visual media? As the ability to capture, post, and remix images and video becomes increasingly common, people who advocate for the rights of at-risk populations need to weigh the risks involved against the need to tell the stories of those affected.
In this session we will review examples of online media used to expose instances of human rights abuse, political repression, and discrimination. We will examine online media's ability to focus a potentially global audience on an issue, as well as the dangers to those both behind and in front of the camera. We will also discuss the tools and technologies for creating and distributing this media. Video is being reworked, remixed and recirculated by many more people. New possibilities for action by a global citizenry have arisen, but these carry with them real dangers. Confronting these challenges will require the collaboration of the people on the front lines as well as those who create and maintain the technologies used.
Analysis of Opinionated Text for Opinion Miningmlaij
In sentiment analysis, the polarities of the opinions expressed on an object/feature are determined to assess the sentiment of a sentence or document whether it is positive/negative/neutral. Naturally, the object/feature is a noun representation which refers to a product or a component of a product, let’s say, the "lens" in a camera and opinions emanating on it are captured in adjectives, verbs, adverbs and noun words themselves. Apart from such words, other meta-information and diverse effective features are also going to play an important role in influencing the sentiment polarity and contribute significantly to the performance of the system. In this paper, some of the associated information/meta-data are explored and investigated in the sentiment text. Based on the analysis results presented here, there is scope for further assessment and utilization of the meta-information as features in text categorization, ranking text document, identification of spam documents and polarity classification problems.
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...ijcseit
Mining relevant facts from the web at the time of need is a tedious task. Research across diverse fields is fine-tuning methodologies toward this goal, extracting the information most relevant to the user's search query. The methodology proposed in this paper seeks to ease search complexity by tackling the severe issues that hinder the performance of traditional approaches. It first finds all possible semantically relatable frequent sets with the FP-Growth algorithm. The outcome then fuels a bio-inspired Fuzzy PSO that finds the optimal attractor points around which web documents cluster, meeting the requirements of the search query without losing relevance. Overall, the proposed system optimizes an objective function that minimizes intra-cluster differences and maximizes inter-cluster distances, while retaining all possible relationships with the search context intact. The major contribution is that the system finds all possible combinations matching the user's search transaction, making the results more meaningful. These relatable sets form the set of particles for both fuzzy clustering and PSO; the scheme is thus unbiased and lets any number of new additions follow the herd behaviour. Evaluations reveal that the proposed methodology fares well as an optimized and effective enhancement over conventional approaches.
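The frequent-set step above can be illustrated with a brute-force itemset counter. Note this is only a sketch of the support-counting idea: FP-Growth computes the same support counts via an FP-tree rather than enumerating combinations, and the search transactions and threshold below are hypothetical.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_size=3):
    """Count itemsets appearing in at least `min_support` transactions.

    A brute-force stand-in for FP-Growth: identical output on small data,
    without the FP-tree that makes the real algorithm scale.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical search transactions (terms per query session)
transactions = [
    {"fuzzy", "pso", "clustering"},
    {"fuzzy", "clustering", "web"},
    {"fuzzy", "pso", "web"},
]
result = frequent_itemsets(transactions, min_support=2)
```

The surviving itemsets would then serve as the candidate "relatable sets" handed to the clustering stage.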
International Journal of Engineering Research and Development (IJERD)IJERD Editor
Text mining attempts to discover new, previously unknown or hidden information by automatically extracting it from various written resources. Applying knowledge-discovery methods to unstructured text is known as Knowledge Discovery in Text, text data mining, or simply text mining. Most text-mining techniques are founded on the statistical study of a term, either a word or a phrase. Several algorithms have been used in earlier work. For example, the Single-Link algorithm and the Self-Organizing Map (SOM) introduce an approach, based on the projection method, for visualizing high-dimensional data and are very useful tools for processing textual data. Genetic and sequential algorithms provide multiscale representations of datasets and are fast to compute, with less CPU time, based on Isolet-reduced subsets in unsupervised feature selection. We propose a Vector Space Model with a concept-based analysis algorithm that improves text-clustering quality, so that better text-clustering results may be achieved. The proposed algorithm also behaves well in terms of robustness and stability with respect to the formation of the neural network.
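The Vector Space Model mentioned above represents each document as a term-frequency vector and compares documents by cosine similarity; that core comparison can be sketched as follows (the toy documents are hypothetical, and the paper's concept-based analysis layer is not shown):

```python
import math
from collections import Counter

def vectorize(doc):
    """Term-frequency vector of a document (the Vector Space Model)."""
    return Counter(doc.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

a = vectorize("text mining finds hidden information in text")
b = vectorize("text mining extracts information from documents")
c = vectorize("genetic algorithms evolve candidate solutions")
```

Documents on the same topic score higher than unrelated ones, which is the property clustering algorithms exploit when grouping texts.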
IJRET : International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Performance analysis of linkage learning techniques in genetic algorithmseSAT Journals
Abstract: One variant of the Genetic Algorithm, the Linkage Learning Genetic Algorithm (LLGA), enhances the efficiency of the Simple Genetic Algorithm (SGA) when solving NP-hard problems. Discovering a linkage-learning technique is an important task in GA research. Almost all existing linkage-learning techniques follow either random or probabilistic approaches, making repeated passes over the population to determine the relationships between individuals. An SGA with a random linkage technique is simple but may take a long time to converge to optimal solutions. This paper uses a linkage-learning operator called Gene Silencing, a mechanism inspired by biological systems. Gene Silencing improves linkages by preserving the building blocks in an individual from disruption by recombination processes such as crossover and mutation. It converges quickly to the optimal solution without compromising diversification of the search space. To demonstrate this, the Travelling Salesperson Problem (TSP) was chosen, since it requires retaining the order of cities in a tour. Experiments were carried out on different TSP benchmark instances taken from TSPLIB, a standard library of TSP problems. These benchmark instances were also run under various linkage-learning techniques, and the performance of those techniques was analysed against the Gene Silencing (GS) mechanism with respect to optimal solution quality and convergence speed. Index Terms: Linkage Learning, Gene Silencing, Building Blocks, Genetic Algorithm, TSPLIB, Performance Analysis
The classical or traditional information system provides an answer only after a user submits a complete query. At present, almost all relational database systems rely on queries whose syntax and semantics are completely defined before data can be accessed. But often we wish to use vague terms in a query. The main objective of a database management system is to provide an environment that is both convenient and efficient for storing and retrieving information. The recent trend of supporting auto-complete is a first step toward coping with this problem. We can design both classical and fuzzy databases and effectively run fuzzy queries on them. Fuzzy databases are developed to manipulate incomplete, unclear and vague data such as low, fast, very high, about, etc. The primary focus of fuzzy logic is natural language. This paper gives users the flexibility, or freedom, to query a database using natural language, and implements "interactive fuzzy search". This framework for interactive fuzzy search permits the user to explore the data as they type, even in the presence of minor errors. The paper applies fuzzy queries to a relational database so that it is possible to obtain precise results as well as output for the uncertain terms we generally use, based on a membership function.
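The membership-function idea behind such fuzzy queries can be sketched in a few lines. The trapezoidal shape, the term "young", the age thresholds and the 0.3 cut-off below are all hypothetical illustrations, not the paper's actual design:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 below a, rises to 1 on [b, c], falls to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def young(age):
    """Hypothetical fuzzy term 'young' over an AGE column."""
    return trapezoid(age, 0, 0, 25, 40)  # fully "young" up to 25, fades out by 40

rows = [("ann", 22), ("bob", 30), ("carol", 55)]

# A fuzzy SELECT: keep rows whose membership exceeds a threshold,
# ranked by degree of membership rather than a hard boolean predicate.
answer = sorted(((young(a), n) for n, a in rows if young(a) > 0.3), reverse=True)
```

A crisp query would either include or exclude each row; here "bob" is returned with a partial degree of 2/3, which is exactly the graded behaviour fuzzy queries add.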
Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...Amit Sheth
Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature, which influenced innovations in diagnosis, treatment, preventions and overall public health. However, much of the existing research on discovering hidden connections among concepts have used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ...
While effective in some situations, the practice of relying on domain expertise, structured background knowledge and heuristics to complement distributional and graph-theoretic approaches, has serious limitations. ..
This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, is an advancement of the state-of-the-art in LBD research.
Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer,
Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM) and Varun Bhagwan (Yahoo! Labs)
Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)
D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate=19.4%)
D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010
International Journal of Computer Science, Engineering and Information Techno...IJCSEIT Journal
In the field of proteomics because of more data is added, the computational methods need to be more
efficient. The part of molecular sequences is functionally more important to the molecule which is more
resistant to change. To ensure the reliability of sequence alignment, comparative approaches are used. The
problem of multiple sequence alignment is a proposition of evolutionary history. For each column in the
alignment, the explicit homologous correspondence of each individual sequence position is established. The
different pair-wise sequence alignment methods are elaborated in the present work. But these methods are
only used for aligning the limited number of sequences having small sequence length. For aligning
sequences based on the local alignment with consensus sequences, a new method is introduced. From NCBI
databank triticum wheat varieties are loaded. Phylogenetic trees are constructed for divided parts of
dataset. A single new tree is constructed from previous generated trees using advanced pruning technique.
Then, the closely related sequences are extracted by applying threshold conditions and by using shift
operations in the both directions optimal sequence alignment is obtained.
Ontologies are being used to organize information in many domains like artificial intelligence,
information science, semantic web, library science. Ontologies of an entity having different information
can be merged to create more knowledge of that particular entity. Ontologies today are powering more
accurate search and retrieval in websites like Wikipedia etc. As we move towards the future to Web 3.0,
also termed as the semantic web, ontologies will play a more important role.
Ontologies are represented in various forms like RDF, RDFS, XML, OWL etc. Querying ontologies can
yield basic information about an entity. This paper proposes an automated method for ontology creation,
using concepts from NLP (Natural Language Processing), Information Retrieval and Machine Learning.
Concepts drawn from these domains help in designing more accurate ontologies represented using the
XML format. This paper uses document classification (via classification algorithms) to assign labels to documents, document similarity to cluster documents similar to the input document together, and summarization to shorten the text while keeping the important terms essential to building the ontology. The module is constructed using the Python programming language and NLTK (the Natural Language Toolkit). The ontologies created in XML will convey to a lay person the definitions of the important terms and their lexical relationships.
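The XML output stage can be sketched with Python's standard-library ElementTree. The element names (`ontology`, `concept`, `definition`, `relation`) and the sample term are hypothetical choices for illustration, not the paper's actual schema:

```python
import xml.etree.ElementTree as ET

def build_ontology(term, definition, relations):
    """Emit a small XML ontology fragment for one term.

    `relations` maps a lexical relation name (e.g. "synonym", "hypernym")
    to a list of related terms.
    """
    root = ET.Element("ontology")
    node = ET.SubElement(root, "concept", name=term)
    ET.SubElement(node, "definition").text = definition
    for rel, targets in relations.items():
        for t in targets:
            ET.SubElement(node, "relation", type=rel, target=t)
    return ET.tostring(root, encoding="unicode")

xml = build_ontology(
    "neuron",
    "a nerve cell that transmits electrical signals",
    {"hypernym": ["cell"], "synonym": ["nerve cell"]},
)
```

In the pipeline described above, the definition would come from the summarization step and the related terms from classification and document similarity.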
This paper proposes a natural-language discourse-analysis method for extracting information from news articles of different domains. The discourse analysis uses Rhetorical Structure Theory (RST), which finds the coherent groups of text that are most prominent for extracting information. RST uses the nucleus-satellite concept to identify the most prominent text in a document. After discourse analysis, text analysis is performed to extract domain-related objects and relate them. A knowledge-based system consisting of a domain dictionary is used for the extraction; the domain dictionary holds a bag of words for each domain. The system is evaluated against gold-standard analysis and human judgment of the extracted information.
First Do No Harm: Ethics and Online RepresentationBryan Nunez
Where does one find the balance between openness and privacy when dealing with online visual media? As the ability to capture, post, and re-mix images and video becomes increasingly common, people who advocate for the rights of at-risk populations need to weigh the risks involved against the need to tell the stories of those affected.
In this session we will review examples of online media used to expose instances of human rights abuse, political repression, and discrimination. We will examine online media's ability to focus a potentially global audience on an issue as well as the dangers to those both behind and in front of the camera. We will also discuss the tools and technologies for creating and distributing this media. Video is being reworked, remixed and recirculated by many more people. New possibilities for action by a global citizenry have arisen, but these carry with them real dangers. Confronting these challenges will require the collaboration of the people on the front lines as well as those who create and maintain the technologies used.
Deloitte Belgium wants to cross-sell their M&A (Merger & Acquisition) division capabilities.
Since smart phones are widely used within Deloitte, the client requested to design and create a mobile brochure instead of a more traditional printed brochure.
Intersection between the activities of two regulators – shall prior actions t...Michal
The commented judgment of the Polish Supreme Court concerns Telekomunikacja
Polska S.A. (hereafter, TPSA) and the fines imposed upon the incumbent operator
by the President of the Office of Competition and Consumer Protection (in Polish:
Urząd Ochrony Konkurencji i Konsumentów; hereafter, UOKiK) for the abuse of its
dominant position. TPSA is a Polish telecoms provider formally established in 1991.
It is a public company; its shares are traded on the Warsaw Stock Exchange, with the
controlling stake owned by France Télécom. TPSA is often the subject of competition
law decisions issued not only by the UOKiK President but also by the European
Commission, particularly with respect to dominant position abuse.
A 15 minute introduction to the #WVUCommMOOC, narrated by Dr. Nick Bowman. NOTE: This Powerpoint is accompanied by a narrative track, so you will need to download the presentation to your device in order to play the narration.
Here, we cover the basics of MOOCing, we preview our upcoming MOOC for February 2013, and we offer a few tips on successfully using a MOOC.
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...European Data Forum
Selected Talk of Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universidad Politecnica de Madrid, Spain at the European Data Forum 2014, 19 March 2014 in Athens, Greece: 3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
The work concerns the application of differential equations (DE) and the computational techniques of genetic algorithms and neural networks (GANN) in C#, which are frequently used in the globalised world. Diagrammatic and flow-chart presentation is the major concern, for easy understanding of these two concepts, with an indication of their present and future applications; this, together with the computational approaches in C#, is the new initiative taken in this paper. Some observations are also noted from the working, functioning and development of the above algorithms in C# under the given boundary-value conditions of the DE for the genetic and neural components. Fitness-function evaluation and genetic operations were completed for behavioural transmission of chromosomes.
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...IJRTEMJOURNAL
BLAST is the most popular sequence-alignment tool used to align bioinformatics patterns. It uses a local-alignment process in which, instead of comparing the whole query sequence with the database sequence, it breaks the query sequence into small words, which are then used to align patterns. It uses a heuristic method that makes it faster than the earlier Smith-Waterman algorithm. But because only small query words are used for alignment, it may perform poorly on very large databases with complex queries. To remove this drawback, we suggest using MSA tools, which can filter the database by removing unnecessary sequences. This sorted data set is then fed to BLAST, which can identify the relationships among sequences, i.e., homologs, orthologs and paralogs. The proposed system can further be used to find the relation between two persons or to create a family tree. Orthology is interesting for a wide range of bioinformatics analyses, including functional annotation, phylogenetic inference and genome evolution. This system describes and motivates an algorithm for predicting orthologous relationships among complete genomes. The algorithm takes a pairwise approach, requiring neither tree reconstruction nor reconciliation.
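The word-splitting step described above (BLAST's seeding stage) can be sketched as follows. The word length and the toy sequences are hypothetical; real BLAST then extends each hit into a scored local alignment, which this sketch omits:

```python
def seed_words(query, k=3):
    """Split a query sequence into overlapping k-letter words (BLAST's seeding step)."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]

def find_seeds(query, subject, k=3):
    """Locate exact word hits of the query in a subject sequence.

    Builds an index of the subject's k-mers, then looks up each query word,
    returning (word, query_position, subject_position) triples.
    """
    index = {}
    for pos in range(len(subject) - k + 1):
        index.setdefault(subject[pos:pos + k], []).append(pos)
    return [(w, qpos, spos)
            for qpos, w in enumerate(seed_words(query, k))
            for spos in index.get(w, [])]

hits = find_seeds("ACGTAC", "TTACGTT", k=3)
```

Indexing the subject once and looking up short words is what makes this heuristic much faster than scanning the full dynamic-programming matrix of Smith-Waterman.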
This paper presents a literature survey of research-oriented developments made to date. Its significance is to provide a deep-rooted understanding of, and knowledge transfer regarding, existing approaches to gene sequencing and alignment using the Smith-Waterman algorithm, together with their respective strengths and weaknesses. To develop or perform any quality research, it is always advisable to conduct a goal-oriented literature survey that facilitates an in-depth understanding of the work, so that an objective can be formulated from the gaps between present requirements and existing approaches. Gene-sequencing problems are a predominant challenge for researchers seeking an optimized system model that achieves optimal processing and efficiency without introducing overheads in memory and time. This research is oriented toward developing such a system using the dynamic-programming approach called the Smith-Waterman algorithm, in an enhanced form supported by other optimization techniques. The paper offers introduction-oriented knowledge transfer: a brief introduction to the research domain, the research gap and motivations, the objectives formulated, and the proposed systems to accomplish those objectives.
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...IOSRjournaljce
The main purpose of data mining is to extract knowledge from large amounts of data. The Artificial Neural Network (ANN) has already been applied in a variety of domains with remarkable success. This paper presents the application of a hybrid model for stroke disease that integrates a genetic algorithm and the back-propagation algorithm. Selecting a good subset of features, without sacrificing accuracy, is of great importance for neural networks to be successfully applied to the area. In addition, the hybrid model leads to further improved categorization accuracy compared to the result produced by the genetic algorithm alone. In this study, a new hybrid model of Neural Networks and a Genetic Algorithm (GA) is used to initialize and optimize the connection weights of the ANN so as to improve its performance, and the same has been applied to the medical problem of predicting stroke disease for verification of the results.
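The GA-over-weights idea can be sketched on a toy task. Everything below (the logical-OR data, the single sigmoid neuron, the population sizes and mutation scale) is a hypothetical stand-in for the paper's stroke-prediction network, intended only to show a GA evolving connection weights:

```python
import math, random

random.seed(0)

# Toy task standing in for the medical data: learn logical OR
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def predict(w, x):
    """Single sigmoid neuron: the 'ANN' whose weights the GA evolves."""
    s = w[0] + w[1] * x[0] + w[2] * x[1]
    return 1 / (1 + math.exp(-s))

def fitness(w):
    """Negative squared error; the GA maximizes this."""
    return -sum((predict(w, x) - y) ** 2 for x, y in DATA)

def evolve(pop_size=30, gens=60):
    pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # selection: keep the fitter half
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            child = [random.choice(p) for p in zip(a, b)]      # uniform crossover
            child = [g + random.gauss(0, 0.3) for g in child]  # gaussian mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

In the hybrid described above, the best chromosome would seed the network's initial weights before back-propagation fine-tunes them.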
Comparative analysis of dynamic programming algorithms to find similarity in ...eSAT Journals
Abstract: Many computational methods exist for finding similarity in gene sequences; finding a suitable method that gives optimal similarity is a difficult task. The objective of this project is to find an appropriate method to compute similarity in gene/protein sequences, both within families and across families. Dynamic-programming algorithms such as Levenshtein edit distance, Longest Common Subsequence and Smith-Waterman have been used to find similarities between two sequences, but none of the methods mentioned above has used real benchmark data sets; they have only applied dynamic programming to synthetic data. We propose a new method to compute similarity. The performance of the proposed algorithm is evaluated using a number of data sets from various families, and similarity values are calculated both within a family and across families. A comparative analysis and the time complexity of the proposed method reveal that the Smith-Waterman approach is appropriate when the gene/protein sequences belong to the same family, and Longest Common Subsequence is best suited when the sequences belong to two different families. Keywords - Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity.
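Two of the dynamic-programming measures compared above can be sketched compactly; the toy gene fragments are hypothetical examples, not the paper's benchmark data:

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming (insert/delete/substitute, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete
                           cur[j - 1] + 1,              # insert
                           prev[j - 1] + (ca != cb)))   # substitute / match
        prev = cur
    return prev[-1]

def lcs_len(a, b):
    """Length of the Longest Common Subsequence, also by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Toy gene fragments differing by a single substitution
d = levenshtein("ACGTT", "AGGTT")
l = lcs_len("ACGTT", "AGGTT")
```

Both run in O(mn) time; keeping only the previous row keeps memory at O(n), which matters for long sequences.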
The proposed method focuses on these issues by developing a novel classification algorithm that combines the Gene Expression Graph (GEG) with Manhattan distance to express gene-expression data. The Gene Expression Graph provides an optimal view of the relationship between normal and unhealthy genes. The method of using a graph-based representation of gene expression was first offered by the authors in [1] and [2]; it permits the construction of a classifier based on an association between graphs representing well-known classes and graphs representing the samples to evaluate. Additionally, Euclidean distance is used to measure the strength of the relationship that exists between the genes.
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? IJORCS
An algorithm for locating all occurrences of a finite number of keywords in an arbitrary string, also known as multiple-string matching, is commonly required in information retrieval (such as sequence analysis, evolutionary biological studies, gene/protein identification and network intrusion detection) and text-editing applications. Although Aho-Corasick has been one of the most commonly used exact multiple-string matching algorithms, Commentz-Walter has been introduced as a better alternative in the recent past. The Commentz-Walter algorithm combines ideas from both Aho-Corasick and Boyer-Moore. Large-scale, rapid and accurate peptide identification is critical in computational proteomics. In this paper, we have critically analyzed the time complexity of Aho-Corasick and Commentz-Walter for their suitability in large-scale peptide identification. According to the results we obtained for our dataset, we conclude that Aho-Corasick performs better than Commentz-Walter, contrary to common belief.
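The Aho-Corasick automaton at the heart of this comparison can be sketched as a trie with BFS-computed failure links; the peptide fragments below are hypothetical examples, not the paper's dataset:

```python
from collections import deque

def build_automaton(keywords):
    """Aho-Corasick: a trie over the keywords plus failure links (BFS order)."""
    trie = [{}]          # node -> {char: next node}
    out = [set()]        # node -> keywords ending at this node
    fail = [0]
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in trie[node]:
                trie.append({}); out.append(set()); fail.append(0)
                trie[node][ch] = len(trie) - 1
            node = trie[node][ch]
        out[node].add(word)
    queue = deque(trie[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in trie[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in trie[f]:
                f = fail[f]
            fail[child] = trie[f][ch] if ch in trie[f] and trie[f][ch] != child else 0
            out[child] |= out[fail[child]]   # inherit matches ending in suffixes
    return trie, out, fail

def search(text, keywords):
    """Report (position, keyword) for every occurrence, in one pass over text."""
    trie, out, fail = build_automaton(keywords)
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in trie[node]:
            node = fail[node]
        node = trie[node].get(ch, 0)
        hits.extend((i - len(w) + 1, w) for w in out[node])
    return hits

# Toy peptide fragments searched in a protein string
hits = search("MKLVHEHE", ["HE", "EHE", "KLV"])
```

The single left-to-right pass, with no backtracking over the text, is why Aho-Corasick's matching time is linear in the text length plus the number of matches.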
Pattern Recognition using Artificial Neural NetworkEditor IJCATR
An artificial neural network (ANN), usually called simply a neural network, can be considered a paradigm inspired by the biological nervous system. In a network, signals are transmitted by means of connection links. Each link possesses an associated weight, which is multiplied with the incoming signal. The output signal is obtained by applying an activation function to the net input. Neural networks are one of the most exciting and challenging research areas. As ANNs mature into commercial systems, they are likely to be implemented in hardware; their fault tolerance and reliability are therefore vital to the functioning of the systems in which they are embedded. The pattern-recognition system is implemented with a back-propagation network and a Hopfield network to remove distortion from the input. The Hopfield network has high fault tolerance, which helps this system produce accurate output.
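The distortion-removal role of the Hopfield network can be sketched with Hebbian training and deterministic update sweeps. The six-unit bipolar pattern below is a hypothetical example, not the paper's data:

```python
def train(patterns):
    """Hebbian learning: w[i][j] accumulates p[i]*p[j] over stored patterns
    (zero diagonal)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, sweeps=5):
    """Update sweeps drive a distorted input toward a stored pattern."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if s >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1]
w = train([stored])
noisy = [1, -1, -1, -1, 1, -1]   # one unit flipped
restored = recall(w, noisy)
```

The flipped unit is pulled back to the stored value because the weighted vote from the undamaged units outweighs it; this attractor behaviour is the fault tolerance the abstract refers to.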
PERFORMANCE ANALYSIS OF HYBRID FORECASTING MODEL IN STOCK MARKET FORECASTINGIJMIT JOURNAL
This paper presents a performance analysis of a hybrid model, comprising concordance measures and Genetic Programming (GP), for forecasting financial markets, compared against some existing models. The scheme can be used for in-depth analysis of the stock market. Different measures of concordance, such as Kendall's Tau, Gini's Mean Difference, Spearman's Rho, and a weak interpretation of concordance, are used to search the past for patterns that look similar to the present. Genetic Programming is then used to match the past trend to the present trend as closely as possible, and the genetic program estimates what will happen next based on what happened next in the past. The concept is validated using financial time-series data (the S&P 500 and NASDAQ indices) as sample data sets. The forecast results are then compared with the standard ARIMA model and other models to analyse performance.
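The pattern-search step, scoring past windows against the present trend with Kendall's Tau, can be sketched as follows. The price series is hypothetical, and ties are ignored for simplicity:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's Tau: (concordant - discordant) pairs over all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    c = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    d = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (c - d) / len(pairs)

def most_similar_window(past, present):
    """Slide over past prices; return the start index of the window whose
    pairwise ordering agrees most with the present trend."""
    n = len(present)
    scores = [(kendall_tau(past[i:i + n], present), i)
              for i in range(len(past) - n + 1)]
    return max(scores)[1]

past = [10, 11, 9, 12, 14, 13, 11]
present = [100, 103, 101]          # up, then a partial pull-back
start = most_similar_window(past, present)
```

In the hybrid model, what followed the best-matching past window is the raw material GP then shapes into the forecast.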
Bioinformatics may be defined as the field of science in which biology, computer science, and information technology merge to form a single discipline. Its ultimate goal is to enable the discovery of new biological insights, and to create a global perspective from which unifying principles in biology can be discerned, by means of bioinformatics tools for storing, retrieving, organizing and analyzing biological data. Most of these tools possess very distinct features and capabilities, making direct comparison difficult. In this paper we propose a taxonomy for characterizing bioinformatics tools and briefly survey the major tools in each category. Hopefully this study will help other designers and experienced end users understand the details of particular tool categories and tools, enabling them to make the best choices for their particular research interests.
Criminal and Civil Identification with DNA Databases Using Bayesian NetworksCSCJournals
Forensic identification problems are examples in which the study of DNA profiles is a common approach. Dealing with these problems requires the introduction and explanation of various concepts. Here we present some problems and develop their treatment, putting the focus on the use of object-oriented Bayesian networks. The use of DNA databases, which began in 1995 in England, has created new challenges about their use. In Portugal, the legislation for the construction of a genetic database was defined in 2008, making it important to determine how to use it in an appropriate way. For a crime that has been committed, forensic laboratories identify genetic characteristics in order to connect one or more individuals to it. Apart from the laboratory results, it is a matter of great importance to quantify the information obtained, i.e., to know how to evaluate and interpret the results, providing support to the judicial system. Other forensic identification problems include body identification: the identification of one or more bodies found, together with information on missing persons belonging to one or more known families, for which there may be information from family members who reported the disappearance. In this work we intend to discuss how to use the database, the hypotheses of interest, and the use of the database to determine likelihood ratios, i.e., how to evaluate the evidence in different situations. Keywords: Bayesian networks, DNA profiles, identification problems.
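The likelihood-ratio evaluation described above can be sketched for the simplest case, a single-locus match under Hardy-Weinberg assumptions. The allele frequencies below are hypothetical, and a real casework LR would combine many loci and handle relatedness, which this sketch omits:

```python
def genotype_frequency(p, q=None):
    """Random-match probability of a genotype under Hardy-Weinberg:
    p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

def likelihood_ratio(genotype_freq):
    """LR for Hp ('the suspect is the source', P(E|Hp) = 1) against
    Hd ('an unrelated person is the source', P(E|Hd) = genotype frequency)."""
    return 1.0 / genotype_freq

# Hypothetical allele frequencies at one STR locus
lr_het = likelihood_ratio(genotype_frequency(0.1, 0.05))  # alleles at 10% and 5%
lr_hom = likelihood_ratio(genotype_frequency(0.2))        # homozygote, allele at 20%
```

A Bayesian network generalizes this computation: the same LR emerges as the ratio of the evidence's probability under the two hypothesis states of the network.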
GRAPH ALGORITHM TO FIND CORE PERIPHERY STRUCTURES USING MUTUAL K-NEAREST NEIG...ijaia
Core periphery structures exist naturally in many complex networks in the real-world like social,
economic, biological and metabolic networks. Most of the existing research efforts focus on the
identification of a meso scale structure called community structure. Core periphery structures are another
equally important meso scale property in a graph that can help to gain deeper insights about the
relationships between different nodes. In this paper, we provide a definition of core periphery structures
suitable for weighted graphs. We further score and categorize these relationships into different types based
upon the density difference between the core and periphery nodes. Next, we propose an algorithm called
CP-MKNN (Core Periphery-Mutual K Nearest Neighbors) to extract core periphery structures from
weighted graphs using a heuristic node affinity measure called Mutual K-nearest neighbors (MKNN).
Using synthetic and real-world social and biological networks, we illustrate the effectiveness of the
developed core periphery structures.
Prakash Kumar Sarangi et al. / International Journal of Engineering Science and Technology (IJEST)
A Compression-Based Technique for Comparing Stock Market Patterns Behavior with Human Genome
Prakash Kumar Sarangi
Department of Information Technology
NM Institute of Engineering and Technology, Bhubaneswar, India.
Prakash_sarangi89@yahoo.com.
Birendra Kumar Nayak
Department of Mathematics,
Utkal University, Bhubaneswar, India.
bknatuu@yahoo.co.uk.
Sachidananda Dehuri
Department of Information & Communication
Fakir Mohan University, Balasore, India.
Satchi.lapa@gmail.com.
Abstract
In this paper we propose a new approach for comparing stock market pattern behavior with DNA sequences using a compression-based technique. The behavior of the stock market is first represented as a sequence of 0's and 1's, which is then converted to a sequence of the nucleotides A, T, C, G. We apply a Huffman-coding compression scheme for DNA sequences, based on assigning binary bit codes (0 and 1) to each base (A, C, G, T) to compress the DNA sequence. The sequence so obtained is matched against known DNA sequences using BLAST. It is found that certain sub-sequences of the sequence match 100% with sub-sequences of Homo sapiens chromosome 19. The possibility of using this approach to predict stock market behavior is explored.
Keywords: Huffman codes, Subsequence, Nucleotides, Local alignment.
1. INTRODUCTION
For many years the following question has been a source of continuing controversy in both academic and
business circles: To what extent can the past history of a common stock's price be used to make meaningful
predictions concerning the future price of the stock? Answers to this question have been provided on the one
hand by the various chartist theories and on the other hand by the theory of random walks. Although there are
many different chartist theories, they all make the same basic assumption. That is, they all assume that the past
behavior of a security's price is rich in information concerning its future behavior. History repeats itself in that
"patterns" of past price behavior will tend to recur in the future. Thus, if through careful analysis of price charts
one develops an understanding of these "patterns," this can be used to predict the future behavior of prices and
in this way increase expected gains [1].
Due to this randomness, much research has been devoted to studying the behavior of the stock market. In 1959,
Roberts wrote: "If the stock market behaved like a mechanically imperfect roulette wheel, people would notice
the imperfections and, by acting on them, remove them. This rationale is appealing, if for no other reason than
its value as counterweight to the popular view of stock market 'irrationality,' but it is obviously incomplete."
Roberts generated a series of random numbers and plotted the results to see whether any patterns that were
known to technical analysts would be visible [2].
Much work has also been done using intelligent computational techniques such as Hidden Markov
Models (HMM), Artificial Neural Networks (ANN) and Genetic Algorithms (GA) to forecast financial market
behavior [3-5]. In this paper, however, we establish a relationship between the behavior of the stock market and DNA
sequences. For this analysis we need the support of bioinformatics tools such as BLAST.
The primary goal of bioinformatics is to increase the understanding of biological processes. Bioinformatics
[6], the application of computational techniques to analyze the information associated with biomolecules on a
large scale, has now firmly established itself as a discipline in molecular biology. Bioinformatics is a
management information system for molecular biology. It encompasses everything from data
storage and retrieval to the identification and presentation of features within data, such as finding genes within a
DNA sequence, finding similarities between sequences, and making structural predictions [7]. Using software
such as BLAST we establish a relationship between the two, and explore whether future aspects of the stock
market can be predicted.
ISSN : 0975-5462 Vol. 4 No.01 January 2012 144
The main motivation of this paper is to propose a comparison between DNA sequences and stock market
tendencies, and to test predictability using the BLAST software. The rest of the study is organized as
follows. The next section describes the methodology in detail: mapping, encoding, partitioning, and matching.
Section 3 presents the experimental scheme and reports the empirical results and analysis. Concluding remarks
are given in Section 4. The last section proposes future work on the prediction of stock market behavior.
2. METHODOLOGY
In this section, the process of mapping the day-to-day closing prices onto the human genome is presented in
detail. First, a mapping of stock prices to binary values is described. These binary values are then encoded as
nucleotides using a Huffman tree, compressing the total data to half its size. The nucleotides are divided into
several DNA sequences with a continuous distribution. Each part is matched using BLAST, which finally
establishes a relation between the human genome and stock market behavior.
2.1. Representation of Stock behavior to Binary
This study maps and explores the tendency of the stock price index. The research data used in this study are
technical indicators and the direction of change in the daily S&P 500 stock price index over thirty years.
Considering the closing price of each day, the data are categorized as "0" or "1": "0" means that the next
day's index is lower than or equal to today's index, and "1" means that the next day's index is higher than today's
index [8].
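The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the price list is hypothetical.

```python
def prices_to_binary(closing_prices):
    """Map daily closing prices to the paper's 0/1 encoding:
    '1' if the next day's index is higher than today's,
    '0' if it is lower than or equal to today's."""
    bits = []
    for today, next_day in zip(closing_prices, closing_prices[1:]):
        bits.append("1" if next_day > today else "0")
    return "".join(bits)

# Hypothetical closing prices for six consecutive trading days.
prices = [102.5, 103.1, 103.1, 101.9, 104.2, 104.0]
print(prices_to_binary(prices))  # -> 10010
```

Note that a series of n closing prices yields n-1 bits, since each bit compares a day with its successor.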
2.2. Proposed Huffman Algorithm for Encoding
DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms. An
important property of DNA is that it can replicate, or make copies of itself. Each strand of DNA in the double
helix can serve as a pattern for duplicating the sequence of bases. This is critical when cells divide because each
new cell needs to have an exact copy of the DNA present in the old cell [9]. The information in DNA is stored
as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). DNA bases
pair up with each other, A with T and C with G, to form units called base pairs.
The specific bit sequence assigned to an individual base is determined by tracing the path from the root
of the Huffman tree to the leaf that represents that base. By convention, bit '0' represents following the left child
and bit '1' represents following the right child. The Huffman tree is constructed so that the paths from the root
to the most frequently used bases are short while the paths to less frequently used bases are long, giving short
codes for frequent bases and long codes for infrequent ones. Here, however, all four bases occur with similar
frequency, so every code has the same length and the scheme reduces to fixed-length encoding: the trivial
method of assigning 2 bits per base. Given a DNA sequence consisting of A, C, T, G characters, we use two
bits to encode each character:
“00” for A,
“01” for C,
“11” for T, and
“10” for G.
As a result, each cell of 2 bits represents one DNA character. Since in DNA pairing "A" bonds with "T" and
"C" bonds with "G", this assignment has the property that paired bases have codes that are 1's complements of
each other [10]. For example, the DNA sequence TACCTGCGCTA is encoded by the binary sequence
11 00 01 01 11 10 01 10 01 11 00.
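The 2-bit code table and its complement property can be sketched as follows. This is an illustrative implementation of the fixed-length scheme described above; the function names are our own.

```python
# The fixed-length 2-bit code from the text: A=00, C=01, T=11, G=10.
# Complementary bases (A-T, C-G) receive bitwise-complementary codes.
BASE_TO_BITS = {"A": "00", "C": "01", "T": "11", "G": "10"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(dna):
    """DNA string -> binary string, 2 bits per base."""
    return "".join(BASE_TO_BITS[base] for base in dna)

def decode(bits):
    """Binary string (even length) -> DNA string."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode("TACCTGCGCTA"))  # -> 1100010111100110011100
print(decode("1100010111100110011100"))  # -> TACCTGCGCTA
```

The same `decode` direction is what the methodology uses: the 0/1 market sequence is read two bits at a time and each pair becomes one nucleotide, halving the number of symbols.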
2.3. Building of DNA Patterns
The entire compressed DNA pattern of thirty years is divided into six parts, each part corresponding to five
years of stock market behavior. Since each part takes the form of a DNA sequence, it can be treated as a
biological sequence in its own right. We then compare each part against a very large DNA database to identify
whether this pattern behavior aligns with any species. There are many pattern-matching algorithms for matching
patterns in data, but in bioinformatics we have the very user-friendly tool BLAST, which has the potential to
find different patterns.
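Partitioning the nucleotide sequence and preparing it for BLAST submission can be sketched as below. This is a sketch under the assumption of a roughly equal six-way split and standard FASTA formatting; the helper names and record label are hypothetical.

```python
def split_into_parts(dna, n_parts=6):
    """Split the full nucleotide sequence into n roughly equal parts,
    one per five-year period in the paper's setup; the last part
    absorbs any remainder."""
    size = len(dna) // n_parts
    parts = [dna[i * size:(i + 1) * size] for i in range(n_parts - 1)]
    parts.append(dna[(n_parts - 1) * size:])
    return parts

def to_fasta(label, sequence, width=70):
    """Format one part as a FASTA record, the query format BLAST expects."""
    lines = [f">{label}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)

parts = split_into_parts("ATCG" * 9, n_parts=6)  # toy 36-base sequence
print(to_fasta("sp500_1980_1985", parts[0]))
```

Each FASTA record can then be pasted into the NCBI BLAST web interface (or passed to a local BLAST installation) for alignment against the nucleotide database.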
2.4. Multiple Sequence Alignment using BLAST
BLAST (Basic Local Alignment Search Tool) is a family of sequence alignment algorithms developed by
Altschul et al. (1997). These programs are used for sequence similarity identification [7]. They identify regions
of local alignment to assist in detecting relationships among sequences, allowing the user to identify similarities
between a query nucleotide or protein sequence and sequences in public databases. BLAST identifies clusters of
nearby or locally dense "similar" k-tuples (strings of letters), and is used to determine whether a given sequence
is novel, homologous to a known sequence, or contains motifs which may provide clues to the possible roles of
the sequence being queried [7].
The preferred query sequence format for the BLAST program is the FASTA format, which takes DNA
sequences as input. Advanced BLAST tolerates both spaces and numbers and is case insensitive. After some time
it produces output containing a Request ID number, the E-value, the max score, the percentage identity, and the
species name. Each part of the pattern is then aligned against the large DNA database behind BLAST. It is found
that the thirty-year stock pattern behavior matches regularly with the human species in each individual pattern
matching: sub-sequences of the stock market data match 100% with Homo sapiens chromosome 19.
3. EXPERIMENTAL RESULTS
The entire data set contains the closing prices of the S&P 500 over thirty years, covering the period from the
first trading day of January 1980 to the last trading day of December 2010. The data set is divided into six
periods of five years each. The first period covers January 1980 to January 1985, the second January 1985 to
January 1990, the third January 1990 to January 1995, the fourth January 1995 to January 2000, the fifth
January 2000 to January 2005, and the sixth January 2005 to December 2010.
Using BLAST we found that each part of the DNA matches 100% with the DNA of Homo sapiens
chromosome 19, with zero percent gaps. The Homo sapiens chromosome 19 genomic sequence contains
15894584 nucleotides, and the six parts above match uniformly at 100% against these nucleotides. If we instead
divide the entire data set into three parts, the first from January 1980 to January 1990, the second from
January 1990 to January 2000, and the third from January 2000 to December 2010, the same technique again
yields highly similar identification with the same species. From the experimental results, Table 1 reports the
comparison for each five-year period from 1980 to 2010, and Table 2 the comparison of the DNA patterns in
each decade from 1980 to 2010.
Table 1: Maximum identification during each five-year period

  Year duration   DNA length   Max. Score   Max. Identified
  1980-1985       632          39.2         100%
  1985-1990       631          39.2         100%
  1990-1995       632          39.2         100%
  1995-2000       627          39.2         100%
  2000-2005       639          39.2         100%
  2005-2010       640          39.2         100%
Table 2: Maximum identification during each ten-year period

  Year duration   DNA length   Max. Score   Max. Identified
  1980-1990       1263         39.2         100%
  1990-2000       1259         39.2         100%
  2000-2010       1279         39.2         100%
4. CONCLUSION & FUTURE WORK
This study proposes using the bioinformatics tool BLAST to establish a relationship between stock market
behavior and DNA sequences. In terms of the empirical results, we find that in each five-year period and each
decade the stock market tendency behaves like a human DNA sequence: the alignment score shows that the
maximum identification between them is 100% in each part. For prediction purposes, however, this is only
possible when continuous matching occurs; from the experimental work we found that the maximum
identification is not exactly equal to 100% over each 15-year period.
As found in the above experiments, each five years of stock data pattern-matches with the human genome.
Although the stock market is fully random, prediction of stock prices may thus be possible for the next five
years. Predicting the behavior of the stock market for coming trading days is left as future work.
5. REFERENCES
[1] Hasan, A.; Saleem, H. M.; Abdullah, S.; "Long-Run Relationships between an Emerging Equity Market and Equity Markets of the
Developed World: An Empirical Analysis of Karachi Stock Exchange," International Research Journal of Finance and Economics,
2008, 16, pp. 52-62.
[2] Roberts, H. V.; "Stock Market 'Patterns' and Financial Analysis," Journal of Finance, 14(1), 1959.
[3] Kamijo, K.; Tanigawa, T.; Stock Price Pattern Recognition: A Recurrent Neural Network Approach, In Proceedings of the
International Joint Conference on Neural Networks, San Diego, CA (1990) 215-221.
[4] Tsaih, R.; Hsu, Y.; Lai, C.C.; Forecasting S&P 500 Index Futures with a Hybrid AI system. Decision Support Systems 23 (1998)
161-174.
[5] Hassan, M.R.; Nath, B.; Kirley. M.; A fusion model of HMM, ANN and GA for stock market forecasting, Expert Systems with
Applications 33 (2007) 171–180.
[6] Li, F.; Stormo, G. D.; "Selection of Optimal DNA Oligos for Gene Expression Arrays," Bioinformatics, 17:1067-1076,
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/17/11/1067.
[7] NCBI BLAST, http://www.ncbi.nlm.nih.gov/BLAST.
[8] Yu, L.; Wang, S.; Lai, K. K.; "Mining Stock Market Tendency Using GA-Based Support Vector Machines," WINE 2005, LNCS 3828,
pp. 336-345, 2005.
[9] Karp, R.; Rabin, M.; "Efficient Randomized Pattern-Matching Algorithms," IBM Journal of Research and Development,
31(2):249-260, 1987.
[10] Rajarajeswari, P.; Apparao, A.; Kiran, K. R.; "Huffbit Compress - Algorithm to Compress DNA Sequences Using Extended Binary
Trees," Journal of Theoretical and Applied Information Technology, 2005.