Predicting a company's total turnover under highly unstable stock market and trade conditions has always proved costly, causing the rise and fall of many trades. Data mining is a well-known branch of Computer Science that aims to extract meaningful information from large databases. However, despite the existence of many algorithms for predicting future trends, their efficiency is questionable because their predictions suffer from high error rates. The objective of this paper is to investigate and rate the performance of classifiers based on the features selected by Hybrid Feature Selection. The authorized dataset for predicting turnover was taken from www.bsc.com and includes the stock market values of various companies over the past 10 years. The algorithms were investigated using the Weka tool. The Hybrid Feature Selection (HFS) algorithm was run on this dataset to extract the important and influential features for classification. With these extracted features, the total turnover of the company was predicted using algorithms such as Random Forest, Decision Tree, SVM and Multinomial Regression. This prediction mechanism was implemented to predict a company's turnover on a daily basis and hence could help navigate dubious stock market trades. The prediction process achieved a notable accuracy rate. Moreover, the importance of the stock market attributes was established through Incremental Feature Selection (IFS).
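As a rough illustration of the filter stage in a pipeline like the one described, the sketch below ranks features of a synthetic dataset by correlation with a target and keeps the strongest ones. The dataset, threshold, and scoring rule are invented stand-ins, not the paper's actual HFS procedure.

```python
import random
from statistics import mean, pstdev

random.seed(0)

def pearson(xs, ys):
    # Population Pearson correlation between two equal-length sequences.
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

# Synthetic stand-in for the stock dataset: 3 informative + 2 noise features.
n = 400
rows = []
for _ in range(n):
    f1, f2, f3 = random.random(), random.random(), random.random()
    noise1, noise2 = random.random(), random.random()
    label = 1.0 if f1 + f2 + f3 > 1.5 else 0.0
    rows.append(([f1, f2, f3, noise1, noise2], label))

features = list(zip(*[r[0] for r in rows]))
labels = [r[1] for r in rows]

# The "hybrid" step sketched as a simple filter: keep features whose
# absolute correlation with the target exceeds a fixed threshold.
scores = [abs(pearson(col, labels)) for col in features]
selected = [i for i, s in enumerate(scores) if s > 0.25]
print("selected feature indices:", selected)
```

On this synthetic data the three informative features are kept and the two noise features are discarded; a real HFS stage would combine several such criteria rather than a single correlation filter.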
A model for profit pattern mining based on genetic algorithm (eSAT Journals)
Abstract
Mining profit-oriented patterns is a novel technique of association rule mining in data mining that focuses on important business issues. Every business aims to generate profit and to find ways to improve it. In its early days, association rule mining was used for market basket analysis and targeted only some business and commercial aspects. Later, researchers turned to the most prominent element of any business, profit, and devised innovative ways to generate association rules based on it. The profit-oriented pattern mining approach combines statistics-based pattern mining with value-based decision making to generate the patterns with the maximum profit, along with ways to build recommenders for future strategy. Traditional association rule mining alone is not effective for this goal, so the strength of the genetic algorithm is combined with association rule mining to enhance its capability. The study shows that the genetic algorithm improves the effectiveness and efficiency of the association rule mining outcome, since genetic algorithms are competent at handling problems involving uncertainty, multi-dimensionality, non-differentiability, non-continuity, non-parametric and non-linear constraints, and multi-objective optimization. In this paper the concept of profit pattern mining is applied with a genetic algorithm to generate profit-oriented patterns that help in future business expansion and fulfil the business objective.
Keywords: Data Mining, Association Rule Mining, Profit Pattern Mining, Genetic Algorithm
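A minimal sketch of the idea in Python: a tiny evolutionary loop selects a subset of candidate rules so as to maximize expected profit. The rule names, support values and profit figures are hypothetical, and the paper's actual chromosome encoding and fitness function are certainly richer than this.

```python
import random

random.seed(1)

# Hypothetical candidate rules as (name, support, profit-per-transaction).
rules = [("bread->butter", 0.30, 2.0), ("milk->cereal", 0.25, 1.5),
         ("tv->hdmi", 0.05, 9.0), ("pen->pencil", 0.40, 0.2),
         ("gum->mint", 0.35, 0.1)]

def fitness(bits):
    # Expected profit of the selected rules, minus a small penalty per rule
    # so the search prefers compact rule sets.
    gain = sum(s * p for b, (_, s, p) in zip(bits, rules) if b)
    return gain - 0.05 * sum(bits)

def mutate(bits):
    # Flip one random bit of the selection mask.
    i = random.randrange(len(bits))
    child = bits[:]
    child[i] ^= 1
    return child

# Minimal (1+1)-style evolutionary loop over the selection mask.
best = [random.randint(0, 1) for _ in rules]
for _ in range(300):
    cand = mutate(best)
    if fitness(cand) >= fitness(best):
        best = cand

chosen = [name for b, (name, _, _) in zip(best, rules) if b]
print("profit-oriented rules:", chosen)
```

With these made-up numbers the search keeps the four rules whose expected profit outweighs the penalty and drops the low-profit "gum->mint" rule.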
The Optimization of choosing Investment in the capital markets using artifici... (inventionjournals)
Optimization is one of the crucial topics in the behavioural sciences, and these days the use of meta-heuristics has grown considerably in all fields. In this study we look for an optimal selection within a portfolio of investment opportunities, seeking a selection logic using a meta-heuristic algorithm, namely artificial neural networks. The results showed that using the artificial neural network algorithm improved decision-making and the selection of investment opportunities. Considering its purpose, the research is applied, and it aims to develop knowledge in a particular field.
A Threshold fuzzy entropy based feature selection method applied in various b... (IJMER)
Large amounts of data have been stored and manipulated using various database
technologies, and processing all the attributes for a particular purpose is a difficult task. To avoid such
difficulties, a feature selection process is applied. In this paper we collect eight benchmark
datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based
relevance measure algorithm following three selection strategies: the mean selection strategy, the half
selection strategy, and a neural network for the threshold selection strategy. After the features are selected,
they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner
classification methodologies. The test results show that the neural network threshold selection
strategy works well in selecting features and that the Ant-miner methodology works best in producing better
accuracy with the selected features than with the original dataset. The obtained results show
clearly that Ant-miner is superior to the other classifiers. Thus, the proposed Ant-miner
algorithm could be a suitable method for producing good results with fewer features than
the original datasets.
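The mean selection strategy from the abstract can be sketched very compactly: compute a fuzzy entropy per feature and keep the features whose entropy falls below the mean. The toy data, the membership model, and the "low entropy = relevant" reading are illustrative assumptions, not the paper's exact relevance measure.

```python
import math

# Toy dataset: each row = (feature values in [0,1], class label). A feature
# that separates the classes crisply should get low fuzzy entropy.
data = [([0.1, 0.9, 0.5], 0), ([0.2, 0.8, 0.4], 0),
        ([0.9, 0.1, 0.6], 1), ([0.8, 0.2, 0.5], 1)]

def fuzzy_entropy(mu):
    # Standard fuzzy entropy of a single membership value mu in [0,1].
    if mu in (0.0, 1.0):
        return 0.0
    return -mu * math.log(mu) - (1 - mu) * math.log(1 - mu)

def feature_entropy(idx):
    # Membership modeled directly by the (already 0-1 scaled) feature value;
    # average the entropy over all samples.
    return sum(fuzzy_entropy(row[0][idx]) for row in data) / len(data)

ents = [feature_entropy(i) for i in range(3)]

# Mean selection strategy: keep features whose entropy is below the mean
# (low entropy = less ambiguity about membership).
threshold = sum(ents) / len(ents)
selected = [i for i, e in enumerate(ents) if e < threshold]
print("entropies:", [round(e, 3) for e in ents], "selected:", selected)
```

Here the third feature hovers around 0.5 for both classes, gets the highest entropy, and is dropped; the half selection and neural-network threshold strategies would differ only in how the cut-off is chosen.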
This is a presentation on stock market analysis using a genetic algorithm with neural networks, based on a scientific paper, made at Cairo University under the supervision of Prof. Dr. Magda.
International Journal of Engineering Research and Development (IJERD), IJERD Editor
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING... (IAEME Publication)
In this paper, an MDL-based reduction of frequent patterns is presented. The ideal outcome of any pattern mining process is to explore the data for new insights while eliminating the non-interesting patterns that describe noise. The major problem in frequent pattern mining is identifying the interesting patterns. Instead of performing association rule mining on all the frequent itemsets, it is feasible to select a subset of frequent itemsets and perform the mining task on it. Selecting a small set of frequent itemsets from a large number of interesting ones is a difficult task. In our approach, an MDL-based algorithm is used to reduce the number of frequent itemsets used for association rule mining.
One of the most important problems in modern finance is finding efficient ways to summarize and visualize
stock market data so as to give individuals or institutions useful information about market behavior for
investment decisions. Investment can be considered one of the fundamental pillars of a national
economy, so at present many investors look for criteria to compare stocks and select the best,
and choose strategies that maximize the earning value of the investment process.
The enormous amount of valuable data generated by the stock market has therefore attracted
researchers to explore this problem domain using different methodologies, and research in data
mining has gained high attention due to the importance of its applications and the increasing generation
of information. Data mining tools such as association rules, rule induction methods and the Apriori algorithm
are used to find associations between different scripts of the stock market, and much
research and development has addressed the reasons for fluctuations of the Indian stock exchange.
Nowadays two important factors, gold prices and US dollar prices, strongly influence the
Indian stock market; statistical correlation is used to find the correlation between gold prices, dollar prices and
the BSE index, which helps the activities of stock operators, brokers, investors
and jobbers. These are based on forecasting the fluctuation of index share prices, gold prices, dollar
prices and customer transactions. Hence the researcher has considered these problems as a topic for
research.
Fuzzy Analytic Hierarchy Based DBMS Selection In Turkish National Identity Ca... (Ferhat Ozgur Catak)
Database Management Systems (DBMS) play an important role in supporting
enterprise application development. Selecting the right DBMS is a crucial decision in the
software engineering process, and it requires optimizing a number of criteria.
Evaluating and selecting a DBMS among several candidates tends to be very complex,
involving both quantitative and qualitative issues. A wrong DBMS selection has a
negative effect on the development of enterprise applications; it can turn out to be costly and adversely affect business processes. The following study focuses on the evaluation of a multi-criteria
decision problem using fuzzy logic. We demonstrate the methodological considerations
regarding group decisions and fuzziness based on the DBMS selection problem. We developed a new
Fuzzy AHP based decision model, formulated and proposed to select a DBMS easily. In this
decision model, first the main criteria and their sub-criteria are determined for the evaluation; these
criteria are then weighted by pair-wise comparison, and the DBMS alternatives are evaluated by assigning a
rating scale.
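The pair-wise weighting step can be sketched with a crisp AHP calculation; a full fuzzy AHP, as in the study, would replace each matrix entry with a triangular fuzzy number and defuzzify at the end. The three criteria and the comparison values below are hypothetical.

```python
import math

# Crisp pairwise-comparison matrix over three hypothetical DBMS criteria
# on Saaty's 1-9 scale: A[i][j] = how much criterion i dominates criterion j.
criteria = ["performance", "cost", "reliability"]
A = [[1.0,   3.0, 0.5],
     [1 / 3, 1.0, 0.25],
     [2.0,   4.0, 1.0]]

# Geometric-mean method: weight_i is proportional to the geometric mean
# of row i, normalized so the weights sum to 1.
gm = [math.prod(row) ** (1 / len(row)) for row in A]
total = sum(gm)
weights = [g / total for g in gm]

for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
```

With these made-up judgments, reliability dominates, performance comes second and cost last; the decision model then scores each DBMS alternative against the weighted criteria.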
STOCK MARKET PREDICTION USING MACHINE LEARNING METHODS (IAEME Publication)
Stock price forecasting is a popular and important topic in financial and academic
studies. The share market is a volatile place for prediction, since there are no firm
rules for estimating or predicting the price of a share. Many methods
such as technical analysis, fundamental analysis, time series analysis and statistical
analysis are used to predict prices in the share market, but none of these
methods has proved to be a consistently acceptable prediction tool. In this paper, we
implemented a Random Forest approach to predict stock market prices. Random
Forests are implemented very effectively in forecasting stock prices, returns, and stock
modeling. We outline the design of the Random Forest with its salient features and
customizable parameters, and focus on a group of parameters with a relatively
significant impact on the share price of a company. With the help of sentiment
analysis, we found the polarity score of each news article, which helped in forecasting
accurate results. Although the share market can never be predicted with one hundred percent
accuracy due to its vague domain, this paper aims at proving the efficiency of Random
Forest for forecasting stock prices.
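To make the sentiment-plus-ensemble idea concrete, here is a deliberately tiny sketch: a crude lexicon-based polarity score feeds, together with a price feature, into a majority vote of randomized threshold "stumps" standing in for a real Random Forest. The lexicons, training points and thresholds are all invented, and the paper's actual model is far more capable.

```python
import random

random.seed(7)

POSITIVE = {"gain", "growth", "record", "beat", "surge"}
NEGATIVE = {"loss", "drop", "lawsuit", "miss", "decline"}

def polarity(headline):
    # Crude lexicon-based polarity in [-1, 1], a stand-in for the paper's
    # sentiment-analysis step.
    words = headline.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def make_stump():
    # Each "tree" votes up whenever the polarity feature clears a random
    # threshold; a real forest would split on many features and depths.
    thr = random.uniform(-0.4, 0.4)
    return lambda feats: 1 if feats[1] > thr else 0

forest = [make_stump() for _ in range(25)]

def predict(feats):
    votes = sum(tree(feats) for tree in forest)
    return 1 if votes * 2 > len(forest) else 0

headline = "Company posts record growth and strong gain"
feats = (0.005, polarity(headline))  # (daily return, polarity score)
print("predicted direction:", "up" if predict(feats) else "down")
```

A strongly positive headline pushes every stump above its threshold, so the ensemble votes "up"; the point is only the plumbing of combining a text-derived feature with market features.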
Artificial Intelligence and Stock Marketing (ijsrd.com)
Business intelligence is becoming a significant trend in the financial world. One such area is stock market intelligence, which makes use of data mining techniques such as association, clustering, artificial neural networks, decision trees, genetic algorithms, expert systems and fuzzy logic. These techniques can be used to predict stock prices or trading signals automatically with acceptable accuracy. Although a lot of research has been done in this area, many issues have not yet been investigated, and it is not clear to new researchers where and how to begin. Data mining can be applied to past and present financial data to generate patterns and decision-making systems. This paper gives a concise review of several attempts made by researchers at stock prediction, focusing on stock market analysis, and also defines a new research area for understanding stock market intelligence: developing data mining techniques to support all aspects of algorithmic trading, and suggesting various research issues in stock intelligence related to forecasting and its accuracy.
TOURISM DEMAND FORECASTING MODEL USING NEURAL NETWORK (ijcsit)
Travel agencies should be able to judge the market demand for tourism in order to develop sales plans accordingly. However, many travel agencies lack this ability and thus make risky business decisions. Based on the above, this study applied an Artificial Neural Network combined with a Genetic Algorithm (GA) to establish a prediction model of air ticket sales revenue. The GA was used to determine the optimal number of input and hidden nodes of a feedforward neural network. The empirical results suggested that the mean absolute relative error (MARE) between the proposed hybrid model's predicted air ticket sales revenue and the actual value was 10.51%, and the correlation coefficient was 0.913. The proposed model had good predictive capability and could provide travel agency operators with reliable and highly efficient analysis data.
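The GA's role here, searching over the number of input and hidden nodes, can be sketched without any neural network at all by treating "train the net and measure validation error" as a black-box function. The synthetic error surface below (optimum at 6 inputs, 10 hidden nodes) is an assumption purely for illustration.

```python
import random

random.seed(3)

def validation_error(n_input, n_hidden):
    # Stand-in for training a feedforward net and measuring MARE on a
    # validation set; here a synthetic bowl with its optimum at (6, 10).
    return (n_input - 6) ** 2 + 0.5 * (n_hidden - 10) ** 2

def mutate(ind):
    # Nudge one of the two architecture parameters by +/- 1.
    i, h = ind
    if random.random() < 0.5:
        i = max(1, i + random.choice((-1, 1)))
    else:
        h = max(1, h + random.choice((-1, 1)))
    return (i, h)

# Minimal evolutionary search: keep the 5 best architectures, refill the
# population with mutants of the survivors each generation.
pop = [(random.randint(1, 12), random.randint(1, 20)) for _ in range(10)]
for _ in range(60):
    pop.sort(key=lambda ind: validation_error(*ind))
    pop = pop[:5] + [mutate(random.choice(pop[:5])) for _ in range(5)]

best = min(pop, key=lambda ind: validation_error(*ind))
print("best (inputs, hidden):", best)
```

In the real study each fitness evaluation is an expensive network training run, which is exactly why a GA (rather than exhaustive search) is attractive for choosing the node counts.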
A Survey on the Clustering Algorithms in Sales Data Mining (Editor IJCATR)
This paper discusses different clustering techniques that can be used on sales databases. The advancement of digital data
collection and the build-up of data in data banks, resulting from modernization in the sales discipline, has created great
data-processing challenges in extracting better and more meaningful results from mass data deposits. Clustering techniques are therefore necessary so that
senior management in the sales department can access processed data while engaged in decision-making processes.
In this paper, I focus on retail sales data mining, classification and clustering techniques, analyzing the attributes for
the prediction of buyer behavior and purchase performance using classification methods such as decision trees, the C4.5
algorithm and the ID3 algorithm.
Introduction to feature subset selection method (IJSRD)
Data mining is a computational process to discover patterns in large data sets. It has various important techniques, one of which is classification, which has recently been receiving great attention in the database community. Classification can solve problems in different fields such as medicine, industry, business, and science. PSO is an optimization method based on social behaviour. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool that deals with the uncertainty and vagueness of decision systems.
Re-Mining Association Mining Results Through Visualization, Data Envelopment ... (ertekg)
Download link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-association-mining-results-through-visualization-data-envelopment-analysis-and-decision-trees/
Re-mining is a general framework that suggests executing additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology on real-world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses regarding how re-mining can be carried out on association mining results are answered in the case study through empirical analysis.
The stock exchange is an important apparatus for economic growth, as it offers investors an opportunity to acquire equity while providing resources for organizations' expansion. On the other hand, a major concern about entering this market is the dynamic in which deals are made, since share pricing happens in a smart and oscillatory way. In this context, several researchers are studying techniques to predict the stock exchange, maximize profits and reduce risks. This study therefore proposes a linear regression model for stock exchange prediction which, combined with financial indicators, supports investors' decision-making.
Augmentation of Customer’s Profile Dataset Using Genetic Algorithm (RSIS International)
Data is the lifeblood of every type of business. Clean,
accurate and complete data is a prerequisite for decision-making
in business processes, and data is one of the most valuable
assets for any organization. It is immensely important that
businesses focus on the quality of their data, as it can help
increase business performance by improving efficiency,
streamlining operations and consolidating data sources. Good
quality data helps to improve and simplify processes, eliminate
time-consuming rework and, externally, enhance the user's
experience, translating into significant financial and
operational benefits [1] [2]. All organizations and businesses strive to
retain their existing customers and gain new ones, and accurate data
enables a business to improve the customer experience. Data
augmentation adds value to base data by enhancing the information
derived from the existing source. It can help
reduce the manual intervention required to develop meaningful
information and insight from business data, as well as significantly
enhance data quality. The business can thereby provide a unique
customer experience and deliver above and beyond
expectations. Data augmentation is immensely important, as
it helps improve the overall productivity of the business and
makes the most accurate and relevant
information quickly available for decision-making.
This work focuses on augmentation of a customer
dataset using a Genetic Algorithm (GA). The augmented data are
used for customer behavioral analysis. The dataset
consists of the different factors inherent in each customer situation,
used to understand market strategy. This behavioral
data was used in earlier work analyzing the data [13], where it was
found that collecting a very large amount of such data manually
is a very cumbersome process. It is inferred from the earlier
work [13] that more data may give more accurate
results; hence it was decided to enrich the dataset using a Genetic
Algorithm.
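The augmentation step can be sketched as GA-style recombination of existing customer records: crossover mixes fields from two parents, mutation perturbs a field, and only genuinely new records are kept. The record schema and field values below are hypothetical, not the paper's dataset.

```python
import random

random.seed(5)

# Seed customer records: (age_band, income_band, visits_per_month).
base = [(2, 3, 4), (1, 2, 8), (3, 4, 2), (2, 1, 6)]

def crossover(a, b):
    # Uniform crossover: each field is taken from either parent.
    return tuple(random.choice(pair) for pair in zip(a, b))

def mutate(rec, rate=0.2):
    # Occasionally perturb one field, kept within plausible bounds (>= 1).
    rec = list(rec)
    if random.random() < rate:
        i = random.randrange(len(rec))
        rec[i] = max(1, rec[i] + random.choice((-1, 1)))
    return tuple(rec)

augmented = []
while len(augmented) < 10:
    child = mutate(crossover(*random.sample(base, 2)))
    if child not in base:            # keep only genuinely new records
        augmented.append(child)

print(f"{len(augmented)} synthetic records, e.g. {augmented[:3]}")
```

A real pipeline would also add a fitness test (e.g. plausibility against observed field distributions) so the GA breeds records that resemble, but do not duplicate, real customers.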
With the development of databases, the volume of data stored in them increases rapidly, and
much important information is hidden in these large amounts of data. If that information can be extracted from the
database, it will create a lot of profit for the organization. The question organizations are asking is how to extract
this value; the answer is data mining. Many technologies are available to data mining practitioners,
including artificial neural networks, genetic algorithms, fuzzy logic and decision trees. Many practitioners are
wary of neural networks due to their black-box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool among data mining practitioners.
Horizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Feature selection is one of the most fundamental steps in machine learning and is closely related to
dimensionality reduction. A commonly used approach in feature selection is to rank the individual
features according to some criterion and then search for an optimal feature subset based on an evaluation
criterion that tests optimality. The objective of this work is to predict more accurately the presence of
Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular rough set based feature
ranking process with a modified backward feature elimination algorithm. The feature ranking
follows a method of calculating the significance or priority of each symptom of LD according to its
contribution in representing the knowledge contained in the dataset. Each symptom's significance or
priority value reflects its relative importance for predicting LD across the various cases. Then, by eliminating
the least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis, and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is also combined with two state-of-the-art
filter based feature ranking techniques, namely information gain and gain ratio. The experimental results
show that the proposed feature selection approach outperforms the other two in terms of data reduction.
The proposed method also efficiently eliminates all redundant attributes from the LD dataset without
sacrificing classification performance.
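The rank-then-eliminate loop can be sketched compactly: given per-feature significance scores (standing in for the rough set ranking) and a subset evaluator (standing in for the classifier), drop the least significant feature while the evaluation does not degrade. The symptom names, scores and evaluator are illustrative assumptions.

```python
# Toy significance scores (e.g. rough-set priority of each hypothetical
# LD symptom) and a mock evaluation of a feature subset.
scores = {"reading": 0.9, "writing": 0.8, "memory": 0.6,
          "attention": 0.3, "posture": 0.05}

def subset_quality(features):
    # Pretend classifier accuracy: sum of kept significances, capped at 1.
    return min(1.0, sum(scores[f] for f in features))

# Backward elimination: repeatedly drop the current least significant
# feature while the evaluation does not get worse.
kept = sorted(scores, key=scores.get, reverse=True)
while len(kept) > 1:
    candidate = kept[:-1]            # drop current least significant
    if subset_quality(candidate) >= subset_quality(kept):
        kept = candidate
    else:
        break
print("optimal subset:", kept)
```

Here the mock evaluator stays at its ceiling until only the two strongest symptoms remain, so the loop stops there; with a real classifier, each candidate subset would be evaluated by cross-validated accuracy instead.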
Data leakage means sending confidential data to an unauthorized person. Nowadays, identifying confidential data is a big challenge for organizations. We developed a system using data mining techniques that identifies an organization's confidential data. First, we create clusters for the training data set; next, we identify confidential terms and context terms for each cluster. Finally, based on the
confidential terms and context terms, the confidentiality level of a detected document is calculated as a score. If the score of the detected document is beyond a predefined threshold, the document is blocked and marked as confidential.
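The scoring-and-threshold step can be sketched as follows; the term lists, weights and threshold are hypothetical stand-ins for the cluster-derived vocabularies the system would actually learn.

```python
# Illustrative cluster vocabulary: confidential terms weigh more than
# context terms; all names and weights here are hypothetical.
CONFIDENTIAL = {"salary": 3.0, "acquisition": 3.0, "layoff": 3.0}
CONTEXT = {"meeting": 1.0, "quarterly": 1.0, "report": 1.0}
THRESHOLD = 4.0

def confidentiality_score(text):
    # Sum the weights of every known term appearing in the document.
    words = text.lower().split()
    return sum(CONFIDENTIAL.get(w, 0.0) + CONTEXT.get(w, 0.0) for w in words)

def should_block(text):
    # Block the document when its score exceeds the predefined threshold.
    return confidentiality_score(text) > THRESHOLD

doc = "quarterly report on acquisition and layoff plans"
print("score:", confidentiality_score(doc), "blocked:", should_block(doc))
```

Two confidential terms plus two context terms push this example past the threshold, so it would be blocked; the real system derives the term lists per cluster and calibrates the threshold on training data.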
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...IAEME Publication
In this paper, MDL based reduction in frequent pattern is presented. The ideal outcome of any pattern mining process is to explore the data in new insights. And also, we need to eliminate the non-interesting patterns that describe noise. The major problem in frequent pattern mining is to identify the interesting patterns. Instead of performing association rule mining on all the frequent item sets, it is feasible to select a sub set of frequent item sets and perform the mining task. Selecting a small set of frequent item sets from large amount of interesting ones is a difficult task. In our approach, MDL based algorithm is used for reducing the number of frequent item sets to be used for association rule mining is presented.
One of the most important problems in modern finance is finding efficient ways to summarize and visualize
the stock market data to give individuals or institutions useful information about the market behavior for
investment decisions Therefore, Investment can be considered as one of the fundamental pillars of national
economy. So, at the present time many investors look to find criterion to compare stocks together and
selecting the best and also investors choose strategies that maximize the earning value of the investment
process. Therefore the enormous amount of valuable data generated by the stock market has attracted
researchers to explore this problem domain using different methodologies. Therefore research in data
mining has gained a high attraction due to the importance of its applications and the increasing generation
information. So, Data mining tools such as association rule, rule induction method and Apriori algorithm
techniques are used to find association between different scripts of stock market, and also much of the
research and development has taken place regarding the reasons for fluctuating Indian stock exchange.
But, now days there are two important factors such as gold prices and US Dollar Prices are more
dominating on Indian Stock Market and to find out the correlation between gold prices, dollar prices and
BSE index statistical correlation is used and this helps the activities of stock operators, brokers, investors
and jobbers. They are based on the forecasting the fluctuation of index share prices, gold prices, dollar
prices and transactions of customers. Hence researcher has considered these problems as a topic for
research.
Fuzzy Analytic Hierarchy Based DBMS Selection In Turkish National Identity Ca...Ferhat Ozgur Catak
Database Management Systems (DBMS) play an important role to support
enterprise application developments. Selection of the right DBMS is a crucial decision for
software engineering process. This selection requires optimizing a number of criteria.
Evaluation and selection of DBMS among several candidates tend to be very complex. It
requires both quantitative and qualitative issues. Wrong selection of DBMS will have a
negative effect on the development of enterprise application. It can turn out to be costly and adversely affect business process. The following study focuses on the evaluation of a multi criteria
decision problem by the usage of fuzzy logic. We will demonstrate the methodological considerations
regarding to group decision and fuzziness based on the DBMS selection problem. We developed a new
Fuzzy AHP based decision model which is formulated and proposed to select a DBMS easily. In this
decision model, first, main criteria and their sub criteria are determined for the evaluation. Then these
criteria are weighted by pair-wise comparison, and then DBMS alternatives are evaluated by assigning a
rating scale.
STOCK MARKET PREDICTION USING MACHINE LEARNING METHODSIAEME Publication
Stock price forecasting is a popular and important topic in financial and academic
studies. The share market is a volatile place for prediction, since there are no firm
rules for estimating the price of a share in the share market. Many methods,
such as technical analysis, fundamental analysis, time series analysis and statistical
analysis, are used to predict prices in the share market, but none of these
methods has proved to be a consistently acceptable prediction tool. In this paper, we
implemented a Random Forest approach to predict stock market prices. Random
Forests have been applied effectively to forecasting stock prices and returns and to stock
modeling. We outline the design of the Random Forest with its salient features and
customizable parameters. We focus on a group of parameters with a relatively
significant impact on the share price of a company. With the help of sentiment
analysis, we computed the polarity score of news articles, which helped in forecasting
more accurate results. Although the share market can never be predicted with one hundred percent
accuracy due to its vague domain, this paper aims at demonstrating the efficiency of Random
Forests for forecasting stock prices.
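A minimal sketch of the Random Forest forecasting idea, using scikit-learn on entirely synthetic data; the features, their weights, and the train/test split are invented for illustration, and the paper's actual features and polarity scores are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical daily features (e.g. open, high, low, volume) and a
# synthetic next-day price that is a noisy linear mix of them.
X = rng.normal(size=(300, 4))
y = X @ np.array([0.5, 1.2, -0.7, 0.3]) + rng.normal(scale=0.1, size=300)

# Two of the Random Forest's salient, customizable parameters:
# the number of trees and the maximum tree depth.
model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0)
model.fit(X[:250], y[:250])       # train on the first 250 days
preds = model.predict(X[250:])    # forecast the remaining 50 days
```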
Artificial Intelligence and Stock Marketingijsrd.com
Business intelligence is becoming an important trend in the financial world. One such area is stock market intelligence, which makes use of data mining techniques such as association, clustering, artificial neural networks, decision trees, genetic algorithms, expert systems and fuzzy logic. These techniques can be used to predict stock prices or trading signals automatically with acceptable accuracy. Although a lot of research has been done in this area, many issues have not yet been explored, and it is not clear to new researchers where and how to start. Data mining can be applied to past and present financial data to generate patterns and decision-making systems. This paper gives a brief overview of several attempts made by researchers at stock prediction, focusing on stock market analysis, and defines a new research area for understanding the intelligence of stock trading. This area, referred to as stock market intelligence, aims to develop data mining techniques to support all aspects of algorithmic trading, and the paper also suggests a number of research issues in stock intelligence related to forecasting and its accuracy.
TOURISM DEMAND FORECASTING MODEL USING NEURAL NETWORKijcsit
Travel agencies should be able to judge the market demand for tourism and develop sales plans accordingly. However, many travel agencies lack the ability to judge the market demand for tourism, and thus make risky business decisions. Based on the above, this study applied an Artificial Neural Network combined with a Genetic Algorithm (GA) to establish a prediction model of air ticket sales revenue. The GA was used to
determine the optimum number of input and hidden nodes of a feedforward neural network. The empirical results suggested that the mean absolute relative error (MARE) between the proposed hybrid model's predicted value of air ticket sales revenue and the actual value was 10.51%, and the correlation coefficient was 0.913. The proposed model had good predictive capability and could provide travel agency operators with reliable and highly efficient analysis data.
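The two evaluation metrics quoted above, mean absolute relative error and the correlation coefficient, can be computed as follows; the revenue figures are made up for illustration:

```python
import numpy as np

# Hypothetical actual vs. predicted air-ticket sales revenue.
actual = np.array([120.0, 135.0, 150.0, 160.0, 142.0])
predicted = np.array([110.0, 140.0, 145.0, 170.0, 150.0])

# Mean absolute relative error (MARE), reported as a percentage.
mare = np.mean(np.abs(actual - predicted) / actual) * 100
# Pearson correlation coefficient between actual and predicted values.
r = np.corrcoef(actual, predicted)[0, 1]
```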
A Survey on the Clustering Algorithms in Sales Data MiningEditor IJCATR
This paper discusses different clustering techniques that can be used on sales databases. The advancement of digital data
collection and the build-up of data in data banks, a result of modernization in the sales disciplines, have created great challenges in
processing mass data deposits into better and more meaningful results. Clustering techniques are therefore quite necessary so that
senior management in a sales department can have access to processed data as they engage in decision-making processes.
In this paper, I focus on retail sales data mining and on classification and clustering techniques. In this study I analyze the attributes for
predicting buyer behavior and purchase performance using various classification methods such as decision trees, the C4.5
algorithm and the ID3 algorithm.
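A toy illustration of the kind of customer segmentation this survey discusses, using k-means from scikit-learn on invented sales features; the two segments, their centers, and the feature choice are all hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical per-customer sales features: [annual spend, visits/month],
# drawn from two deliberately separated buyer segments.
low = rng.normal([200.0, 2.0], [20.0, 0.5], size=(50, 2))
high = rng.normal([900.0, 8.0], [50.0, 1.0], size=(50, 2))
X = np.vstack([low, high])

# Cluster the buyers into two segments that a sales team could act on.
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
labels = km.labels_
```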
Introduction to feature subset selection methodIJSRD
Data Mining is a computational process for discovering patterns in large data sets. It has various important techniques, and one of them is Classification, which has recently been receiving great attention in the database community. Classification techniques can solve several problems in different fields such as medicine, industry, business and science. PSO is an optimization method based on social behaviour. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of decision systems.
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...ertekg
Download link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-association-mining-results-through-visualization-data-envelopment-analysis-and-decision-trees/
Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding how re-mining can be carried out on association mining results, are answered in the case study through empirical analysis.
The stock exchange is an important apparatus for economic growth, as it gives investors an opportunity to acquire equity and, at the same time, provides resources for organizations' expansion. On the other hand, a major concern about entering this market is the dynamic in which deals are made, since the pricing of shares happens in a smart and oscillatory way. In this context, several researchers are studying techniques to predict the stock exchange, maximize profits and reduce risks. Thus, this study proposes a linear regression model for stock exchange prediction which, combined with financial indicators, provides decision-making support for investors.
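The proposed model itself is not reproduced here, but the underlying technique, ordinary least squares regression of price on financial indicators, can be sketched with synthetic data; the indicator count, coefficients, and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical financial indicators (e.g. a moving average and a volume
# ratio) and closing prices for 200 trading days.
X = rng.normal(size=(200, 2))
price = 50 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Fit price ~ b0 + b1*x1 + b2*x2 by ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
predicted = A @ coef
```

With this much data and little noise, the fitted coefficients land close to the true values used to generate the prices.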
Augmentation of Customer’s Profile Dataset Using Genetic AlgorithmRSIS International
Data is the lifeblood of all type of business. Clean,
accurate and complete data is a prerequisite for decision-making
in business processes. Data is one of the most valuable
assets for any organization. It is immensely important that the
business focus on the quality of their data as it can help in
increasing the business performance by improving efficiencies,
streamlining operations and consolidating data sources. Good
quality data helps to improve and simplify processes, eliminate
time-consuming rework and externally to enhance a user’s
experience, further translating it to significant financial and
operational benefits [1] [2]. All organizations/ businesses strive to
retain their existing customers and gain new ones. Accurate data
enables the business to improve the customer experience. Data
augmentation adds value to base data by enhancing information
derived from the existing source. Data augmentation can help
reduce the manual intervention required to develop meaningful
information and insight of business data, as well as significantly
enhance data quality. Hence the business can provide unique
customer experience and deliver above and beyond their
expectations. The Data Augmentation is immensely important as
it helps in improving the overall productivity of the business. It
is also important in making the most accurate and relevant
information available quickly for decision making.
This work focuses on augmentation of the customer
dataset using Genetic Algorithm(GA). These augmented data are
used for the purpose of customer behavioral analysis. The data
set consists of the different factors inherent in each situation of
the customer to understand the market strategy. This behavioral
data was used in the earlier work on analyzing the data [13]. It was
found that collecting a very large amount of such data manually
is a very cumbersome process. It was inferred from the earlier
work [13] that a larger amount of data may give more accurate
results. Hence it was decided to enrich the dataset using a Genetic
Algorithm.
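A minimal sketch of GA-style data augmentation as described above, using the classic crossover and mutation operators; the customer record fields, the mutation rate, and the number of offspring are invented, and the paper's actual encoding is not specified here:

```python
import random

random.seed(42)

# Hypothetical base customer records: (age, monthly spend, visits).
base = [(25, 300, 4), (40, 520, 6), (33, 410, 5), (58, 700, 8)]

def crossover(a, b):
    """Single-point crossover: mix fields of two parent records."""
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(rec, rate=0.3):
    """Randomly perturb each field by up to +/-10%."""
    return tuple(
        round(v * random.uniform(0.9, 1.1)) if random.random() < rate else v
        for v in rec
    )

# Augment the dataset with GA-style offspring of the base records.
augmented = list(base)
for _ in range(20):
    a, b = random.sample(base, 2)
    augmented.append(mutate(crossover(a, b)))
```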
With the development of databases, the volume of data stored in them increases rapidly, and much
important information is hidden in these large amounts of data. If that information can be extracted from the
database, it will create a lot of profit for the organization. The question organizations are asking is how to extract
this value, and the answer is data mining. Many technologies are available to data mining practitioners,
including Artificial Neural Networks, Genetic Algorithms, Fuzzy Logic and Decision Trees. Many practitioners are
wary of Neural Networks due to their black-box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool for data mining practitioners.
Horizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Feature selection is one of the most fundamental steps in machine learning. It is closely related to
dimensionality reduction. A commonly used approach in feature selection is ranking the individual
features according to some criteria and then search for an optimal feature subset based on an evaluation
criterion to test the optimality. The objective of this work is to predict more accurately the presence of
Learning Disability (LD) in school-aged children with reduced number of symptoms. For this purpose, a
novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The feature ranking process
calculates the significance, or priority, of each symptom of LD according to its contribution in
representing the knowledge contained in the dataset. Each symptom's significance or
priority value reflects its relative importance in predicting LD among the various cases. Then, by eliminating the
least significant features one by one and evaluating the feature subset at each stage of the process, an
optimal feature subset is generated. For comparative analysis, and to establish the importance of rough set
theory in feature selection, the backward feature elimination algorithm is also combined with two state-of-the-art
filter-based feature ranking techniques, viz. information gain and gain ratio. The experimental results
show that the proposed feature selection approach outperforms the other two in terms of data reduction.
Also, the proposed method eliminates all the redundant attributes efficiently from the LD dataset without
sacrificing the classification performance.
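The comparison baseline described above, backward feature elimination driven by an information-gain-style ranking, can be roughly sketched with scikit-learn's mutual information on synthetic data; the dataset, tolerance, and classifier are stand-ins, not the LD data or the paper's rough-set method:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset: 8 features, only a few of them informative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)

# Rank features by mutual information with the class (an information-gain
# style criterion), from least to most significant.
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))

# Backward elimination: drop the least significant feature as long as the
# cross-validated accuracy stays within a small tolerance of the baseline.
kept = list(range(X.shape[1]))
clf = DecisionTreeClassifier(random_state=0)
baseline = cross_val_score(clf, X, y, cv=5).mean()
for f in ranking:
    trial = [i for i in kept if i != f]
    if not trial:
        break
    score = cross_val_score(clf, X[:, trial], y, cv=5).mean()
    if score >= baseline - 0.02:
        kept = trial
```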
Data leakage means sending confidential data to an unauthorized person. Nowadays, identifying confidential data is a big challenge for organizations. We developed a system, using data mining techniques, that identifies the confidential data of an organization. First, we create clusters for the training data set. Next, we identify confidential terms and context terms for each cluster. Finally, based on the
confidential terms and context terms, the confidentiality level of a detected document is calculated as a score. If the score of the detected document is beyond a predefined threshold, the document is blocked and marked as confidential.
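The scoring-and-blocking step can be illustrated with a toy example; the term lists, weights, and threshold below are entirely hypothetical:

```python
# Hypothetical confidential and context term lists for one cluster;
# the score is a weighted count of term occurrences in the document.
confidential_terms = {"salary": 3.0, "password": 5.0, "merger": 4.0}
context_terms = {"quarterly": 1.0, "internal": 1.0}

THRESHOLD = 5.0  # predefined blocking threshold

def confidentiality_score(document: str) -> float:
    words = document.lower().split()
    score = sum(confidential_terms.get(w, 0.0) for w in words)
    score += sum(context_terms.get(w, 0.0) for w in words)
    return score

def is_blocked(document: str) -> bool:
    """Block the document if its score exceeds the threshold."""
    return confidentiality_score(document) > THRESHOLD

blocked = is_blocked("internal memo on merger and salary bands")
```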
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...IJDKP
This paper aims to explain web-enabled tools for educational data mining. The proposed web-based
tools, developed using the ASP.NET framework and PHP, can be helpful for universities or institutions that offer
students elective courses, as well as for improving academic activities based on feedback collected from
students. In the ASP.NET tool, association rule mining using the Apriori algorithm is applied, whereas in the PHP-based
Feedback Analytical Tool, feedback related to faculty and institutional infrastructure is collected from
students, and based on that feedback the tool shows the performance of the faculty and the institution. Using that data, it
helps management improve in-house training skills and gain knowledge about the educational trends
to be followed by faculty to improve the effectiveness of courses and teaching skills.
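The Apriori step mentioned above can be sketched in miniature; the feedback "transactions", course names, and support threshold below are invented for illustration:

```python
from itertools import combinations

# Hypothetical elective-course feedback transactions.
transactions = [
    {"ml", "dbms", "networks"},
    {"ml", "dbms"},
    {"ml", "networks"},
    {"dbms", "networks"},
    {"ml", "dbms", "networks"},
]
MIN_SUPPORT = 0.6  # itemset must appear in 60% of transactions

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Apriori level-wise search: singletons first, then extend only the
# frequent ones to candidate pairs.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= MIN_SUPPORT]
pairs = [frozenset(c) for c in combinations(items, 2)
         if support(set(c)) >= MIN_SUPPORT]
```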
Rs. 400 crore is the amount PepsiCo paid to the IPL, yet Coca-Cola still managed to market itself during the IPL with just a few crore rupees. How did Coca-Cola pull off this ambush marketing?
Comparison of Feature selection methods for diagnosis of cervical cancer usin...IJERA Editor
Even though great attention has been given to cervical cancer diagnosis, it is a tough task to observe a
Pap smear slide through a microscope. Image processing and machine learning techniques help the pathologist
take a proper decision. In this paper, we present a diagnosis method using cervical cell images
obtained by the Pap smear test. Image segmentation is performed by a multi-thresholding method, and texture and shape
features related to cervical cancer are extracted. Feature selection is achieved using the Mutual Information (MI),
Sequential Forward Search (SFS), Sequential Floating Forward Search (SFFS) and Random Subset Feature
Selection (RSFS) methods.
Global and china automotive seating industry report, 2014 2015ResearchInChina
The seemingly simple automotive seating actually reflects a country's machining ability. In the world, at most 20 companies are capable of producing automotive seating whose frames are made of precision metals via stamping, while nearly a thousand companies have the capability to produce automotive engines.
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study csandit
Predicting the Total turnover of a company in the ever fluctuating Stock market has always proved to be a precarious situation and most certainly a difficult task at hand. Data mining is a
well-known sphere of Computer Science that aims at extracting meaningful information from large databases. However, despite the existence of many algorithms for the purpose of
predicting future trends, their efficiency is questionable as their predictions suffer from a high
error rate. The objective of this paper is to investigate various existing classification algorithms
to predict the turnover of different companies based on the Stock price. The authorized dataset for predicting the turnover was taken from www.bsc.com and included the stock market values of various companies over the past 10 years. The algorithms were investigated using the ‘R’
tool. The feature selection algorithm, Boruta, was run on this dataset to extract the important
and influential features for classification. With these extracted features, the Total Turnover of
the company was predicted using various algorithms like Random Forest, Decision Tree, SVM and Multinomial Regression. This prediction mechanism was implemented to predict the turnover of a company on an everyday basis and hence could help navigate through dubious
stock market trades. An accuracy rate of 95% was achieved by the above prediction process.
Moreover, the importance of the stock market attributes was established as well.
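The Boruta algorithm named in this abstract is, at its core, a comparison of real feature importances against randomly permuted "shadow" copies. A minimal sketch of that idea with scikit-learn and synthetic data follows; the feature count, target, and single-pass decision rule are simplifications, since real Boruta iterates with statistical tests:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the stock dataset: only the first two of five
# features actually drive the target.
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=400)

# Boruta's core trick: append "shadow" features (column-wise permutations
# of the real ones) and keep only real features whose importance beats
# the best shadow importance.
shadows = rng.permuted(X, axis=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(np.hstack([X, shadows]), y)
imp = rf.feature_importances_
best_shadow = imp[5:].max()
selected = [i for i in range(5) if imp[i] > best_shadow]
```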
Artificial Intelligence Based Stock Market Prediction Model using Technical I...ijtsrd
The stock market is highly volatile and complex in nature. Although the notion of stock price predictability is debated, many researchers suggest that Buy and Sell prices are predictable and that an investor can make above-average profits using efficient Technical Analysis (TA). Most earlier prediction models predict individual stocks, and their results are strongly influenced by a company's reputation, news, sentiment and other fundamental issues, while stock indices are less affected by these issues. In this work, an effort is made to predict the Buy and Sell decisions and trends of stocks by utilizing Stock Technical Indicators (STIs). As part of the prediction model, the Long Short-Term Memory (LSTM) and Support Vector Machine (SVM) artificial intelligence algorithms will be used together with STIs. The project will be carried out on National Stock Exchange (NSE) stocks of India. Mr. Ketan Ashok Bagade | Yogini Bagade, "Artificial Intelligence Based Stock Market Prediction Model using Technical Indicators", published in the International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7, Issue-2, April 2023. URL: https://www.ijtsrd.com/papers/ijtsrd53854.pdf Paper URL: https://www.ijtsrd.com/management/other/53854/artificial-intelligence-based-stock-market-prediction-model-using-technical-indicators/mr-ketan-ashok-bagade
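Two of the simplest stock technical indicators of the kind referred to above, a simple moving average and the RSI, can be computed as follows; the price series, window lengths, and the toy buy rule are hypothetical:

```python
import numpy as np

# Hypothetical daily closing prices.
close = np.array([100, 102, 101, 105, 107, 106, 108, 110, 109, 112], float)

def sma(prices, n):
    """Simple moving average over an n-day window."""
    return np.convolve(prices, np.ones(n) / n, mode="valid")

def rsi(prices, n=5):
    """Relative Strength Index: average gain vs. average loss over n days."""
    diff = np.diff(prices)
    gains = np.clip(diff, 0, None)[-n:]
    losses = np.clip(-diff, 0, None)[-n:]
    if losses.mean() == 0:
        return 100.0
    return 100 - 100 / (1 + gains.mean() / losses.mean())

# A toy decision rule: buy when price is above its 5-day moving average.
signal = "buy" if close[-1] > sma(close, 5)[-1] else "sell"
```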
A Comparison of Stock Trend Prediction Using Accuracy Driven Neural Network V...idescitation
In the recent scenario, needless to say, modern
finance is facing many hurdles in finding effective ways to gather
information about stock market data in one shot. At the same
time, it is inevitable for both individuals and institutions to
visualize, summarize and enhance their knowledge about
market behavior in order to make wise decisions. This paper surveys
recent literature on neural network variants for
forecasting stock market trends. Classification is made in
terms of dependent variables, data preprocessing techniques
used, network structure, performance analysis and other
useful modeling information. The surveyed papers show
that neural network variants are widely accepted
for studying and evaluating stock market behavior compared to
standalone neural networks.
Project report on Share Market applicationKRISHNA PANDEY
This is the proposal document for AVS Group of Technology service offering in the website design and development and custom web application development space. The document details our understanding of the brief, the objectives of the services suite, the methodology, and deliverable and commercials.
Enhanced Decision Support System for Portfolio Management Using Financial Ind...ijbiss
In many cases, financial indicators are used for market analysis and to forecast the future of stock prices. Due to the high complexity of the stock market, determining which indicators should be used and the reliability of their outcomes have always been a challenge. In this article, a hybrid approach in the form of a decision support system is introduced that offers the best suggestions for buying and selling stocks. This system will help an investor identify the best portfolio of stocks using a series of financial indicators. These indices act as a model that forecasts the future price of a stock by examining its activities and status in the past. Therefore, using a combination of the indices enables us to make decisions with more certainty. The proficiency of this system has been evaluated using data collected from the stock market in Iran from 2001 through 2011. The results show that the use of the indices and their combination has led the decision support system to produce suggestions with very high precision.
OPENING RANGE BREAKOUT STOCK TRADING ALGORITHMIC MODELIJCI JOURNAL
A stock trading algorithmic model is an important research problem that draws on knowledge of
fundamental and technical analysis, combined with expertise in programming and computer
science. There have been numerous attempts at predicting stock trends; we aim to predict them with the least
amount of computation and a reduced space complexity. The goal of this paper is to create a hybrid
recommendation system that informs the trader about the future of a stock trend in order to improve the
profitability of a short-term investment. We make use of technical analysis tools to incorporate this
recommendation into our system. In order to validate the results, we implemented a prototype in the R
programming language.
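A bare-bones sketch of the opening range breakout rule itself; the minute bars, the three-bar opening range, and the long/short rule are hypothetical simplifications of a real trading model:

```python
# Hypothetical minute bars as (high, low, close) tuples.
bars = [
    (101.0, 100.0, 100.5),  # opening-range bars (first 3 minutes)
    (101.5, 100.2, 101.0),
    (101.2, 100.5, 101.1),
    (101.4, 100.9, 101.3),  # rest of the session
    (102.3, 101.2, 102.1),  # closes above the opening-range high
    (102.8, 101.9, 102.5),
]
OPENING_RANGE = 3  # number of bars that define the opening range

range_high = max(h for h, _, _ in bars[:OPENING_RANGE])
range_low = min(lo for _, lo, _ in bars[:OPENING_RANGE])

# Scan the rest of the session for the first breakout or breakdown.
signal = "hold"
for high, low, close in bars[OPENING_RANGE:]:
    if close > range_high:   # breakout above the range: go long
        signal = "long"
        break
    if close < range_low:    # breakdown below the range: go short
        signal = "short"
        break
```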
Impact and Implications of Operations Research in Stock Marketinventionjournals
The motivation of this article is to advocate the managerial practice of making decisions based not only on intuition, but on intuition combined with quantitative analysis. Operations Research (OR) is one of the main managerial decision-science tools used by for-profit and non-profit organizations, including the stock market. Forecasting stock returns is an important financial subject that has attracted researchers' attention for a long time. It involves the assumption that fundamental information publicly available in the past has some predictive relationship to future stock returns. This study tries to help investors in the stock market choose better timing for buying or selling stocks, based on information extracted from the historical prices of those stocks. The decision taken will be based on a decision tree classifier, which is one of the Operations Research techniques.
Similar to IMPROVED TURNOVER PREDICTION OF SHARES USING HYBRID FEATURE SELECTION (20)
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.6, November 2015
DOI: 10.5121/ijdkp.2015.5604
IMPROVED TURNOVER PREDICTION OF
SHARES USING HYBRID FEATURE SELECTION
Shashaank D.S., Sruthi V., Vijayalakshimi M.L.S. and Shomona Gracia Jacob
Department of Computer Science and Engineering,
SSNCE, Chennai, India
ABSTRACT
Predicting the total turnover of a company in the most unstable stock market and trade conditions has
always proved to be a costly affair causing rise and fall of several trades. Data mining is a well-known
sphere of Computer Science that aims at extracting meaningful information from large databases. However,
despite the existence of many algorithms for the purpose of predicting future trends, their efficiency is
questionable as their predictions suffer from a high error rate. The objective of this paper is to investigate
and rate the performance of classifiers based on the features selected by Hybrid Feature Selection. The
authorized dataset for predicting the turnover was taken from www.bsc.com and included the stock market
values of various companies over the past 10 years. The algorithms were investigated using the Weka tool.
The Hybrid Feature Selection (HFS) algorithm was run on this dataset to extract the important and
influential features for classification. With these extracted features, the Total Turnover of the company was
predicted using various algorithms like Random Forest, Decision Tree, SVM and Multinomial Regression.
This prediction mechanism was implemented to predict the turnover of a company on an everyday basis
and hence could help navigate through dubious stock market trades. An accuracy rate of 97.2% was achieved by
the above prediction process. Moreover, the importance of the stock market attributes was established
through Incremental Feature Selection (IFS) as well.
KEYWORDS
Data mining, Hybrid Feature selection, classification algorithms, Turnover prediction
1. INTRODUCTION
Prediction of stock market prices and their rise and fall in value has consistently proved to be a perilous
task, mainly due to the volatile nature of the market [1-3]. However, data mining techniques and
other computational intelligence techniques have been applied to achieve this over the years.
Some of the approaches undertaken included the use of decision tree algorithms, concepts of
neural networks and Midas [4-6]. In this paper, a comparative study was
conducted to estimate and predict the turnover of companies that include Infosys, Sintex, HDFC
and Apollo Hospitals using the features selected by the Hybrid Feature Selection algorithm
(Ramani et al., 2013), utilizing classifiers such as Naive Bayes, Bayesian Networks, Random
Forest, Nearest Neighbour and J48. In order to estimate the performance of the aforementioned
machine learning algorithms in predicting the turnover, a confusion matrix was also constructed
with respect to the dataset. Based on the predictions made by each of the algorithms with respect
to the total turnover for a company (on an everyday basis), an accuracy rate was estimated for
each of them from the number of true positives/negatives and false positives/negatives. A brief
review of the state-of-the-art in predicting stock market share data is given below.
2. RELATED WORK
The objective of any nation at large is to enhance the lifestyle of common man and that is the
driving force to undertake research to predict the market trends [7-9]. In the recent decade, much
research has been done on neural networks to predict the stock market changes [10].
Matsui and Sato [11] proposed a new evaluation method to resolve the overfitting problem in
Genetic Algorithm (GA) training. On comparing the conventional and the neighbourhood
evaluation, they found the new evaluation method to be better than the conventional one in terms
of performance. Gupta and Dhingra [12] proposed a stock market prediction technique
based on Hidden Markov Models. In that approach, the authors considered the fractional change
in stock value and the intra-day high and low values of the stock to train a continuous Hidden
Markov Model (HMM). This HMM is then used to make a Maximum a Posteriori decision over
all the possible stock values for the next day. The authors applied this approach on several stocks
and compared the performance to existing methods. Lin, Guo, and Hu [13] proposed an SVM-based
stock market prediction system. This system selected a good feature subset, evaluated stock
indicators and controlled overfitting in stock market tendency prediction. The authors tested this
approach on Taiwan stock market datasets and found that the proposed system surpassed the
conventional stock market prediction system in terms of performance. Ramani et al. (2013)
proposed the Hybrid Feature Selection methodology on clinical lung cancer datasets, wherein
the authors concluded that combining the gain ratio with the correlation of features to each other
and to the target class resulted in an optimal feature subset. This was followed by Incremental
Feature Selection, as discussed in the ensuing section.
3. PROPOSED HFS BASED STOCK TURNOVER PREDICTION
FRAMEWORK
The stock turnover prediction framework proposed in this paper is portrayed in Figure 1. The
basic methodology involved Data Collection, Pre-processing, Hybrid Feature Selection and
Classification, each of which is explained below.
The dataset utilised for predicting the turnover was taken from www.bsc.com which included the
stock market values of companies including Infosys, HDFC, Apollo Hospitals and Sintex, over
the past 10 years.
Figure 1: Stock Turnover Prediction framework through HFS
3.1 Data Processing
Initially, all the records with missing values were removed from the dataset in order to improve
the accuracy of the prediction. Then the data was partitioned into two parts:
Training data (d.t): the data with which the machine is trained. The various classification
algorithms are trained on this data. 60% of the data is taken as training data.
Validation data (d.v): the data used for the purpose of cross-validation. It is used
to find the accuracy rate of each algorithm. The remaining 40% of the data is taken as validation
data.
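The 60/40 percentage split described above can be sketched as follows (a minimal stdlib sketch; the record list, fraction and seed are placeholders, not the paper's actual setup):

```python
import random

def percentage_split(records, train_fraction=0.6, seed=7):
    """Shuffle the records and split them into training (d.t) and validation (d.v) sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)     # deterministic shuffle for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))            # stand-in for 100 pre-processed share records
d_t, d_v = percentage_split(rows)
print(len(d_t), len(d_v))          # 60 40
```

Shuffling before the cut avoids bias when the records are sorted (as they are here, by turnover).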
In order to apply the classification algorithms, the data was first sorted according to the turnover.
Then the total turnover was discretised into:
A – 58,320 to 18,291,986
B – 18,296,597 to 37,731,606
C – 37,749,751 to 121,233,543
D – 121,245,870 to 300,360,881
E – 300,465,316 to 19,085,311,470
Also, the company features were converted into dummy variables (0s/1s) to make the prediction
process easier.
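The discretisation bands and the 0/1 dummy encoding can be sketched as follows (band boundaries copied from the list above; the company list and function names are illustrative, not taken from the paper):

```python
def discretise_turnover(turnover):
    """Map a raw total-turnover figure to one of the five class labels A-E."""
    bands = [
        ("A", 58_320,       18_291_986),
        ("B", 18_296_597,   37_731_606),
        ("C", 37_749_751,   121_233_543),
        ("D", 121_245_870,  300_360_881),
        ("E", 300_465_316,  19_085_311_470),
    ]
    for label, lo, hi in bands:
        if lo <= turnover <= hi:
            return label
    return None  # falls in a gap between the observed bands

def company_dummies(company, companies=("Infosys", "HDFC", "Apollo", "Sintex")):
    """One-hot encode the company attribute as 0/1 indicator variables."""
    return {c: int(c == company) for c in companies}

print(discretise_turnover(50_000_000))   # C
print(company_dummies("HDFC"))           # {'Infosys': 0, 'HDFC': 1, 'Apollo': 0, 'Sintex': 0}
```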
The stock market data was characterized by attributes described in Table 1. The stock market
starts at 9:15 in the morning and ends at 3:30 in the afternoon. The attributes described in Table 1
are recorded within this time frame.
Table 1. Stock Market Share Data – Attribute Description

1. Open price – The first price traded during the day (in the morning).
2. High price – The highest price traded during the day.
3. Low price – The lowest price traded during the day.
4. Close price – The last price traded during the day.
5. WAP – The weighted average price during the day.
6. No. of shares – The total number of shares traded during the day.
7. No. of trades – The total number of transactions during the day.
8. Deliverable quantity – The quantity that can be delivered at the end of the day.
9. Spread high-low – The range between the high and low prices.
10. Spread close-open – The range between the close and open prices.
11. Company – The name of the company that handles the shares.
12. Total turnover – The total number of shares traded multiplied by the price of each share sold.
13. Date – The date for which the above attributes are recorded.
Once the data was pre-processed, the important features to make an accurate prediction were
identified by the process of feature selection.
3.2 Feature Selection
Feature ranking presented significant features in the order of their contribution to categorizing the
samples under the different target classes. Since most feature selection algorithms focus on
ranking the attributes according to their significance value, the responsibility of choosing the
cut-off rests with the user. Hence, in order to automate the process of finding a minimal yet
optimal set of features, the ranking feature selection algorithms were followed by Correlation
Subset Evaluators, which retained features highly correlated to the class and least correlated to each
other. Since both the ranking and subset evaluators were utilized to obtain the optimal feature set,
this was termed the Hybrid Feature Selection strategy. The description of the methods used in
this research is detailed below.
The gain ratio criterion [19] revealed the association between an attribute and the class value, being
primarily computed from the Information Gain using the Information Entropy (InfoE) values.
The CFS hypothesis suggested that the most predictive features needed to be highly correlated to
the target class and least correlated to the other predictor attributes. The predictor attributes generated
by the Gain Ratio and CFS Subset Attribute Evaluator (Hybrid Feature Selection) methods were
later utilized for Incremental Feature Selection (IFS) to determine the minimal and optimal set of
features. On adding each feature, a new feature set was obtained.
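The gain-ratio computation that underlies the ranking step can be sketched as follows (a generic C4.5-style formulation, not Weka's exact implementation):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of the feature over the class, normalised by the
    feature's own entropy (split information)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info else 0.0

# a feature that perfectly separates the two classes has gain ratio 1.0
print(gain_ratio(["hi", "hi", "lo", "lo"], ["A", "A", "B", "B"]))  # 1.0
```

The normalisation by split information is what distinguishes gain ratio from plain information gain: it penalises attributes with many distinct values.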
3.3 Classification
Classification [16-19] is the process of finding a set of models that describe and distinguish data
classes. This is done to achieve the goal of being able to use the model to predict the class whose
label is unknown. The classification phase involved the execution of the classification algorithms
to identify the best performing algorithm. The classification accuracy obtained by the percentage
split discussed in the data pre-processing phase was calculated and a comparison was drawn
among the classifiers. The algorithm that yielded the highest accuracy is described below.
Random Forest
In a random forest [18-19], a randomly selected set of attributes is used to split each node. Every
node is split using the best split among a subset of predictors chosen at random at that
node. This is in contrast to the methodology followed in standard trees, in which
each node is split using the best split among all attributes available in the dataset. New
values are then predicted by aggregating the predictions of the various decision trees
constructed.
Random forest represents an ensemble model / algorithm as it derives its final prediction from
multiple individual models. These individual models could be of similar or different type.
However, in the case of Random Forest, the individual models are of the same type – decision
trees.
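The majority-vote aggregation described above can be sketched as follows (the decision stumps and the record are toy placeholders, not trees actually learned from the BSE data):

```python
from collections import Counter

def forest_predict(trees, record):
    """Aggregate the individual trees' predictions by majority vote."""
    votes = [tree(record) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three toy decision "stumps", each splitting on a different attribute,
# standing in for fully grown trees over random attribute subsets.
trees = [
    lambda r: "E" if r["no_of_shares"] > 1_000_000 else "A",
    lambda r: "E" if r["no_of_trades"] > 10_000 else "B",
    lambda r: "E" if r["close_price"] > 500 else "A",
]
record = {"no_of_shares": 2_000_000, "no_of_trades": 50_000, "close_price": 300}
print(forest_predict(trees, record))   # 'E' wins two votes to one
```

Because each tree sees a different random subset of attributes, the trees disagree on individual records, and the vote smooths out their individual errors.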
The Random Forest algorithm yielded 97.2% accuracy with the three features: No. of shares,
No. of trades and Close price.
The performances of the feature selection and classification algorithms are discussed below.
4. RESULT ANALYSIS
The results analysis is discussed in two sections. The former section elaborates on the feature
selection process while the latter section makes a detailed analysis on the performance of the
classification algorithms.
4.1 Performance Analysis of Feature Selection
The Hybrid Feature Selection algorithm resulted in an optimal feature subset that contained only
3 features viz, No. of shares, No. of trades and closing price. The Gain Ratio of these features is
depicted in Table 2.
Table 2. Attribute Importance in Turnover Prediction – Gain Ratio Ranking

1. No. of shares – 0.252
2. No. of trades – 0.2349
3. Closing price – 0.2212
Once the important features were identified, the next phase involved predicting the turnover from
the features in order to estimate the probable combination of attributes that yield a high turnover.
4.2 Performance Analysis of Classification Algorithms
Each of the classification algorithms was first trained using the training data, which contained
60% of the dataset. The 10-fold cross-validation method was employed to evaluate the
performance of the classification algorithms, and the obtained accuracy is depicted in Table 3.
Accuracy rate = (No. of correctly classified observations / Total no. of observations) × 100
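The accuracy-rate formula translates directly to code (the confusion-matrix counts below are hypothetical, chosen only to reproduce a 97.2% figure like the one reported):

```python
def accuracy_rate(tp, tn, fp, fn):
    """Accuracy = correctly classified observations / total observations, as a percentage."""
    return 100 * (tp + tn) / (tp + tn + fp + fn)

# Hypothetical true/false positive and negative counts over 1,000 records.
print(accuracy_rate(tp=480, tn=492, fp=14, fn=14))   # 97.2
```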
Table 3. Comparative Performance of Classification Algorithms

1. Random Forest – 97.2%
2. J48 – 95.7%
3. Nearest-Neighbour – 81.8%
4. Bayesian Network – 73.7%
5. Naive Bayes – 57.5%
The graphical representation of the total turnover prediction of the companies is given in Figure
2.
Figure 2 – Comparative Performance of Classification Algorithms
It is evident from the result analysis that a small subset of features is sufficient to accurately
predict the total turnover of a company. Moreover, the Random Forest algorithm has proved to
accurately predict the turnover for the real-time share data of different companies, which motivates
investigating other boosting and ensemble techniques to enhance the prediction
accuracy.
5. CONCLUSION
Application of data mining techniques to predict turnover based on stock market share data is an
emerging area of research and is expected to be instrumental in moulding the country’s economy
by predicting possible investment trends to increase turnover. In view of this, an efficient way of
implementing the Random Forest algorithm with Hybrid Feature Selection is proposed in order to
mitigate the risks involved in predicting the turnover of a company. The optimal feature subset
was also identified to predict the turnover with maximum accuracy. An accuracy rate of 97% was
achieved in the prediction process. This accuracy rate was much higher than those obtained
before. Hence we believe that further research using computational methodologies to predict
turnover on a daily basis based on share market data will reveal better and more interesting
patterns for investments.
REFERENCES
[1] Abhishek Gupta, Samidha D. Sharma, "Clustering-Classification Based Prediction of Stock
Market Future Prediction", International Journal of Computer Science and Information
Technologies (IJCSIT), Vol. 5 (3), 2014, pp. 2806-2809.
[2] Dharamveer , Beerendra , Jitendra Kumar –“ Efficient Prediction of Close Value using Genetic
algorithm based horizontal partition decision tree in Stock Market “ Volume 2, Issue 1, January 2014
International Journal of Advance Research in Computer Science and Management Studies Research
Paper Available online at: www.ijarcsms.com.
[3] Kannan, K. Senthamarai, et al. "Financial stock market forecast using data mining techniques."
Proceedings of the International Multiconference of Engineers and computer scientists. Vol. 1. 2010.
[4] Md. Al Mehedi Hasan, Mohammed Nasser, Biprodip Pal, Shamim Ahmad – “Support Vector
Machine and Random Forest Modeling for Intrusion Detection System (IDS)”- Journal of Intelligent
Learning Systems and Applications, Vol.6 No.1(2014), Article ID:42869,8 pages
[4] Shen, S., Jiang, H., & Zhang, T. (2012). Stock market forecasting using machine learning algorithms.
[5] Bayaga, Anass. "Multinomial logistic regression: usage and application in risk analysis." Journal of
applied quantitative methods 5.2 (2010): 288-297.
[6] Shah, Vatsal H. "Machine learning techniques for stock prediction." Foundations of Machine
Learning| Spring (2007).
[7] Al-Radaideh, Qasem A., Aa Assaf, And Eman Alnagi. "Predicting Stock Prices Using Data Mining
Techniques." The International Arab Conference on Information Technology (ACIT’2013). 2013.
[8] Wang, Jar-Long, and Shu-Hui Chan. "Stock market trading rule discovery using two-layer bias
decision tree." Expert Systems with Applications 30.4, 605-611, 2006.
[9] Wu, Muh-Cherng, Sheng-Yu Lin, and Chia-Hsin Lin. "An effective application of decision tree to
stock trading." Expert Systems with Applications 31.2: 270-274, 2006.
[10] Mahdi Pakdaman Naeini, Hamidreza Taremian, Homa Baradaran Hashemi “Stock Market Value
Prediction Using Neural Networks”, International Conference on Computer Information Systems and
Industrial Management Applications (CISIM), pp. 132-136, 2010.
[11] Matsui, Kazuhiro, and Haruo Sato. "Neighbourhood evaluation in acquiring stock trading strategy
using genetic algorithms." Soft Computing and Pattern Recognition (SoCPaR), 2010 International
Conference of. IEEE, 2010.
[12] Gupta, Aditya, and Bhuwan Dhingra. "Stock market prediction using hidden markov models."
Engineering and Systems (SCES) Students Conference on, IEEE, 2012.
[13] Lin, Yuling, Haixiang Guo, and Jinglu Hu. "An SVM-based approach for stock market trend
prediction." Neural Networks (IJCNN), the 2013 International Joint Conference on. IEEE, 2013.
[14] Sanjana Sahayaraj, Shomona Gracia Jacob, Data Mining to Help Aphasic Quadriplegic and Coma
Patients, International Journal of Science and Research (IJSR), Vol. 3(9), pp.121-125, 2014.
[15] Jacob SG, Ramani RG, "Prediction of Rescue Mutants to predict Functional Activity of Tumor
Protein TP53 through Data Mining Techniques", Journal of Scientific and Industrial Research,
Vol.74, pp.135-140, 2015.
[16] Zhao, Yanchang. R and data mining: Examples and case studies. Academic Press, 2012.
[17] Ramani, R. Geetha, and Shomona Gracia Jacob. "Improved classification of lung cancer tumors
Based on structural and physicochemical properties of proteins using data mining models." (2013):
e58772.
[18] Jacob, S. G., and R. G. Ramani. "Prediction of Rescue Mutants to Restore Functional Activity of
Tumor Protein TP53 through Data Mining Techniques." Journal of Scientific & Industrial Research
74.3 (2015): 135-140.
AUTHORS
Shashaank D.S. is currently pursuing B.E. Computer Science and Engineering at SSN
College of Engineering, Chennai, India. He is doing research in the field of machine
learning and is interested in speech processing.
Sruthi V. is currently pursuing B.E. Computer Science and Engineering at SSN College of
Engineering, Chennai, India. She is doing research in the field of machine learning.
Vijayalakshimi M.L.S. is currently pursuing B.E. Computer Science and Engineering at
SSN College of Engineering, Chennai, India. She is doing research in the field of
machine learning.
Dr. Shomona Gracia Jacob is an Associate Professor in the Department of CSE, SSN College
of Engineering, Chennai, India. She completed her Ph.D. at Anna University in the area of
Biological and Clinical Data Mining. She has more than 30 publications in
international conferences and journals to her credit. Her areas of interest include
Data Mining, Bioinformatics, Machine Learning, and Artificial Intelligence. She has
reviewed many research articles on invitation from highly reputed refereed journals.
She is currently guiding under-graduate and post-graduate projects in the field of data
mining and intelligent systems.