
International Research Journal of Finance and Economics
ISSN 1450-2887 Issue 30 (2009)
© EuroJournals Publishing, Inc. 2009
http://www.eurojournals.com/finance.htm

A Predictive Model Construction Applying Rough Set Methodology for Malaysian Stock Market Returns

Saiful Hafizah Jaaman
Centre for Modelling and Data Analysis, School of Mathematical Sciences
Faculty of Science & Technology, Universiti Kebangsaan Malaysia
43600 UKM Bangi, Selangor, Malaysia
E-mail: shj@ukm.my
Tel: 603-89213422; Fax: 603-89254519

Siti Mariyam Shamsuddin
Faculty of Computer Science & Information System
Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
E-mail: mariyam@utm.my

Bariah Yusob
Faculty of Computer System & Software Engineering
Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia

Munira Ismail
Centre for Modelling and Data Analysis, School of Mathematical Sciences
Faculty of Science & Technology, Universiti Kebangsaan Malaysia
43600 UKM Bangi, Selangor, Malaysia
E-mail: munira@ukm.my
Tel: 603-89215723; Fax: 603-89254519

Abstract
This paper describes a stock market prediction system intended for investors. More specifically, stock market movements are analyzed and predicted in order to retrieve knowledge that could guide investors on when to buy and sell. A case study on trading the Kuala Lumpur Composite Index and individual firms listed in Bursa Malaysia shows that rough sets are an applicable and effective tool for stock market analysis. The rough set approach can discover dependencies in data while eliminating superfluous factors in noisy stock market data, which makes it very useful for extracting trading rules. This is crucial for detecting market timing, because market timing is detected by capturing the major turning points in the data.
Nevertheless, one failure of the predictive system developed in this research is its inability to detect the numerous minor trends displayed by volatile individual firms. As a result, it fails to produce trading signals that generate profits above the naive strategy for these firms.
Keywords: Rough Set Theory, Market Movement, Stock Returns, Technical Analysis
1. Introduction
In major financial markets around the world, trading in the stock market has become extraordinarily popular as a way to reap large profits. However, predicting stock price movements is a challenge for academics and practitioners because stock market data are complex. Although modeling stock price behavior is difficult, many successful investors know that robust predictive modeling can help them identify and segment high-performance securities, which can lead to superior investment decisions. Predictive modeling can also help investors allocate their funds more effectively. Instead of basing investment decisions entirely on "gut feelings", they can use factual data to make better investment judgments.
Predictive modeling is a form of data mining. Data mining is a computational intelligence discipline that contributes tools for data analysis, discovery of new knowledge, and autonomous decision making. The need to process large volumes of data has accelerated interest in this field. As Mosley (2005) notes, data mining is the analysis of observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Predictive modeling takes these relationships and uses them to make inferences about the future. One approach to data mining is to use rough sets (Pawlak 1982, 1991). Rough sets can be used to analyze incomplete or uncertain information. The theory is normally used to reduce data sets, find hidden data patterns and generate decision rules. In applications, rough set techniques are often applied to stored data to produce a set of rules that can be used to predict values.
The problem studied in this research is the movement of the stock market and of selected individual stock prices, for use by investors. More specifically, the movement of the stock market and of individual stock prices is analyzed with rough sets to retrieve knowledge that may guide investors on when to buy and sell.

2. Rough Set Approach for Data Mining
Rough set theory was introduced by Pawlak in 1982 as a mathematical tool for dealing with vagueness and uncertainty in the classification of objects in a set (Pawlak et al., 1995). The rough set philosophy assumes that some information (feature values) can be associated with every object of the universe. In rough sets, the data are organized in a decision table. This is a flat table with attributes as columns and data elements (objects) as rows. The class label is called the decision attribute, and the remaining attributes are the condition attributes. For rows, rough set theory uses the notion of an indiscernibility class to group similar tuples together. For columns, it uses the notion of indiscernible attributes to identify significant attributes. The key idea of the rough set approach is to analyze the limits of discernibility for a subset of objects belonging to the domain. Rough set theory defines three regions based on the equivalence classes induced by the attribute values: the lower approximation, the upper approximation, and the boundary. Objects characterized by the same features are indiscernible in view of the information available about them, and the indiscernibility relation generated in this way is an equivalence relation. Any set of mutually indiscernible objects is called an elementary set. Any union of elementary sets is called a definable (crisp) set; otherwise, the set is rough (imprecise).
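A minimal Python sketch can make elementary sets and definability concrete. It groups the objects of a toy decision table by their values on a chosen set of condition attributes. It then checks whether a target set of objects is a union of those groups. The table, its attribute names (`slope`, `snr`, `length`) and the helper functions are hypothetical and do not come from the study's data.

```python
from collections import defaultdict

# Hypothetical toy decision table: object id -> attribute values.
# "slope" and "snr" are condition attributes; "length" is the decision.
TABLE = {
    1: {"slope": "up",   "snr": "low",  "length": "long"},
    2: {"slope": "up",   "snr": "low",  "length": "long"},
    3: {"slope": "up",   "snr": "high", "length": "short"},
    4: {"slope": "down", "snr": "high", "length": "short"},
    5: {"slope": "down", "snr": "high", "length": "long"},
    6: {"slope": "flat", "snr": "low",  "length": "mid"},
}


def elementary_sets(table, attributes):
    """Partition objects into indiscernibility classes: objects with
    identical values on every attribute in `attributes` share a class."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return [frozenset(c) for c in classes.values()]


def is_definable(target, table, attributes):
    """A set is definable (crisp) iff it is a union of elementary sets,
    i.e. every elementary set lies entirely inside it or entirely outside."""
    return all(c <= target or not (c & target)
               for c in elementary_sets(table, attributes))


conds = ["slope", "snr"]
print(sorted(sorted(c) for c in elementary_sets(TABLE, conds)))
# [[1, 2], [3], [4, 5], [6]]
long_trend = {o for o, r in TABLE.items() if r["length"] == "long"}
print(is_definable(long_trend, TABLE, conds))  # False: {4, 5} is split
```

The objects with a long trend, {1, 2, 5}, cannot be written as a union of elementary sets. The class {4, 5} lies only partly inside the target, so this set is rough with respect to `slope` and `snr`.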
Consequently, every rough set has a non-empty boundary. The boundary consists of the objects that cannot be classified with certainty as elements of the set or of its complement. A rough set can therefore be replaced by a pair of crisp sets, called the lower and the upper approximation. The lower approximation consists of all objects that definitely belong to the set. The upper approximation contains all objects that possibly belong to the set. The boundary is the difference between the upper and the lower approximation. Based on the indiscernibility relation, redundant features can be identified and eliminated to reduce the number of features. Rough set theory is therefore suitable for data reduction and valuable as a preprocessing tool (Kusiak, 2001). Besides data reduction, rough set theory is also suitable for analyzing feature dependencies, evaluating the significance of features, dealing with data uncertainty and vagueness, discovering cause-effect relationships, generating decision algorithms from data, and approximately classifying data (Pawlak et al., 1995; Dimitras et al., 1999). The main advantage of rough sets is that they do not need any preliminary or additional information about the data. Examples of such information are a probability distribution in statistics, a basic probability assignment in Dempster-Shafer theory, or a grade of membership or value of possibility in fuzzy set theory (Pawlak et al., 1995).
The rough set process can be divided into five steps: construct an information table of condition and decision attributes, identify the indiscernibility relations, find the reducts, generate rules, and finally classify. The information table, arranged in rows and columns, represents the input data to be studied. Once it has been constructed, the indiscernibility relations are derived from the objects and their features. The upper and lower approximations are used to handle inconsistent objects that possibly or definitely belong to a set. The main issue in rough set theory is to find the smallest subset of features that loses no information. Such a minimal subset of features is called a reduct. A reduct gives the same quality of classification as the whole original set of features, but no feature can be removed from it without lowering that quality. Reducts can be computed using the discernibility matrix. If there is more than one reduct, the intersection of all reducts is called the core of the dataset.
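A small Python sketch can illustrate the definitions above. It computes the lower and upper approximations and the boundary of a decision class. It also computes the quality of classification (the share of objects in the positive region) and the reducts and core of a toy table. Two caveats apply. First, the table and attribute names are hypothetical. Second, this sketch finds reducts by brute-force enumeration of attribute subsets instead of with the discernibility matrix mentioned above, which is practical only for a handful of attributes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately inconsistent toy decision table: objects 1 and 2
# agree on every condition attribute but have different decisions.
TABLE = {
    1: {"slope": "up",   "snr": "low",  "vol": "hi", "length": "long"},
    2: {"slope": "up",   "snr": "low",  "vol": "hi", "length": "mid"},
    3: {"slope": "up",   "snr": "high", "vol": "hi", "length": "short"},
    4: {"slope": "down", "snr": "low",  "vol": "lo", "length": "long"},
    5: {"slope": "down", "snr": "high", "vol": "lo", "length": "short"},
    6: {"slope": "down", "snr": "high", "vol": "lo", "length": "short"},
}
CONDITIONS = ["slope", "snr", "vol"]
DECISION = "length"


def partition(table, attrs):
    """Elementary sets of the indiscernibility relation on `attrs`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def approximations(target, table, attrs):
    """Return the (lower, upper, boundary) approximations of `target`."""
    lower, upper = set(), set()
    for c in partition(table, attrs):
        if c <= target:  # class lies entirely inside: definitely belongs
            lower |= c
        if c & target:   # class overlaps the target: possibly belongs
            upper |= c
    return lower, upper, upper - lower


def quality(table, attrs, decision):
    """Quality of classification: share of objects in the positive region,
    the union of the lower approximations of all decision classes."""
    positive = set()
    for d_class in partition(table, [decision]):
        positive |= approximations(d_class, table, attrs)[0]
    return len(positive) / len(table)


def reducts(table, conditions, decision):
    """Brute-force search for minimal attribute subsets that keep the
    quality of the full condition set (only feasible for tiny tables)."""
    full = quality(table, conditions, decision)
    found = []
    for k in range(1, len(conditions) + 1):
        for subset in combinations(conditions, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if quality(table, list(subset), decision) == full:
                found.append(subset)
    return found


long_trend = {o for o, r in TABLE.items() if r[DECISION] == "long"}
lower, upper, boundary = approximations(long_trend, TABLE, ["slope", "snr"])
print(sorted(lower), sorted(upper), sorted(boundary))  # [4] [1, 2, 4] [1, 2]

found = reducts(TABLE, CONDITIONS, DECISION)
core = set.intersection(*(set(r) for r in found)) if found else set()
print(found, core)  # [('slope', 'snr'), ('snr', 'vol')] {'snr'}
```

Objects 1 and 2 share all condition values but differ in their decision, so they fall into the boundary. In this table `slope` and `vol` carry the same information. Either one can therefore be dropped without changing the quality of classification, so each appears in exactly one reduct and only `snr` is in the core.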
The core is the collection of the most significant features, and it may be empty. Production rules that classify the objects are generated from the reducts. The rules are logical statements of the type "IF conjunction of condition features THEN disjunction of decision features", induced from the reduced set of condition and decision attributes. Decision rules can be measured by support, length, coverage and accuracy, and a stronger rule covers more objects. A rule that is unique (and of a particular strength) allows for data inconsistency and deals with it in a very natural way. Computations on the lower approximation produce certain rules, while computations on the upper approximation produce possible rules. The support of a rule is the number of records in the sample that fully exhibit the property described by the IF-THEN condition. The length is the number of conditional elements in the IF part. The coverage is the fraction of records in the sample that are identified by the IF or THEN parts. The accuracy measures the trustworthiness of the rule in the THEN part. Rules are complete if every object belonging to the class is covered by the description (coverage equal to 1). Deterministic rules are rules with accuracy equal to 1, and correct rules have both coverage and accuracy equal to 1. Walczak and Massart (1999) and Pawlak et al. (1995) give a detailed discussion of rough set theory.

3. Trading System Based on Rough Sets
In economic forecasting, the most widely used methods are fundamental and technical analysis. Fundamental analysis is a complex stock market prediction method involving in-depth analysis of the firm's annual report and of indicators of the general economy. It assumes that the current (and future) share price depends on the firm's intrinsic value and anticipated return on investment.
Fundamental analysis assumes that new information about a firm will affect the movement of its share price. However, it is difficult to implement because it requires real and reliable information about the firm, such as economic conditions, financial reports and the company's competitive strength. Technical analysis, on the other hand, considers only the actual history of trading and prices of a security or index. The underlying theory assumes that the market price reflects all known information about the individual security. Technical analysis has been the common approach to predicting stock market movements: approximately 90% of major stock traders use it in their investment analysis. Technical analysis is mostly concerned with market indicators. These indicators look at the trend of price indices and individual securities to evaluate the current position of the security or index. The theory underlying these indicators is that once a trend is in motion, it will continue in that direction (Achelis 1995). Technical indicators such as the moving average, trading bands, Bollinger bands, volume, Moving Average Convergence/Divergence (MACD), the Relative Strength Index (RSI) and others have been widely used to analyze market trends via chart presentations. Technical analysis attempts to determine both the strength and the direction of the trend.
The approach in this study is a form of technical analysis. We assume that the stock market satisfies only the weak form of the Efficient Market Hypothesis, in which the price contains sufficient historical information (Fama 1991). This assumption makes it possible to infer future prices from historical data alone. Figure 1 illustrates the trading system based on rough sets.

Figure 1: Process of stock market data prediction and analysis (flowchart: generate indicators from historical data → choose the training and validation sets → put into the rough set model → extract trading rules → implement in the real market)

Extracting trading rules from the stock market is an alternative way of detecting market timing. Market timing is an investment strategy used to obtain excess returns. Detecting market timing means determining when to buy and sell to earn excess returns from trading. Market timing systems usually employ a profitable rule base to capture the major turning points.

4. Research Data and Methodology
The research data used in this study are the daily closing prices of the Kuala Lumpur Composite Index (KLCI) and of 80 individual firms from various sectors listed in Bursa Malaysia, from 2000 to 2007.
These data are divided into a rule-extraction set covering 2000 to 2005 and a validation set covering 2006 to 2007. The closing-price time series is composed of daily random fluctuations as well as a long-term trend. To obtain cleaner data with as little additive noise as possible, the raw data must be denoised. The raw data $a_{\mathrm{raw}}(n)$ are assumed to consist of a long-term trend signal $a(n)$ and additive noise $e(n)$:

$$a_{\mathrm{raw}}(n) = a(n) + e(n) \qquad (1)$$

The cleaning process produces $\hat{a}(n)$, an estimate of the long-term signal $a(n)$. The long-term signal is stable and deterministic. The noise, in contrast, is random and influenced by various sources. To clean the raw data, a moving average (MA) is used to identify the trend and smooth out close-price fluctuations (or "noise") that could confuse interpretation. The MA is the average value of a security's closing price over a period of time. For a window of $N$ days, the closing prices of the past $N$ days are summed and divided by $N$:

$$MA = \frac{1}{N}\sum_{i=1}^{N} (\text{close price})_i \qquad (2)$$

The buying or selling signal for a particular security is triggered by the stock's trend, which is characterized by the trend's length (duration). The extrema $t_x$ of the cleaned closing-price data are the points where the time derivative of the cleaned data vanishes, as given by Equation (3):

$$\left.\frac{d\hat{a}(t)}{dt}\right|_{t=t_x} = 0 \qquad (3)$$

The extrema are ordered in a set $T_e = \{t_0, \ldots, t_{N_e}\}$, which divides the time series into $N_e$ patterns (intervals), and the length of each interval is calculated. Intervals shorter than a threshold $d$ are eliminated using linear interpolation. The threshold $d$ is set to 5 days, the weekly trading period in Bursa Malaysia. The extracted extrema thus mark intervals of different trends. The interval length is the most important feature and is the target attribute in this system. The length of each interval in $T_e$ represents the length of the quotation and is calculated as:

$$\text{Length}_i = t_{i+1} - t_i, \qquad 0 \le i \le N_e - 1 \qquad (4)$$

The slope of each pattern is then calculated from the initial and final values of $\hat{a}(n)$ in the interval:

$$\alpha_i = \frac{\hat{a}(t_{i+1}) - \hat{a}(t_i)}{t_{i+1} - t_i} \qquad (5)$$

Another important feature is the Signal-Noise-Ratio (SNR), which expresses the fluctuations of the series. A high SNR value indicates that the series is unstable and influenced by various parameters and factors. A low SNR value indicates a stable series influenced by a limited number of factors. The SNR is calculated using Equations (6) and (7):

$$SNR_i = \frac{1}{t_{i+1} - t_i}\int_{t_i}^{t_{i+1}} \frac{\varepsilon^2(t)}{\hat{a}^2(t)}\,dt \qquad (6)$$

$$\varepsilon(t) = a_{\mathrm{raw}}(t) - \hat{a}(t) \qquad (7)$$

Here $a_{\mathrm{raw}}(t)$ is the original data and $\hat{a}(t)$ is the cleaned data.
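The cleaning and segmentation steps of Equations (2) to (5) can be sketched in Python as shown below. Several details are assumptions and do not come from the paper:
- The moving average is a trailing one.
- Extrema are detected as sign changes of the first difference, a discrete stand-in for Equation (3).
- The first and last days are treated as interval boundaries.
- Short intervals are removed by deleting the interior end points that bound them, so each merged pattern is the straight line between the surviving extrema.

The window, the synthetic prices and all function names are hypothetical.

```python
import math


def moving_average(close, n):
    """Trailing n-day moving average (Eq. 2): the mean of the last n closes."""
    if not 1 <= n <= len(close):
        raise ValueError("window n must be between 1 and len(close)")
    total = sum(close[:n])
    out = [total / n]
    for i in range(n, len(close)):
        total += close[i] - close[i - n]  # slide the window forward one day
        out.append(total / n)
    return out


def extrema(series):
    """Indices where the first difference changes sign: a discrete stand-in
    for d a_hat(t) / dt = 0 (Eq. 3). Flat plateaus are not detected."""
    return [i for i in range(1, len(series) - 1)
            if (series[i] - series[i - 1]) * (series[i + 1] - series[i]) < 0]


def patterns(series, d=5):
    """Split the cleaned series at its extrema and return one
    (start, end, length, slope) tuple per pattern (Eqs. 4 and 5).
    While the shortest interval is under d days, its interior end points are
    deleted (an assumed form of the paper's interpolation step)."""
    if len(series) < 2:
        return []
    bounds = [0] + extrema(series) + [len(series) - 1]
    while len(bounds) > 2:
        lengths = [b - a for a, b in zip(bounds, bounds[1:])]
        i = min(range(len(lengths)), key=lengths.__getitem__)
        if lengths[i] >= d:
            break
        # The short interval is too small to be a trend: delete its interior
        # end point(s) so that it merges with its neighbours.
        drop = {j for j in (i, i + 1) if 0 < j < len(bounds) - 1}
        bounds = [t for j, t in enumerate(bounds) if j not in drop]
    return [(t0, t1, t1 - t0, (series[t1] - series[t0]) / (t1 - t0))
            for t0, t1 in zip(bounds, bounds[1:])]


# Hypothetical usage: a 40-day cycle plus alternating day-to-day noise.
close = [100 + 10 * math.sin(2 * math.pi * t / 40) + (1.5 if t % 2 else -1.5)
         for t in range(160)]
smooth = moving_average(close, n=5)
for start, end, length, slope in patterns(smooth, d=5):
    print(f"days {start:3d}-{end:3d}  Length={length:3d}  slope={slope:+.3f}")
```

A 5-day average only damps the alternating noise, so spurious extrema can appear near the turning points of the cycle. The merging loop removes them, and every remaining pattern lasts at least `d` days.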
For signals measured at discrete time intervals, Equation (6) is approximated by its discrete analog:

$$SNR_i = \frac{1}{m}\sum_{j=0}^{m-1} \frac{\varepsilon^2(t_i + j)}{\hat{a}^2(t_i + j)} \qquad (8)$$

$$m = t_{i+1} - t_i \qquad (9)$$

The slope, the SNR and the length are the major attributes in this predictive system. The rough set approach requires the data to be discretized in advance, because the rule-extraction process has no built-in discretization algorithm. The real-valued attributes in the decision table are therefore discretized. Discretization achieves a higher quality of classification and lets the knowledge in the rules be expressed in discrete form. Discrete values are also more concise, represent and specify the attributes better, and are easier to use and understand, which can lead to improved prediction accuracy. The literature describes numerous discretization methods. This study employs the Boolean reasoning algorithm, equal frequency binning (EFB) and the Chi2 algorithm. The training set is used for model development, and the testing set is used to evaluate the forecasting ability of the model. The objects in the decision table (both transformed and original) are therefore divided into two sets.
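The following minimal Python sketch covers two steps: the per-pattern SNR of Equation (8), and equal frequency binning, one of the three discretization methods named above. It rests on these assumptions:
- `raw` and `smooth` are equal-length lists that are already aligned. With a trailing moving average, the caller must shift them.
- The sum runs over the samples $t_i, \ldots, t_{i+1} - 1$.
- The interval labels only imitate the `[*,11)` notation used for the decision classes in the results.

The Boolean reasoning and Chi2 discretizers are not shown. The function names and numbers are hypothetical.

```python
from bisect import bisect_right


def snr(raw, smooth, t0, t1):
    """Discrete SNR of the pattern [t0, t1) per Eqs. (7)-(9): the mean of
    eps^2 / a_hat^2 over the m = t1 - t0 samples, with eps = raw - smooth."""
    m = t1 - t0
    return sum((raw[t] - smooth[t]) ** 2 / smooth[t] ** 2
               for t in range(t0, t1)) / m


def equal_frequency_cuts(values, bins):
    """Cut points that put roughly the same number of values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted({ordered[n * k // bins] for k in range(1, bins)})


def discretize(value, cuts):
    """Map a real value to an interval label in the style '[*,11)'."""
    i = bisect_right(cuts, value)  # number of cut points <= value
    lo = "*" if i == 0 else f"{cuts[i - 1]:g}"
    hi = "*" if i == len(cuts) else f"{cuts[i]:g}"
    return f"[{lo},{hi})"


# Hypothetical usage with made-up numbers (not the study's data).
raw = [10.0, 12.0, 9.0, 11.0]
smooth = [10.0, 10.0, 10.0, 10.0]
print(snr(raw, smooth, 0, 4))  # (0 + 0.04 + 0.01 + 0.01) / 4, about 0.015

lengths = [5, 6, 7, 8, 9, 12, 15, 18, 21, 30]
cuts = equal_frequency_cuts(lengths, bins=3)  # [8, 15]
print([discretize(x, cuts) for x in (7, 8, 30)])  # ['[*,8)', '[8,15)', '[15,*)']
```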
5. Experimental Results
As mentioned earlier, this study uses the rough set approach to find profitable rules. Assessing the performance of a stock market prediction system is not an easy task. As Armano et al. (2002) point out, classification accuracy percentages have no direct economic meaning. Moreover, the evaluation of any proposed model depends on the investor's strategy, since different strategies may generate different profits even when the same underlying model is applied. The predictive system of this study determines profitable rules on when to buy, hold and sell by detecting the duration (Length) of a trend, for both the KLCI and individual stocks. A long interval (Length) indicates a stable market. In this system, the classifier's decision value with the shortest Length, "[*,11)", indicates an unstable market trend. The decision value with the longest Length, "[17,*)", means that the market is in a stable trend. Using this information, the trading strategy is constructed as follows:
If decision = "[17,*)", then buy (long or short, depending on the Slope)
If decision = "[*,11)", then sell
If decision = "[11,17)", then hold (no trading)
If the classifier gives no decision, then sell or do not trade
In this simulation, initial seed money of RM1000 is invested in the KLCI and in each individual stock. To simplify the calculations, transaction costs are not taken into account. The aim is to make as much profit as possible with the aid of the predictive system. Table 1 presents the earnings for an example simulation on KLCI data covering both the training and testing sets.

Table 1: Trade example using KLCI
Trade | Balance (RM) | Position | Entry price | Exit ("Sell at") price | Gain/Loss (RM)
1 | 1000 | Short | 629.1 | 629.66 | -0.89
2 | 999.1098 | Long | 630.84 | 681.69 | +80.54
3 | 1079.645 | Long | 691.45 | 715.65 | +37.79
4 | 1117.431 | Long | 744.99 | 736.16 | -13.24
5 | 1104.187 | Long | 742.73 | 800.96 | +86.57
6 | 1190.755 | Short | 791.49 | 790.56 | +1.4

Table 2 shows the highest annual profit obtained with rules generated from the KLCI data sample. Rules generated from data discretized by the χ2 (Chi2) algorithm and generalized by the Matlab ANN give the highest annual profit, 74.33%. Rules from data discretized by Boolean reasoning and generalized by the developed ANN give the lowest profit.

Table 2: Performance of Network (annual profit)
Network | Boolean Reasoning | EFB (4 bins) | Chi-square
Matlab ANN | 29.25% (rules set from dynamic exhaustive calculation) | 50.29% (rules set from Johnson's algorithm) | 74.33% (rules set from Holte's 1R)
Developed ANN | 18.78% (rules set from dynamic exhaustive calculation) | 61.36% (rules set from Genetic algorithm) | 68.94% (rules set from Holte's 1R)
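The following minimal Python sketch shows the trading strategy and the bookkeeping behind Table 1. It relies on several assumptions that the paper does not state:
- The whole balance is committed to every trade.
- A short position earns (entry − exit)/entry.
- A non-negative Slope means going long.
- A missing decision is treated as sell.

Under these assumptions the sketch reproduces the Balance and Gain/Loss figures of Table 1 to their printed precision. The function names are hypothetical.

```python
def action(decision, slope):
    """Map the classifier's Length class to an action, following the
    strategy above. `decision` is None when the classifier gives no decision.
    Assumption: a non-negative Slope means going long, a negative one short."""
    if decision == "[17,*)":
        return "long" if slope >= 0 else "short"
    if decision == "[11,17)":
        return "hold"
    return "sell"  # "[*,11)" or no decision (not trading is the alternative)


def simulate(trades, balance=1000.0):
    """Commit the whole balance to each (side, entry, exit) trade in turn,
    ignoring transaction costs; return one (gain, balance_after) per trade."""
    history = []
    for side, entry, exit_price in trades:
        if side == "long":
            ret = (exit_price - entry) / entry
        else:  # a short position profits when the price falls
            ret = (entry - exit_price) / entry
        gain = balance * ret
        balance += gain
        history.append((gain, balance))
    return history


# Positions and prices copied from Table 1.
TABLE_1 = [("short", 629.10, 629.66), ("long", 630.84, 681.69),
           ("long", 691.45, 715.65), ("long", 744.99, 736.16),
           ("long", 742.73, 800.96), ("short", 791.49, 790.56)]
for n, (gain, balance) in enumerate(simulate(TABLE_1), start=1):
    print(f"Trade {n}: gain/loss {gain:+.2f}, balance {balance:.3f}")
```

Committing the full balance makes returns compound from trade to trade. This is why each row's Balance in Table 1 equals the previous Balance plus its Gain/Loss.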
The findings of this study show that the rough set approach is best used to produce trading rules for indices and individual companies with deterministic long-term market movements. For volatile individual stocks whose market trends are uncertain, the rough set approach cannot produce trading signals that are effectively superior to the naive strategy. In rough sets, the selection of reducts and the extraction of rules are controlled by the strength of each reduct and rule. This strength is determined by capturing the major turning points in the data, which is very important for detecting market timing. However, small individual firms with volatile market movements do not display these major turning points.

6. Conclusion
This study set out to mine profitable trading rules using the rough set approach for the Kuala Lumpur Composite Index and eighty individual firms listed in Bursa Malaysia. The rough set approach is very helpful for extracting trading rules because it can discover dependencies in data while eliminating superfluous factors in noisy stock market data. In rough sets, the selection of reducts and the extraction of rules are controlled by the strength of each reduct and rule. This is very important for detecting market timing, because market timing is detected by capturing the major turning points in the data. The experimental results of this study are very encouraging and demonstrate the usefulness of the rough set approach for stock market analysis with respect to profitability. Nevertheless, one failure of the predictive system developed in this research is its inability to detect the numerous minor trends displayed by the volatile individual firms selected in this study. As a result, it fails to produce profitable trading signals for these firms.
This points to the need for three further studies. The first is an extended study of the rough set methodology. The second is a more extensive study of the validation process, because the future is never exactly like the past. The third is a comparative study of the results obtained with other artificial intelligence and statistical tools. Finally, investing in a particular stock depends on a number of attributes, and the level of risk an investor is comfortable with also depends on these attributes.

Acknowledgement
This work is supported by Universiti Kebangsaan Malaysia's Research University Grant (Code: UKM-GUP-TMK-07-02-107).
References
[1] Achelis, S. B., 1995. Technical Analysis from A to Z. Probus Publishing.
[2] Armano, G., Murru, A. & Roli, F., 2002. "Stock Market Prediction by a Mixture of Genetic-Neural Experts". International Journal of Pattern Recognition and Artificial Intelligence. Vol. 16, No. 5: 501-526.
[3] Dimitras, A., Slowinski, R., Susmaga, R. & Zopounidis, C., 1999. "Business Failure Prediction Using Rough Sets". European Journal of Operational Research. Vol. 114: 263-280.
[4] Fama, E. F., 1991. "Efficient Capital Markets: II". Journal of Finance. Vol. 46, No. 5: 1575-1617.
[5] Kusiak, A., 2001. "Rough Set Theory: A Data Mining Tool for Semiconductor Manufacturing". IEEE Transactions on Electronics Packaging Manufacturing. Vol. 24, No. 1.
[6] Mosley, R., 2005. The Use of Predictive Modeling in the Insurance Industry. January. Pinnacle Actuarial Resources Inc.
[7] Pawlak, Z., 1982. "Rough Sets". International Journal of Computer and Information Sciences. Vol. 11: 341-356.
[8] Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers.
[9] Pawlak, Z., Grzymala-Busse, J., Slowinski, R. & Ziarko, W., 1995. "Rough Sets". Communications of the ACM. November. Vol. 38, No. 11.
[10] Walczak, B. & Massart, D. L., 1999. "Tutorial: Rough Sets Theory". Chemometrics and Intelligent Laboratory Systems. Vol. 47: 1-16.