This document describes a proposed approach for efficiently mining frequent itemsets from stock market data using association rule mining. The approach involves preprocessing the data to reduce the overall mining time and using pruning techniques to eliminate itemsets that do not satisfy inter-transaction criteria. It constructs an FP-tree representing the frequent patterns in the data using two database scans, then applies an FP-Growth algorithm with pruning to discover inter-transaction association rules between stock prices and other variables. The goal is to provide deeper analysis of stock price movements over time to help financial analysts and investors.
Efficient Mining of Fast Frequent Item Set Discovery from Stock Market Data
http://www.iaeme.com/IJCET/index.asp editor@iaeme.com
frequency (support) of an itemset is the number of transactions containing all the items of that itemset. A frequent itemset is one whose support is greater than or equal to some predefined parameter, the minimum support (min_sup). After the frequent itemsets are found, association rules are derived from them by considering one more parameter, the minimum confidence (min_conf). Temporal data mining provides some additional capabilities required in cases where the evolution of the existing data and their interactions need to be observed through the time dimension [2]. Temporal data mining can be defined as the activity of looking for interesting correlations or patterns in large sets of temporal data accumulated for other purposes [3]. The main aim of a stock market is the trading of stocks between investors. Stocks are grouped into industries according to their primary business focus [4]. Each stock is characterized not only by its price but also by many other variables. Dealing with all these variables and determining their combined effect requires a deeper kind of study, one that can show the behavior of a stock over time. The main variables are shown in the table below [5][6].
Table 1 Stock Variables
Variable: Description
Price: Current price of a stock
Opening Price: Opening price of a stock on a specific trading day
Closing Price: Closing price of a stock on a specific trading day
Volume: Transaction volume (buy / sell)
Change: Opening and closing stock value difference
Change (U): Percentile opening and closing stock value difference
1.1 Temporal Data Mining
Temporal data mining is concerned with data mining of large sequential data sets. By sequential data we mean data that is ordered with respect to some index. It provides some additional capabilities required in cases where the evolution of the existing data and their interactions need to be observed through the time dimension [7].
1.2 Time Series
A time series is an ordered sequence of data points, measured at successive times spaced at uniform intervals. A large amount of data is collected every day in the form of event time sequences. These sequences represent valuable sources of information, not only to search for a particular value or event at a specific time, but also to analyze the frequency of certain events, discover their regularity, or discover sets of events related by particular temporal relationships [9].
2. THE PRINCIPLE OF THE APRIORI ALGORITHM
The Apriori method is a common approach to mining frequent patterns. In the Apriori algorithm, the support of an itemset is the fraction of transactions that contain it. After identifying the large itemsets, i.e. the itemsets with support greater than the minimum support allowed, each is translated to an integer, and each sequence is transformed into a new sequence whose elements are the large itemsets of the previous one. The next step is to find the large sequences. To achieve this, the algorithm acts iteratively like Apriori: first it generates the candidate sequences and then it chooses the large sequences from the candidate ones, until there are no candidates left. Apriori is an effective algorithm to discover association rules.
Hitesh R. Raval and Dr. Vikram Kaushik
Adapting this method to deal with temporal information leads to some different
approaches. Common sub-sequences can be used to derive association rules with
predictive value, as is done, for instance, in the analysis of multi-dimensional time
series [9][14].
A possible approach consists of extending the notion of a typical rule X → Y (which states: if X occurs, then Y occurs) to a rule with a new meaning: X →T Y (which states: if X occurs, then Y will occur within time T). In order to discover these rules, it is necessary to search for them in a restricted portion of time, since they may occur repeatedly at specific time instants but only in a small portion of the global time considered. One method to discover such rules is to apply an algorithm similar to Apriori and, after obtaining the set of traditional rules, detect the cycles behind the rules [14].
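As an illustration only (the event sequence and the helper function below are hypothetical, not from the paper), a rule X →T Y can be checked against a timestamped event sequence like this:

```python
# Sketch: count how often the temporal rule X ->T Y holds, i.e. how many
# occurrences of X are followed by a Y within T time units.
def rule_support(events, x, y, t):
    """events: list of (timestamp, symbol) pairs sorted by timestamp."""
    hits = 0
    for i, (ts, sym) in enumerate(events):
        if sym != x:
            continue
        # look ahead for a y occurring no later than ts + t
        if any(s == y and ts < u <= ts + t for u, s in events[i + 1:]):
            hits += 1
    return hits

seq = [(1, "X"), (2, "Y"), (5, "X"), (9, "Y"), (10, "X")]
print(rule_support(seq, "X", "Y", 3))  # 1: only X at t=1 has a Y within 3 units
```

Widening the window changes the count: with T = 4 the X at t = 5 is also followed by the Y at t = 9, so the support becomes 2.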
Definition 1: The support of an item (or set of items) is the number of transactions in which that item (or set of items) occurs. Given a set of transactions in a database, where each letter corresponds to a certain product such as jeans or a T-shirt and each transaction corresponds to a customer buying the products A, B, C or D, the first step in the Apriori algorithm is to count the support (number of occurrences) of each item separately.
Table 2 Transaction Items
Transaction Items
T1 A, B, C, D
T2 B, C, D
T3 B, C
T4 A, B, D
T5 A, B, C, D
T6 B, D
Table 3 Number of Transactions for Items
Item Support
A 3
B 6
C 4
D 5
The items in the transactions represented in Table 2 have their support represented
in Table 3.
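The support counts in Table 3 can be reproduced with a few lines (a minimal sketch; the variable names are my own):

```python
from collections import Counter

# Transactions from Table 2; the resulting counts should match Table 3.
transactions = [
    {"A", "B", "C", "D"},  # T1
    {"B", "C", "D"},       # T2
    {"B", "C"},            # T3
    {"A", "B", "D"},       # T4
    {"A", "B", "C", "D"},  # T5
    {"B", "D"},            # T6
]

# Count, for each item, the number of transactions containing it
support = Counter(item for t in transactions for item in t)
print(support["A"], support["B"], support["C"], support["D"])  # 3 6 4 5
```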
Definition 2: The support threshold is defined by the user and is a number which the support for each item (or set of items) has to equal or exceed for the threshold to be fulfilled [13, 14].
In this example we will use a support threshold of 3. All items in Table 3 meet this condition, since none of them has a support below 3.
Definition 3: Given a set of items I = {I1, I2, …, In}, an itemset is a subset of I [14].
Definition 4: A large itemset is an itemset whose number of occurrences in the transactions is at least the support threshold. We use the notation L to indicate the complete set of large itemsets [14].
In our example the complete set of large itemsets L in this first iteration is L = {A, B, C, D}, since all of these items meet the support threshold. If any of these items had been below the support threshold, it would not have been included in the subsequent steps.
In the next steps we form all pairs, triples and so on of the items in Table 3. If A had a support below three, all pairs, triples etc. containing A would also be below the support threshold. This is the fundamental basis of the Apriori algorithm, since it allows us to prune all candidate itemsets containing an item under the support threshold, hence reducing the amount of data in each step.
The next step is to form all 2-item itemsets. We can do this by making all possible combinations of the large itemsets without regarding the order.
Table 4 Two-item Sets with Support
Item Support
A,B 3
A,C 2
A,D 3
B,C 4
B,D 5
C,D 3
Table 5 Large Itemsets
Large Itemsets
A,B
A,D
B,C
B,D
C,D
In Table 4 the new itemsets are illustrated together with their respective support. The itemset {A, C} only has support 2 and, since our support threshold is 3, it is not a large itemset. Next we generate the 3-sets by joining the full set of large itemsets in Table 5 over a common item.
Table 6 Three-set Combinations with Support
Item Support
A,B,C 2
A,C,D 2
A,B,D 3
B,C,D 3
Table 7 Large Itemsets in the Three-set Combination
Large Itemsets
A,B,D
B,C,D
The only 3-sets that fulfill the support threshold are {A, B, D} and {B, C, D}, as illustrated in Table 7. If we continue this process by joining the itemsets in the complete large itemset over a common pair, we get the last possible combination.
Table 8 Four-set Combination
Item Support
A,B,C,D 2
Figure 1 Illustration of the possible combinations of A, B, C, D, without regarding the order, in the Apriori algorithm.
The process of joining terms in the Apriori algorithm is illustrated in Figure 1. Note that the position of an item in an itemset does not matter, i.e. the itemset {A, B, D} is regarded in the same way as {D, A, B}; to keep track of this and avoid redundancies later in the implementation, all items in each itemset are ordered by their value.
The Apriori algorithm cuts some of the branches in the tree in Figure 1. For example, the itemset {A, C} occurred only 2 times, which is below the support threshold of 3. The Apriori algorithm makes use of this by not generating any branches from this node and thus reduces the computational cost. As mentioned, this is the foundation of the Apriori algorithm [9].
2.1. Algorithm 1: Apriori algorithm [15]
Input:
  I // itemsets
  D // transactions
  s // support threshold
Output:
  L // large itemsets
Apriori algorithm:
  k = 0 // k is used as the scan number
  L = Ø
  C1 = I // initial candidates are set to be the items
  repeat
    k = k + 1
    Lk = Ø
    for each Ii ∈ Ck do
      ci = 0 // initial counts for each itemset are 0
    for each tj ∈ D do
      for each Ii ∈ Ck do
        if Ii ⊆ tj then
          ci = ci + 1
    for each Ii ∈ Ck do
      if ci ≥ s then
        Lk = Lk ∪ {Ii}
    L = L ∪ Lk
    Ck+1 = Apriori-Gen(Lk)
  until Ck+1 = Ø
2.2. Algorithm 2: Apriori-Gen algorithm [14]
Input:
  Li-1 // large itemsets of size i-1
Output:
  Ci // candidates of size i
Apriori-Gen algorithm:
  Ci = Ø
  for each I ∈ Li-1 do
    for each J ≠ I ∈ Li-1 do
      if i-2 of the elements in I and J are equal then
        Ci = Ci ∪ {I ∪ J}
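The two algorithms above can be sketched in Python (the function names are my own, but the logic follows the pseudocode: count candidate supports, keep the large itemsets, and join large itemsets that share all but one element into the next candidates). On the transactions of Table 2 with support threshold 3 it reproduces the large itemsets of the running example:

```python
def apriori_gen(large_prev):
    """Join large (k-1)-itemsets that share all but one item
    (the 'i-2 elements equal' condition) into candidate k-itemsets."""
    candidates = set()
    for i in large_prev:
        for j in large_prev:
            if i != j and len(i & j) == len(i) - 1:
                candidates.add(i | j)
    return candidates

def apriori(transactions, minsup):
    """Level-wise Apriori: count candidates, keep the large ones,
    then generate the next candidates from them."""
    large = set()
    ck = {frozenset([item]) for t in transactions for item in t}
    while ck:
        counts = {c: sum(c <= t for t in transactions) for c in ck}
        lk = {c for c, n in counts.items() if n >= minsup}
        large |= lk
        ck = apriori_gen(lk)
    return large

# Transactions from Table 2, support threshold 3
db = [{"A", "B", "C", "D"}, {"B", "C", "D"}, {"B", "C"},
      {"A", "B", "D"}, {"A", "B", "C", "D"}, {"B", "D"}]
L = apriori(db, 3)
print(frozenset("ABD") in L, frozenset("AC") in L)  # True False
```

Note how {A, C} (support 2) is excluded, so no candidate containing both A and C is ever counted, which is exactly the pruning described earlier.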
2.3 Impact of the Algorithm
In the past few years, a number of studies have been published proposing new algorithms or improvements on existing algorithms for frequent itemset mining. Some algorithms require a small amount of memory but heavy disk access (such as Apriori-like algorithms); others necessitate low I/O activity but a large amount of memory (such as FP-growth). However, the number of research papers on the inter-transaction mining problem is still small, since it is a more challenging problem than intra-transaction mining.
3. PROPOSED WORK
Luhr et al. [10] and Tung et al. [11] proposed frameworks that can only discover inter-transaction association rules, whereas other work has addressed mining quantitative intra-transaction association rules. In order to discover quantitative inter-transaction association rules, a new approach is developed to extract rules from single-dimensional transaction datasets of stock closing price and traded volume. The investigators propose a framework called ITARM for mining inter-transaction association rules from datasets that contain a constant number of items in each transaction. The approach not only predicts the movement of the stock price in either direction for a user-defined minimum support and confidence value, but also predicts the probable variation in stock closing price and traded quantity based on the historical data of these attributes.
The proposed approach employs an effective preprocessing phase, as shown in Figure 2, that reduces the overall mining time, together with efficient pruning techniques that eliminate frequent itemsets occurring within the same transaction as well as those not starting with the first transaction of each sliding window. ITARM uses an FP-tree based algorithm because it requires only two scans of the transaction database to construct the tree, and it uses a prefix-tree structure which requires less memory.
Figure 2 Process of mining inter-transaction association rules (ITARM)
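The sliding-window preprocessing is not spelled out here, but in the standard inter-transaction formulation (as in Tung et al. [11]) consecutive transactions are merged into mega-transactions whose items are tagged with their sub-window number. A minimal sketch under that assumption (the function name and window handling are my own):

```python
def mega_transactions(transactions, window):
    """Slide a window over the transaction list; tag each item with
    its sub-window number (starting at 1) inside the window."""
    megas = []
    for start in range(len(transactions) - window + 1):
        mega = set()
        for offset in range(window):
            for item in transactions[start + offset]:
                mega.add((offset + 1, item))  # (sub-window number, item)
        megas.append(mega)
    return megas

# Three single-item transactions, window size 2 ->
# two mega-transactions: {(1,'A'),(2,'B')} and {(1,'B'),(2,'C')}
print(len(mega_transactions([{"A"}, {"B"}, {"C"}], 2)))  # 2
```

Frequent itemsets mined over such mega-transactions whose items all carry sub-window number 1 correspond to intra-transaction patterns, which is what the pruning step below discards.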
4. THE PROPOSED ALGORITHM (FP-TREE CONSTRUCTION
AND PRUNING)
Input: A transaction database D and a minimum support threshold minsup
Output: Its frequent pattern tree, FP-tree
Method: The FP-tree is constructed in the following steps
1. Scan the transaction database D once. Collect the set of frequent items F and their
supports. Sort F in support descending order as L, the list of frequent items.
2. Create the root of an FP-tree, T, and label it as “null”. For each transaction Trans in
D do the following.
Select and sort the frequent items in Trans according to the order of L. Let the
sorted frequent item list in Trans be [p | P], where p is the first element and P is the
remaining list. Call insert tree ([p | P], T).
The function insert tree ([p | P], T) is performed as follows.
If T has a child N such that N.item-name = p.item-name, then increment N's count by 1; else create a new node N, let its count be 1, link its parent to T, and link its node-link to the nodes with the same item-name via the node-link structure. If P is nonempty, call insert tree (P, N) recursively.
3. Sort (P, size of p). If the sorted frequent itemset has first digit 1, then store the itemset in the hash table; else skip the itemset because it does not satisfy the inter-transaction criteria.
After the tree construction is over, sort the frequent mega-transaction itemsets by their sub-window number. If a sorted frequent itemset has sub-window number 1, then store the itemset in the hash table; else skip the itemset because it does not satisfy the inter-transaction criteria. Step 3 shows the modification to the current FP-tree algorithm to incorporate frequent inter-transaction mining.
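The insert tree function of step 2 can be sketched in Python (the node class and names are my own; the header table and node-link structure are omitted for brevity):

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}  # item -> Node

def insert_tree(items, node):
    """Insert an L-ordered list of frequent items into the FP-tree."""
    if not items:
        return
    p, rest = items[0], items[1:]
    if p in node.children:            # shared prefix: increment the count
        node.children[p].count += 1
    else:                             # new branch under this node
        node.children[p] = Node(p, node)
    insert_tree(rest, node.children[p])

root = Node(None, None)
root.count = 0
# Three transactions, already sorted in support-descending order L
for trans in (["B", "D", "A"], ["B", "D"], ["B", "C"]):
    insert_tree(trans, root)
print(root.children["B"].count)  # 3: all three transactions share prefix B
```

Because common prefixes are merged into shared branches, the tree stays compact, which is why only two database scans and comparatively little memory are needed.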
Algorithm (FP-growth: Mining frequent inter-transaction patterns with FP-tree by
pattern fragment growth and pruning)
Input: Constructed FP-tree, using transaction database D and a minimum support
threshold minsup
Output: The complete set of inter-transaction frequent patterns
Method: Call FP-growth (FP-tree, null)
Procedure FP-growth (Tree, α)
• if Tree contains a single path P
• then for each combination (denoted as β) of the nodes in the path P do
• generate pattern β ∪ α with support = minimum support of nodes in β;
• else for each ai in the header of Tree do {
• generate pattern β = ai ∪ α with support = ai.support;
• construct β's conditional pattern base and then β's conditional FP-tree Treeβ;
• if Treeβ ≠ Ø
• then call FP-growth (Treeβ, β)
• if Treeβ has all items starting with ‘1’
• then it confirms an intra-transaction rule; prune such instances, else keep them }
5. RELATED WORK
As the amount of data increases, the frequent itemsets of inter-transaction association rules become larger and larger and harder to handle [12]. To date, there is no algorithm available that deals with quantitative inter-transaction association rules. Unfortunately, very little work has been done to discover this kind of rule in the financial domain, and this area is still emerging.
6. CONCLUSION
Frequent itemset mining and association rule mining are two important tasks of data mining. Incorporating utility considerations into data mining tasks has been gaining popularity in recent years. In this paper we show that discovering itemsets under all time windows of data streams can be achieved effectively with limited memory space, fewer candidate itemsets and reduced CPU and I/O cost. This meets the critical requirements of time and space efficiency for mining data streams. We have taken the two most influential factors, the closing price and traded volume of the National Stock Exchange (NSE) India, since a company's stock price follows the movement of higher-index-based companies for reasonably high support and confidence. Stock closing price, traded volume, business sector, earnings numbers, price-earnings ratio, rumors etc. are some of the factors that influence the stock price. Obviously, not all of these factors can be easily modeled and embedded, since some of them are related to human psychology. The enhanced mechanism provides a better trade-off between main memory usage and interactive inter-transaction rule mining. We have considered set boundaries for the symbolic representation; applying fuzzy logic could provide more accuracy at a higher computational complexity.
REFERENCES
[1] Han, J. and Kamber, M. Data Mining: Concepts and Techniques.
[2] Agrawal, R., Imielinski, T. and Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD Intl Conf. on Management of Data, Washington, D.C., May 1993, pp. 207–216.
[3] Bettini, C., Wang, X. S. and Jajodia, S. Testing complex temporal relationships involving multiple granularities and its application to data mining. In Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3–5, 1996, Montreal, Canada, ACM Press, 1996, pp. 68–78.
[4] Ayn, N. F., Tansel, A. U. and Arun, E. An efficient algorithm to update large itemsets with early pruning. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, August 1999.
[5] Marketos, G., Pediaditakis, K., Theodoridis, Y. and Theodoulidis, B. Intelligent
Stock Market Assistant Using Temporal Data Mining.
[6] Mrs. Mahajan, K. S. and Kulkarni, R. V. Application of Data mining Tools For
Stock Market.
[7] Antunes, C. M. and Oliveira, A. L. Temporal Data Mining: An Overview. Instituto Superior Técnico, Dep. Engenharia Informática, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal, pp. 2–15.
[8] Cheung, D., Han, J., Ng, V. and Wong, C. Y. Maintenance of Discovered
Association Rules in Large Databases: An Incremental Updating Technique. Proc
of 1996 Int’l Conf. on Data Engineering, February 1996, pp. 106–114.
[9] Soni, S. and Shibu, S. Advance Mining of Temporal High Utility Itemset. I. J.
Information Technology and Computer Science, 2012, 4, pp. 26–32.
[10] Luhr, S. and Venkatesh, S. An extended frequent pattern tree for inter-transaction
association rule mining. Technical Report, 2005.
[11] Tung, A. K. H., Lu, H., Han, J. and Feng, L. Breaking the barrier of transactions:
Mining inter-transaction association rules. Knowledge Discovery and Data
Mining, August 1999, pp 297–301.
[12] Dong, J. and Han, M. Ifcia: An efficient algorithm for mining inter-transaction
frequent closed itemsets. FSKD’07: In Proceedings of the 4th International
Conference on Fuzzy Systems and Knowledge Discovery, August 2007, pp. 678–
682.
[13] Argiddi, R. V. and Apte, S. S. An Evolutionary Fragment Mining Approach to Extract Stock Market Behavior for Investment Portfolio. International Journal of Computer Engineering & Technology, 4(5), 2013, pp. 138–146.
[14] Dunham, Margaret H. Data mining: Introductory and advanced topics, 1 edn., Ch
6. Prentice Hall, September 1, 2002, ISBN: 0130888923
[15] www.slideshare.net/Tommy96/Temporal-data-mining-an-overview