International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Vol...
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 3, March (2014), pp. 115-121 © IAEME

FREQUENT NEGATIVE SEQUENTIAL PATTERNS – A SURVEY

Sujatha Kamepalli #1, Dr. Raja Sekhara Rao Kurra *2
#1 Research Scholar, CSE Department, Krishna University, Machilipatnam, Andhra Pradesh, India
*2 Dean Administration, CSE Department, K.L. University, Guntur, Andhra Pradesh, India

ABSTRACT
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. A frequent pattern is a pattern (a set of items, with or without an order) that occurs in a database frequently enough to satisfy a given minimum threshold. Sequential pattern mining is an important data mining task: it provides an effective way to extract special patterns from sequence data. Unlike traditional positive sequential patterns, negative sequential patterns focus on negative relationships between itemsets, that is, they take absent items into consideration. This paper analyzes different algorithms for mining negative sequential patterns.

Keywords: Data Mining, Frequent Pattern, Sequential Pattern Mining, Sequence Data, Positive Sequential Patterns, Negative Sequential Patterns.

I. INTRODUCTION
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Association rule mining was first proposed by R. Agrawal [3, 4]; Apriori-based algorithms find frequent itemsets with an iterative, bottom-up approach that generates candidate itemsets. Nowadays, with the rapid development of information technology, especially web service-based applications, service-oriented architecture and cloud computing, continually expanding data are integrated to generate useful information. Many techniques have been used for data mining, and association rule mining (ARM) is one of the most useful. The challenges associated with ARM, especially in parallel and distributed data mining, include minimizing I/O, increasing processing speed and reducing communication cost [5]. A major concern in ARM today is further improving algorithm performance.

A frequent pattern is a pattern (a set of items, with or without an order) that occurs in a database frequently enough to satisfy a given minimum threshold [2] (Han, Cheng, Xin & Yan 2007). Frequent pattern mining is the key step in finding interesting patterns in databases, for tasks such as association rule mining and sequential pattern mining, and is vital to data mining.

1.1 Sequence and Sequence Dataset
A sequence is an ordered list of elements <e1 e2 e3 ... en>, where each ei is an element consisting of either one item or a set of items. The elements can be ordered by time, position or any other criterion, while the items within one element have no order between them. The length of a sequence is usually not fixed. Sequence data is an important type of data that is common in scientific, medical, business service, bioinformatics and other applications. An example of transaction data is shown in Table 1.1.

Table 1.1: A Transactional Data Table

In this data, customer 002 has three transactions. If all of his/her transactions are ordered by transaction time, they form the sequence <(30, 31, 32) 28 (22, 32)>. Another example comes from bioinformatics: the following gene sequence is ordered by position [1].

ACTGCTGCCAATC

1.2 Sequential pattern mining
Sequential pattern mining is an important task in data mining. It provides an effective way to extract special patterns from sequence data.
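As a minimal sketch of the construction described in Section 1.1, the Python fragment below groups transactions by customer and orders them by time to obtain one sequence per customer. Only customer 002's three itemsets come from the text; the transaction times and the helper names build_sequences and show are hypothetical, and the remaining rows of Table 1.1 are not reproduced.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# A transaction row: (customer_id, transaction_time, items).
Transaction = Tuple[str, int, FrozenSet[int]]

def build_sequences(rows: List[Transaction]) -> Dict[str, List[FrozenSet[int]]]:
    """Group transactions by customer and order each customer's itemsets by time."""
    by_customer: Dict[str, List[Tuple[int, FrozenSet[int]]]] = defaultdict(list)
    for customer, time, items in rows:
        by_customer[customer].append((time, items))
    # Elements are ordered by time; the items inside one element stay unordered.
    return {
        customer: [items for _, items in sorted(txns, key=lambda t: t[0])]
        for customer, txns in by_customer.items()
    }

def show(sequence: List[FrozenSet[int]]) -> str:
    """Render a sequence in the paper's notation, e.g. <(30, 31, 32) 28 (22, 32)>."""
    parts = []
    for element in sequence:
        items = sorted(element)
        if len(items) == 1:
            parts.append(str(items[0]))
        else:
            parts.append("(" + ", ".join(map(str, items)) + ")")
    return "<" + " ".join(parts) + ">"

# Customer 002's itemsets from the text; the timestamps 1-3 are invented.
rows = [
    ("002", 3, frozenset({22, 32})),
    ("002", 1, frozenset({30, 31, 32})),
    ("002", 2, frozenset({28})),
]
sequences = build_sequences(rows)
print(show(sequences["002"]))  # <(30, 31, 32) 28 (22, 32)>
```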
A sequential pattern takes the order of itemsets into account, whereas an association rule does not. For example, if the sequence "buy a desktop first, then a laptop, and then a router" occurs frequently, in this order, in customers' shopping histories, it is a (frequent) sequential pattern. When a frequent pattern contains only itemsets without any order (for example, the same customer buys a desktop, a laptop and a router, regardless of order), the task becomes a classical association rule problem. Sequential pattern mining is widely recognized as an active research area in data mining and machine learning. It has proven very useful, even essential, for critical business problems such as customer behavior analysis, event detection and bioinformatics. For example, it is widely employed in DNA, protein and medicine identification, where it helps scientists identify similar and different structures and functions of molecular or DNA sequences [1].
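To make the difference between a sequential pattern and an unordered itemset concrete, the sketch below checks whether a pattern is contained in a data sequence (each pattern element must be a subset of a distinct data element, in order) and computes support as the fraction of data sequences containing the pattern. The four shopping histories are invented for illustration, and all function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Seq = List[FrozenSet[str]]  # an ordered list of unordered itemsets

def contains(data: Seq, pattern: Seq) -> bool:
    """True if the pattern's elements are subsets of data elements at increasing positions."""
    pos = 0
    for element in pattern:
        # Skip data elements until one includes every item of this pattern element.
        while pos < len(data) and not element <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1  # the next pattern element must be matched strictly later
    return True

def support(database: Sequence[Seq], pattern: Seq) -> float:
    """Fraction of data sequences that contain the sequential pattern."""
    return sum(contains(s, pattern) for s in database) / len(database)

def itemset_support(database: Sequence[Seq], items: FrozenSet[str]) -> float:
    """Order-free support: the items may appear anywhere in a data sequence."""
    return sum(items <= frozenset().union(*s) for s in database) / len(database)

def f(*items: str) -> FrozenSet[str]:
    return frozenset(items)

# Hypothetical shopping histories, one sequence per customer.
db = [
    [f("desktop"), f("laptop", "mouse"), f("router")],
    [f("desktop"), f("router"), f("laptop")],
    [f("laptop"), f("desktop"), f("router")],
    [f("desktop", "laptop"), f("router")],
]
pattern = [f("desktop"), f("laptop"), f("router")]
print(support(db, pattern))                                   # 0.25
print(itemset_support(db, f("desktop", "laptop", "router")))  # 1.0
```

Only the first history buys the three products in the stated order, so the sequential pattern has support 0.25, while the unordered itemset appears in all four histories. Matching each pattern element greedily to the earliest possible data element is enough here, because an earlier match never leaves fewer positions for the remaining elements.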
1.3 Positive sequential patterns and negative sequential patterns
Unlike traditional positive sequential patterns, negative sequential patterns focus on negative relationships between itemsets, that is, absent items are taken into consideration. A simple example illustrates the difference. Suppose p1 = <a b c d f> is a positive sequential pattern and p2 = <a b ¬c e f> is a negative sequential pattern, where each item a, b, c, etc. stands for a medical item code in the customer claim database of a private health care insurance company. From p1, we can tell that an insurant usually claims a, b, c, d and f in that order; from p2, we also learn that if an insurant claims medical items a and b but does NOT claim c, he/she then tends to claim item e instead of d.

A number of methods have been proposed to discover sequential patterns. Most conventional methods for sequential pattern mining were developed to discover positive sequential patterns [6, 7, 8, 9, 10, 11]. Positive sequential pattern mining considers only the occurrences of itemsets in sequences. In practice, however, the absence of itemsets from sequences may also carry valuable information. For example, suppose web pages A, B, C and D are frequently accessed by users, but D is seldom accessed after the sequence A, B, C. This access sequence can be denoted <A B C ¬D> and is called a negative sequence. Such a sequence can provide valuable information for improving a company's website structure; for example, a new link from C to D could make it more convenient for users to reach page D from C [12].
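The paper does not fix a formal definition of when a data sequence contains a negative sequence, and several definitions exist in the literature. As a sketch, the code below assumes an e-NSP-style reading [17] for single-item elements: a data sequence contains <a b ¬c e f> if it contains the positive part <a b e f> but does not contain <a b c e f>, i.e. the sequence obtained by turning the negative element back into a positive one in place. The names contains, contains_negative, parse and seq are hypothetical helpers.

```python
from typing import FrozenSet, List, Tuple

Seq = List[FrozenSet[str]]
NegSeq = List[Tuple[FrozenSet[str], bool]]  # (element, is_negated)

def contains(data: Seq, pattern: Seq) -> bool:
    """Positive containment: pattern elements are subsets of data elements, in order."""
    pos = 0
    for element in pattern:
        while pos < len(data) and not element <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True

def contains_negative(data: Seq, ns: NegSeq) -> bool:
    """Assumed e-NSP-style negative containment (see the lead-in)."""
    positive_part = [e for e, negated in ns if not negated]
    if not contains(data, positive_part):
        return False
    for i, (_, negated) in enumerate(ns):
        if negated:
            # Positive part with only the i-th negative element made positive in place.
            with_absent = [e for j, (e, n) in enumerate(ns) if not n or j == i]
            if contains(data, with_absent):
                return False
    return True

def parse(text: str) -> NegSeq:
    """Hypothetical helper: single-item elements, where '¬' marks a negated item."""
    return [(frozenset({tok.lstrip("¬")}), tok.startswith("¬")) for tok in text.split()]

def seq(text: str) -> Seq:
    return [frozenset({tok}) for tok in text.split()]

p2 = parse("a b ¬c e f")
print(contains_negative(seq("a b d e f"), p2))  # True: no c, and e follows a, b
print(contains_negative(seq("a b c e f"), p2))  # False: c occurs between b and e
print(contains_negative(seq("a c b e f"), p2))  # True under this definition
web = parse("A B C ¬D")
print(contains_negative(seq("A B C E"), web))   # True
print(contains_negative(seq("A B C D"), web))   # False
```

Under this reading, <a c b e f> still contains p2 because its c occurs before b rather than between b and e; a stricter definition would need a different check.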
1.4 Applications of sequential pattern mining
• Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months.
• Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
• Telephone calling patterns, Web log click streams.
• DNA sequences and gene structures [16].

II. SURVEY ON NEGATIVE SEQUENTIAL PATTERNS
1. Nancy P. Lin, Hung-Jen Chen, Wei-Hua Hao, Hao-En Chueh and Chung-I Chang, in "Mining Strong Positive and Negative Sequential Patterns", proposed a method for mining strong positive and negative sequential patterns, called PNSPM. In this method, the absence of itemsets from sequences is also considered [12].
2. K.M.V. Madan Kumar, P.V.S. Srinivas and C. Raghavendra Rao, in "Sequential Pattern Mining With Multiple Minimum Supports in Progressive Databases", proposed an approach that can be applied to any algorithm, whether or not that algorithm generates candidate sets to identify frequent itemsets. Instead of occurrence frequency, the proposed algorithm uses the concept of "percentage of participation" for every possible combination of items or itemsets; the percentage of participation is calculated based on the minimum support threshold of each itemset [13].
3. Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, in "Negative-GSP: An Efficient Method for Mining Negative Sequential Patterns", proposed a method for mining negative sequential patterns, called Negative-GSP, which finds negative sequential patterns effectively and efficiently [14]. They also designed an effective pruning method to reduce the number of candidates.
4. Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang and Hao-En Chueh, in "An Algorithm for Mining Strong Negative Fuzzy Sequential Patterns", proposed a method for mining negative fuzzy sequential patterns, called NFSPM. In this method, the absence of fuzzy itemsets is also considered, and only sequences with a high degree of interestingness are selected as negative fuzzy sequential patterns [15].
5. Xiangjun Dong, Zhigang Zheng, Longbing Cao and Yanchang Zhao, in "e-NSP: Efficient Negative Sequential Pattern Mining Based on Identified Positive Patterns Without Database Rescanning", proposed an efficient algorithm called e-NSP, which mines negative sequential patterns (NSP) using only the identified positive sequential patterns (PSP), without rescanning the database. First, negative containment is defined to determine whether a data sequence contains a negative sequence. Second, an efficient approach converts the negative containment problem into a positive containment problem, so that the supports of negative sequential candidates (NSC) are calculated from the corresponding PSP alone. Finally, a simple but efficient approach is proposed to generate the NSC [17].
6. Vedant Rastogi and Vinay Kumar Khare, in "Apriori Based: Mining Positive and Negative Frequent Sequential Patterns", proposed an algorithm for mining exception rules [18].
7. Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang and Hans Bohlscheid, in "Efficient Mining of Event-Oriented Negative Sequential Rules", analyzed three types of negative sequential rules and presented a new technique for finding event-oriented negative sequential rules [19].
8. Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, in "An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns", proposed a Genetic Algorithm (GA) based algorithm that finds negative sequential patterns with novel crossover and mutation operations, which efficiently pass good genes on to the next generations without generating candidates. An effective dynamic fitness function and a pruning method are also provided to improve performance [20].
9. Vinay Kumar Khare and Vedant Rastogi, in "Mining Positive and Negative Sequential Pattern in Incremental Transaction Databases", proposed a model for mining positive and negative sequential patterns in incremental transaction databases. The existing transaction database is merged with the appended transaction database using the updated compact pattern tree approach; the merged (updated) database table is then maintained according to support and mined for positive and negative sequential patterns with the CPNFSP algorithm proposed by Weimin Ouyang and Qinhua Huang [21][22].
10. Y. Li, A. Algarni and N. Zhong, in "Mining Positive and Negative Patterns for Relevance Feature Discovery", proposed an innovative approach that evaluates the weights of terms according to both their specificity and their distribution in higher-level features, where the higher-level features include both positive and negative patterns [23].
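The key observation behind e-NSP [17] is that, under its negative-containment rule, the support of a negative sequence can be derived from positive patterns that were already identified, without rescanning the database. For one negative element this gives, as we understand it, sup(<a b ¬c e f>) = sup(<a b e f>) - sup(<a b c e f>), because every data sequence containing <a b c e f> also contains <a b e f>. The sketch below generalizes this by subtracting the union of the sequence-ID sets of each "negative element turned positive" sequence. It assumes those ID sets were recorded while mining the positive patterns; the ID sets, the function name negative_support and the restriction to single-item elements are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]            # positive sequence of single-item elements
NegPattern = List[Tuple[str, bool]]  # (item, is_negated)

def negative_support(ns: NegPattern, sids: Dict[Pattern, Set[int]]) -> int:
    """Support count of a negative sequence, computed only from stored positive ID sets."""
    positive_part = tuple(item for item, negated in ns if not negated)
    excluded: Set[int] = set()
    for i, (_, negated) in enumerate(ns):
        if negated:
            # Sequences containing this negative element "turned positive" do not support ns.
            turned_positive = tuple(item for j, (item, n) in enumerate(ns) if not n or j == i)
            excluded |= sids[turned_positive]
    # Each excluded ID set is a subset of sids[positive_part], so this equals
    # sup(positive_part) minus the size of the union.
    return len(sids[positive_part] - excluded)

# Hypothetical sequence-ID sets over a database of five data sequences (IDs 1..5).
sids: Dict[Pattern, Set[int]] = {
    ("a", "b", "e", "f"): {1, 2, 3, 4},
    ("a", "b", "c", "e", "f"): {2, 4},
    ("a", "e"): {1, 2, 3, 4, 5},
    ("a", "c", "e"): {2, 3},
    ("a", "e", "g"): {3},
}
p2 = [("a", False), ("b", False), ("c", True), ("e", False), ("f", False)]
print(negative_support(p2, sids))       # 2, i.e. 4 - 2
two_neg = [("a", False), ("c", True), ("e", False), ("g", True)]
print(negative_support(two_neg, sids))  # 3, i.e. 5 - |{2, 3}|
```

Keeping ID sets rather than plain counts is what makes the multi-negative case work: the two excluded sets overlap in sequence 3, so subtracting their individual supports would undercount.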
S.No. | Title of Work | Authors | Proposed Work | Year
1 | Mining Strong Positive and Negative Sequential Patterns | Nancy P. Lin, Hung-Jen Chen, Wei-Hua Hao, Hao-En Chueh, Chung-I Chang | Proposed a method for mining strong positive and negative sequential patterns, called PNSPM. | 2008
2 | Sequential Pattern Mining With Multiple Minimum Supports in Progressive Databases | K.M.V. Madan Kumar, P.V.S. Srinivas, C. Raghavendra Rao | Proposed an approach that can be applied to any algorithm, whether or not that algorithm generates candidate sets to identify frequent itemsets. | 2012
3 | Negative-GSP: An Efficient Method for Mining Negative Sequential Patterns | Zhigang Zheng, Yanchang Zhao, Ziye Zuo, Longbing Cao | Proposed a method for mining negative sequential patterns, called Negative-GSP, which finds them effectively and efficiently. | 2009
4 | An Algorithm for Mining Strong Negative Fuzzy Sequential Patterns | Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang, Hao-En Chueh | Proposed a method for mining negative fuzzy sequential patterns, called NFSPM. | 2007
5 | e-NSP: Efficient Negative Sequential Pattern Mining Based on Identified Positive Patterns Without Database Rescanning | Xiangjun Dong, Zhigang Zheng, Longbing Cao, Yanchang Zhao | Proposed e-NSP, an efficient algorithm that mines NSP using only the identified PSP, without rescanning the database. | 2011
6 | Apriori Based: Mining Positive and Negative Frequent Sequential Patterns | Vedant Rastogi, Vinay Kumar Khare | Proposed an algorithm for mining exception rules. | 2012
7 | Efficient Mining of Event-Oriented Negative Sequential Rules | Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang, Hans Bohlscheid | Analyzed three types of negative sequential rules and presented a new technique for finding event-oriented negative sequential rules. | -
8 | An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns | Zhigang Zheng, Yanchang Zhao, Ziye Zuo, Longbing Cao | Proposed a Genetic Algorithm (GA) based algorithm that finds negative sequential patterns with novel crossover and mutation operations, which efficiently pass good genes on to the next generations without generating candidates. | 2010
9 | Mining Positive and Negative Sequential Pattern in Incremental Transaction Databases | Vinay Kumar Khare, Vedant Rastogi | Merges the existing and appended transaction databases using the updated compact pattern tree approach, then mines the merged (updated) database for positive and negative sequential patterns. | 2013
10 | Mining Positive and Negative Patterns for Relevance Feature Discovery | Y. Li, A. Algarni, N. Zhong | Proposed an innovative approach that evaluates term weights according to both their specificity and their distribution in higher-level features, which include both positive and negative patterns. | 2010
III. CONCLUSION
This paper provides definitions of sequence data, sequential pattern mining, positive sequential patterns and negative sequential patterns. It also explains the importance of negative sequential patterns and the applications of sequential patterns. The paper can serve as a starting point for researchers who want to work on negative sequential patterns.

REFERENCES
[1] Zhigang Zheng, "Negative Sequential Pattern Mining", thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, January 2012.
[2] J. Han, H. Cheng, D. Xin and X. Yan, "Frequent pattern mining: current status and future directions", Data Mining and Knowledge Discovery, 15, pp. 55–86, 2007.
[3] http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
[4] http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data.htm
[5] R. Agrawal and R. Srikant, "Mining sequential patterns", Proceedings of the 1995 International Conference on Data Engineering, Taipei, Taiwan, March 1995.
[6] R. Agrawal and R. Srikant, "Mining Sequential Patterns", Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, March 1995, pp. 3-14.
[7] M. J. Zaki, "Efficient Enumeration of Frequent Sequences", Proceedings of the Seventh CIKM, 1998.
[8] J. Ayres, J. E. Gehrke, T. Yiu and J. Flannick, "Sequential Pattern Mining Using Bitmaps", Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 2002.
[9] X. Yan, J. Han and R. Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Datasets", Proceedings of the 2003 SIAM International Conference on Data Mining (SDM'03), 2003, pp. 166-177.
[10] M. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences", Machine Learning, vol. 40, 2001, pp. 31-60.
[11] M. Zaki, "Efficient Enumeration of Frequent Sequences", Proceedings of the Seventh International Conference on Information and Knowledge Management (CIKM'98), 1998, pp. 68-75.
[12] Nancy P. Lin, Hung-Jen Chen, Wei-Hua Hao, Hao-En Chueh and Chung-I Chang, "Mining Strong Positive and Negative Sequential Patterns", WSEAS Transactions on Computers, Issue 3, Volume 7, March 2008, ISSN: 1109-2750.
[13] K.M.V. Madan Kumar, P.V.S. Srinivas and C. Raghavendra Rao, "Sequential Pattern Mining With Multiple Minimum Supports in Progressive Databases", International Journal of Database Management Systems (IJDMS), Vol. 4, No. 4, August 2012.
[14] Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, "Negative-GSP: An Efficient Method for Mining Negative Sequential Patterns", AusDM 2009, Melbourne, Australia, Conferences in Research and Practice in Information Technology (CRPIT), Vol. 101, Australian Computer Society, Inc.
[15] Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang and Hao-En Chueh, "An Algorithm for Mining Strong Negative Fuzzy Sequential Patterns", International Journal of Computers, Issue 3, Volume 1, 2007.
[16] Sequential Pattern Mining (slides), www.is.informatik.uni-duisburg.de/.../im.../MiningSequentialPatterns.ppt
[17] Xiangjun Dong, Zhigang Zheng, Longbing Cao and Yanchang Zhao, "e-NSP: Efficient Negative Sequential Pattern Mining Based on Identified Positive Patterns Without Database Rescanning", CIKM'11, October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM 978-1-4503-0717-8/11/10.
[18] Vedant Rastogi and Vinay Kumar Khare, "Apriori Based: Mining Positive and Negative Frequent Sequential Patterns", International Journal of Latest Trends in Engineering and Technology (IJLTET), Vol. 1, Issue 3, September 2012, ISSN: 2278-621X.
[19] Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang and Hans Bohlscheid, "Efficient Mining of Event-Oriented Negative Sequential Rules".
[20] Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, "An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns", in M.J. Zaki et al. (Eds.): PAKDD 2010, Part I, LNAI 6118, pp. 262–273, 2010. © Springer-Verlag Berlin Heidelberg 2010.
[21] Vinay Kumar Khare and Vedant Rastogi, "Mining Positive and Negative Sequential Pattern in Incremental Transaction Databases", International Journal of Computer Applications (0975-8887), Volume 71, No. 1, June 2013.
[22] Weimin Ouyang and Qinhua Huang, "Mining Positive and Negative Sequential Patterns with Multiple Minimum Supports in Large Transaction Databases", IEEE Second WRI Global Congress on Intelligent Systems, 2010.
[23] Y. Li, A. Algarni and N. Zhong, "Mining Positive and Negative Patterns for Relevance Feature Discovery", 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2010), Washington, DC, pp. 753-762.
[24] A. K. Payra and S. Saha, "Generic Approach of Pattern Matching of Amino Acid Sequences using Matching Policy & Pattern Policy", International Journal of Computer Engineering & Technology (IJCET), Volume 5, Issue 2, 2014, pp. 130-139, ISSN Print: 0976-6367, ISSN Online: 0976-6375.

AUTHOR'S DETAIL
K. Sujatha is pursuing her Ph.D. at Krishna University, Machilipatnam, A.P. She is interested in research in data mining. She has three international journal publications in data mining and two national journal publications. She has a total of 10 years of teaching experience. She is working as an associate professor in the CSE Department, Malineni Lakshmaiah Engineering College, Singarayakonda, Prakasam District, A.P.

Prof. K. Rajasekhara Rao is a Professor of Computer Science & Engineering at K.L. University and presently holds several key positions there, including Dean (Administration) and Principal, K L College of Engineering (Autonomous). With more than 26 years of teaching and research experience, Prof. Rao is actively engaged in research on Embedded Systems, Software Engineering and Knowledge Management. He obtained his Ph.D. in Computer Science & Engineering from Acharya Nagarjuna University (ANU), Guntur, Andhra Pradesh, and has 58 publications in various international/national journals and conferences. Prof. KRR received the "Patron Award" for his outstanding contribution from the Computer Society of India (CSI), India's prestigious professional society, for the year 2011 in Ahmedabad. He has been adjudged best teacher and honored with the "Best Teacher Award" seven times. Dr. Rajasekhar is a Fellow of IETE and a Life Member of IE, ISTE, ISCA and CSI (Computer Society of India). He was nominated as a sectional committee member for Engineering Sciences of the 100th Annual Convention of the Indian Science Congress Association. He is a past Chairman of the Koneru Chapter of CSI.
