Subgraph relative frequency approach for extracting interesting substructur

Published in: Technology
Transcript

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), © IAEME
SUBGRAPH RELATIVE FREQUENCY APPROACH FOR EXTRACTING INTERESTING SUBSTRUCTURES FROM MOLECULAR DATA
Mr. M.A.Srinuvasu1, Dr. P. Padmaja2, Mr. Y. Dharmateja3
Department of CSE & IT, GITAM University, Visakhapatnam-530045, INDIA
ABSTRACT
An unseen molecule in molecular data is classified using the substructures of the molecule. Mining interesting substructures for classification means finding subgraphs that characterize the different classes. In this paper, the authors suggest a Subgraph Relative Frequency (SRF) method that screens each frequent subgraph to determine whether the frequently occurring substructure is an interesting one. SRF thus discovers interesting subgraphs for each class, computed from relative frequencies. To classify an unknown molecule, SRF first finds the subgraphs of the molecule and calculates the interestingness of each subgraph for each class, based on the weight. The performance of SRF is compared against MISMOC and is found to be just as accurate as MISMOC. The MISMOC approach requires probability calculations on absolute frequencies, which increases its complexity. The proposed method reduces this complexity by calculating only the relative frequency to determine interestingness. The method was tested on a small predefined molecular data set and the results were analysed. The performance of the proposed SRF approach was found to be satisfactory and efficient.
Keywords: Frequent subgraph, graph mining, interestingness, molecular structure classification, SRF, MISMOC.
I. INTRODUCTION
Data mining tasks help in discovering non-trivial patterns that are difficult to find manually.
Data mining has recently attracted considerable attention from database practitioners and researchers because of its applicability in many areas such as decision support, market strategy and financial forecasting. Database technology has been used with great success in traditional data processing. But with the ability to store enormous amounts of business data, it is important to find a way to mine the data directly from the database and extract nuggets to leverage for business advantage. If the data can be mined directly, it can be used to find abstractions or relations that improve the understanding of the data and help in making business decisions.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 400-411, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com
  • 2. Transactional mining (association rules, decision trees, etc.) can be used effectively to find non-trivial patterns in categorical and unstructured data. For applications that have an inherent structure (e.g. chemical compounds, proteins) graph mining is appropriate, because mapping the structured data into other representations would lead to loss of structure. Various kinds of data, such as social network data, protein data and other bioinformatics data, can be effectively represented as graphs [1]. A graph representation provides a natural way to express relationships within data. Graph-based data mining expresses data in the form of graphs and focuses on the discovery of interesting subgraph patterns [2][3]. Graphs are increasingly used to model a wide range of scientific data. Such widespread usage of graphs has generated considerable interest in mining patterns from graph databases. Graph mining uses the natural structure of the application domain and mines over that structure. Graph mining algorithms include SUBDUE [4] (Holder et al., KDD'94), which uses an incomplete beam search, and WARMR [5] (Dehaspe et al., KDD'98), which uses inductive logic programming. Graph-theory-based approaches fall into two groups: the apriori-based approach [6] and the pattern-growth approach [7][8]. Frequent subgraph mining methods under the apriori-based approach include FSG [6] and FFSM (Fast Frequent Subgraph Mining); those under the pattern-growth approach include gSpan [9], MoFa [12] and Gaston [12]. These methods follow search orders such as DFS and BFS.
To eliminate duplicate subgraphs they use either passive or active strategies. Patterns may be discovered in the order path, tree, graph. Graph mining also covers classification and clustering. Graph clustering uses two kinds of similarity measure: feature-based similarity and structure-based similarity. Graph classification has four types of approach [10]: the local structure-based approach, the graph pattern-based approach, the kernel-based approach and boosting. MISMOC [13] is a method for discovering interesting molecular substructures for classification. RF_MISMOC [14] is a method that improves MISMOC by using the relative frequency. Molecular structures are represented in SMILES notation [15]. Areas of application include drug discovery, protein folding, comparative genomics, cancer risk assessment and gene evolution. The size and number of molecular structure databases have grown rapidly due to advances in X-ray diffraction and nuclear magnetic resonance (NMR) technologies. As molecular databases of nucleotides, genomes, proteins, nucleic acids, etc. continue to grow in size and diversity, there is an increasing need for techniques to mine these data for interesting patterns.
II. LITERATURE STUDY
L. B. Holder et al. [4] present a method for substructure discovery in the SUBDUE system. The algorithm begins with substructures matching a single vertex in the graph, selects the best substructures and expands the instances of these substructures by one neighbouring edge in all possible ways. It retains the best substructures in a list and stops when the total amount of computation exceeds a given limit. The evaluation of each substructure is guided by the minimum description length principle and by background knowledge rules provided by the user. M.
Kuramochi et al. [6] observed that, as data mining techniques are increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used because they cannot model the requirements of these domains. An alternative way of modelling the objects in these data sets is to use a graph to model the database objects. Their paper describes a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. They evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that the algorithm scales linearly with the number of input transactions and is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems such as canonical labelling of graphs and subgraph isomorphism, which are not necessary for traditional frequent itemset discovery.
  • 3. Chan et al. [7] describe an inductive method that is capable of detecting the inherent patterns in an ordered event sequence and of making predictions about the attributes of future events. Unlike previous AI-based prediction methods, the proposed method is particularly effective in discovering knowledge in ordered event sequences even when the data are noisy. The method can be divided into three phases: (i) detection of underlying patterns in an ordered event sequence; (ii) construction of sequence-generation rules based on the detected patterns; and (iii) use of these rules to predict the attributes of future events. K. C. C. Chan et al. [8] gave a method for the efficient acquisition of classification rules from training instances which may contain inconsistent, incorrect or missing information. This algorithm consists of three phases: (i) the detection of inherent patterns in a set of noisy training data; (ii) the construction of classification rules based on these patterns; and (iii) the use of these rules to predict the class membership of an object. Being able to handle uncertainty in the learning process, the proposed algorithm can be employed for applications in real-world problem domains involving noisy data. X. Yan et al. [9] proposed gSpan (graph-based substructure pattern mining), a method for frequent graph-based pattern mining in graph datasets and the first algorithm that explores depth-first search (DFS) in frequent subgraph mining.
This algorithm builds on two techniques, (i) DFS lexicographic order and (ii) minimum DFS code, which together form a novel canonical labelling system to support DFS search. gSpan discovers all the frequent subgraphs without candidate generation and false positive pruning. Winnie W. M. Lam et al. [13] describe a novel technique called mining interesting substructures in molecular data for classification (MISMOC) that can discover interesting frequent subgraphs not just for characterizing a molecular class but also for distinguishing it from the others. Using a test statistic, MISMOC screens each frequent subgraph to determine whether it is interesting. For those that are interesting, the degree of interestingness is determined using an information-theoretic measure. When classifying an unseen molecule, its structure is matched against the interesting subgraphs in each class, and a total interestingness measure for classifying the unseen molecule into a particular class is determined from the interestingness of each matched subgraph. Maryam Kohzadi et al. [14] propose a novel technique called RF_MISMOC (Relative Frequency MISMOC) for computing the interestingness of patterns in each class. It improves the performance of the base algorithm by selecting equal numbers of interesting indicator patterns from each class and by determining an optimum threshold value for the selection of indicator patterns. This is an improvement over the original MISMOC algorithm.
III. RELATED WORK
1. The MISMOC algorithm [13] is used in chemical molecular classification by considering the graph structure of the molecules. MISMOC searches for the frequent subgraphs using an existing algorithm such as FSG [6] or gSpan [9]. It uses the following probabilistic formulas. Discovering interesting frequent subgraphs:
  • 4. Interesting measures as a function of the weight of evidence: Based on the mutual information measures, the weight of evidence is
Classification using a total interestingness measure
These are the probabilistic calculations in the MISMOC algorithm.
2. RF_MISMOC [14] (Relative Frequency MISMOC) is derived from the technique of mining interesting substructures in molecular data for classification. Its aim is to improve MISMOC graph-based classification by using the relative frequency of the interesting patterns instead of the absolute frequency. numIntrsPatterns is the number of all interesting patterns in one class and numClass is the number of classes in the problem and the F(x). Interesting patterns are determined as follows: (i) convert the training data to sequences of ones and zeros; (ii) apply the IODLG algorithm to find the patterns that occur more frequently than minsup; (iii) compute the value of the d parameter for every frequent pattern; (iv) choose a threshold between 1 and 2 and select the frequent patterns whose value of d exceeds the threshold; (v) determine the minimum number of interesting patterns selected across the classes (minIntrs); (vi) sort the frequent patterns by their value of d and select the first minIntrs patterns from each class.
Illustrative example: To explain the discovery of frequent subgraphs, we are given three classes of artificial molecular data, shown in Fig. 1. Each of these three classes of data contains eight molecules represented in SMILES notation [15], and each molecule consists of atoms connected with bonds. These molecules can be represented as labelled molecular graphs, with each node representing an atom and each edge a bond.
Given the set of graph data shown in Fig. 1, frequent subgraphs can be discovered in each of classes 1, 2 and 3 using a graph-mining algorithm such as FSG [6]. FSG converts the molecular structures to subgraph structures, the interesting relative frequencies are then found, and the unseen molecules are classified. The unseen molecular structures are shown in Fig. 2(a) and 2(b).
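The labelled molecular graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `MolecularGraph`, its methods and the integer bond-order encoding are hypothetical, and the N=O fragment is built by hand because no SMILES parser is used here.

```python
from dataclasses import dataclass, field


@dataclass
class MolecularGraph:
    """Labelled graph: nodes are atoms, edges are bonds (hypothetical helper)."""
    atoms: list = field(default_factory=list)   # node labels, e.g. "N"
    bonds: list = field(default_factory=list)   # (atom index, atom index, bond order)

    def add_atom(self, symbol):
        """Append an atom and return its node index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Connect atoms i and j with a bond of the given order."""
        self.bonds.append((i, j, order))

    def labelled_edges(self):
        """Edges as (atom, bond order, atom), atoms sorted so direction is ignored."""
        edges = set()
        for i, j, order in self.bonds:
            a, b = sorted((self.atoms[i], self.atoms[j]))
            edges.add((a, order, b))
        return edges


# The fragment N=O, built by hand: two atoms joined by a double bond.
g = MolecularGraph()
n = g.add_atom("N")
o = g.add_atom("O")
g.add_bond(n, o, order=2)
print(g.labelled_edges())
```

A set of labelled edges like this is a convenient, if lossy, input for counting how often a bond pattern occurs across the molecules of a class.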
  • 5. Fig.1. Training molecular data
Let us consider an unseen molecule like
C[I+](O)O[Mo](N=O)([U]1[U][U][U][U][U]1)([No]1[No][No]1)C(=O)C(F)(F)F Fig. 2(a)
C[Pt](C)C([Co]SC#N)[Co++](N=[N+]=[NH2+])([Pu](N)[Pu]=O)[Y](N)=O Fig. 2(b)
Fig.2. Unseen molecule
  • 6. III. METHODOLOGY
The SRF (subgraph relative frequency) methodology classifies an unknown molecule. SRF first takes the training molecule data and finds its subgraphs using the FSG [6] algorithm. From the FSG output it obtains the relative frequencies. Using the relative frequency values, it selects the interesting frequent subgraphs by means of a threshold. Given an unknown molecule that is not in the training molecule data, it then classifies that molecule into a particular class. The block diagram is shown in Fig. 3.
Fig.3. SRF Block Diagram
Subgraph Relative Frequency (SRF)
The unseen molecule classification problem addressed in this paper can be stated as follows. Consider a predefined set of molecular structure data G (the training molecule data set of Fig. 1) containing n molecules pre-classified into p classes. The problem is concerned with discovering interesting patterns in the data that allow an "unseen" molecule not in G to be correctly classified into one of the p classes. The n molecules of G can be represented as graphs G1, G2, ..., Gn, where Gi = Gi(Vi, Ei), i ∈ {1, ..., n}, with the vertices representing atoms and the edges representing bonds between atoms. The p classes into which the n molecules are classified are represented as C(1), ..., C(p), where C(i) = {G1(i), ..., Gci(i)} ⊆ G, i = 1, ..., p. In the following we present the details of the SRF technique, which effectively improves the accuracy of molecular graph classification. SRF performs several tasks. It first searches for the frequent subgraphs using the existing algorithm FSG [6]. It next calculates the relative frequency values and the interesting frequent subgraphs, and finally it classifies the unseen molecule. A.
Discovering frequent subgraph molecule
To discover frequent subgraphs in a molecular database, there are several graph mining algorithms to choose from. SRF uses the FSG graph mining algorithm. Given a molecular data set G = {G1, ..., Gn}, the algorithm discovers sets of frequent subgraphs F(1), ..., F(p), where F(i) = {F1(i), ..., Fni(i)}, i = 1, ..., p, for each of the corresponding p classes C(1), ..., C(p). The FSG algorithm can find all the frequent subgraphs in each class of molecular graphs using the apriori approach. Briefly, FSG [6] works as follows. For each C(i) in G, i = 1, ..., p, FSG first finds the sets of frequent one-edge and two-edge subgraphs. Based on these two sets, it iteratively generates larger candidate subgraphs. FSG counts the frequency of each candidate and prunes the subgraphs that do not satisfy the support threshold, using the same support condition to prune the lattice of frequent subgraphs. Finally the frequent subgraphs F(1), F(2), ..., F(p), where F(i) contains all the frequent k-subgraphs, are generated for each class. Let gk be a k-subgraph with k edges, Dk be the set of all candidate subgraphs with k edges, and Fk(i) be the set of frequent k-subgraphs for class C(i); the FSG algorithm is summarized in Fig. 4.
Fig. 3 block diagram stages: Training molecule data; Finding subgraphs by FSG; Obtaining relative frequency from the FSG table; SRF condition gives the interesting frequent subgraphs; Classifying the unknown molecule
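The level-wise idea behind the FSG description above (count candidates with k edges, prune those below the support threshold, join the survivors into candidates with k+1 edges) can be sketched as follows. This is a deliberate simplification and not FSG itself: each molecule is treated as a plain set of labelled edges, so connectivity, canonical labelling and subgraph isomorphism are all ignored. The function name `frequent_edge_sets` and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def frequent_edge_sets(graphs, min_support):
    """Apriori-style level-wise search over sets of labelled edges.

    graphs: list of frozensets of hashable, sortable edge labels.
    Returns a dict {frozenset_of_edges: support_count}.
    """
    edges = sorted({e for g in graphs for e in g})
    level = [frozenset([e]) for e in edges]   # candidates with k = 1 edge
    frequent = {}
    k = 1
    while level:
        # Count in how many graphs each candidate occurs.
        counts = {c: sum(1 for g in graphs if c <= g) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(kept)
        # Join frequent k-sets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (downward closure).
        next_level = set()
        for a, b in combinations(list(kept), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in kept for sub in combinations(union, k)
            ):
                next_level.add(union)
        level = list(next_level)
        k += 1
    return frequent


# Toy class of three "molecules", each a set of (atom, bond order, atom) edges.
cn, no = ("C", 1, "N"), ("N", 2, "O")
toy_class = [frozenset([cn, no]), frozenset([cn, no]), frozenset([cn])]
result = frequent_edge_sets(toy_class, min_support=2)
for pattern, support in result.items():
    print(sorted(pattern), support)
```

Running the search once per class gives a per-class support table of the kind that the later relative frequency step works from.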
  • 7. Fig. 4. Algorithm of FSG
B. Discovering interesting frequent subgraphs by SRF
The aim of FSG [6] is to discover the frequent subgraphs F(i) = {F1(i), ..., Fni(i)}, i = 1, ..., p, in each of the corresponding graph classes C(1), ..., C(p). A frequent subgraph that appears frequently in one graph class may also do so in another, and such frequent subgraphs are not interesting for classification. In this paper we present a methodology, SRF, that identifies the subgraphs that are interesting and useful for classification. The methodology is based on relative frequency values obtained by simple arithmetic, and the algorithm is given in Fig. 5. The relative frequency of a subgraph Fj(i) for class C(i) is its frequency in C(i) minus its least frequency over all of the classes; the calculation is repeated for the remaining subgraphs and classes. The interesting frequent subgraphs are shown in Table I and are found using D, the difference between the highest and the second highest relative frequency value of a subgraph over the classes. If D ≥ 30% of the class size, the subgraph is taken to be interesting. The set of interesting frequent subgraphs discovered from each of C(1), ..., C(p) is denoted F′(i) = {F1′(i), ..., Fn′(i)}, i = 1, ..., p, respectively.
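The arithmetic of this step can be sketched in Python as follows. The function names are hypothetical, and D is read here as the highest relative frequency minus the second highest, which is one interpretation of the description above. The class frequencies are those listed for S1, S4 and S13 in Table I, with eight molecules per class.

```python
def relative_frequencies(freqs):
    """Subtract the least class frequency from every class frequency."""
    least = min(freqs)
    return [f - least for f in freqs]


def is_interesting(freqs, class_size, ratio=0.30):
    """Return True when D >= ratio * class_size.

    D is taken as the highest minus the second-highest relative frequency,
    so at least two classes are required.
    """
    rel = sorted(relative_frequencies(freqs), reverse=True)
    d = rel[0] - rel[1]
    return d >= ratio * class_size


# Frequencies in classes 1, 2 and 3 for three subgraphs of Table I.
table = {"S1": [7, 8, 7], "S4": [5, 1, 1], "S13": [4, 4, 4]}
for name, freqs in table.items():
    print(name, relative_frequencies(freqs), is_interesting(freqs, class_size=8))
```

With a class size of 8 the threshold is 2.4, so under this reading a subgraph such as S4, which is concentrated in one class, passes, while S1 and S13, which are spread almost evenly, do not.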
  • 8. Fig.5. Algorithm of SRF
C. Classification of unseen molecule of a class
Given the interesting frequent subgraphs F′(1), ..., F′(p) discovered for each of the corresponding p classes C(1), ..., C(p), an "unseen" molecule graph G that is not in the graph data is classified by matching it against the subgraphs in each of F′(i), i = 1, ..., p. SRF computes the total interestingness for G to be classified into C(i), defined as the sum of the interestingness of the interesting frequent subgraphs Fj′(i) matched against G, as follows:
$$I^{(i)}(G) = I\left(G \in C^{(i)} / G \notin C^{(i)} \mid G \text{ is characterized by } F_1'^{(i)}, \ldots, F_n'^{(i)}\right) = \sum_{j=0}^{n_i} I\left(G \in C^{(i)} / G \notin C^{(i)} \mid G \text{ is characterized by } F_j'^{(i)}\right)$$
The total interestingness for G to be classified into each of C(1), ..., C(p) is determined, and SRF assigns G to the class that gives the greatest total interestingness measure.
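A minimal sketch of this decision rule is given below, assuming that the interestingness of a matched subgraph for a class is simply its relative frequency value in that class, as in the SRF calculations of the experimental section. The function name `classify`, the pattern names P1 to P3 and their values are made up for illustration and are not taken from Table I.

```python
def classify(matched, rel_table):
    """Sum the per-class relative values of the matched subgraphs.

    matched: names of the subgraphs found in the unseen molecule.
    rel_table: {subgraph name: [relative value in class 1, class 2, ...]}.
    Returns (1-based index of the best class, list of per-class totals).
    Ties go to the lowest-numbered class.
    """
    n_classes = len(next(iter(rel_table.values())))
    totals = [0] * n_classes
    for name in matched:
        for c, value in enumerate(rel_table[name]):
            totals[c] += value
    best = max(range(n_classes), key=totals.__getitem__)
    return best + 1, totals


# Hypothetical relative values for three patterns over three classes.
rel_table = {"P1": [0, 1, 0], "P2": [0, 0, 3], "P3": [2, 0, 0]}
best_class, totals = classify(["P1", "P2"], rel_table)
print(best_class, totals)
```

Because the score is a plain sum, adding or removing a matched subgraph changes each class total by exactly that subgraph's value, which keeps the decision easy to audit by hand.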
  • 9. Table I: Interestingness of FSG
Molecule   SMILES notation             Class 1       Class 2       Class 3
                                       freq   rel    freq   rel    freq   rel
S1 (i)     N=O                         7      0      8      1      7      0
S2 (i)     C[Pt](C)C                   2      0      5      3      6      4
S3 (i)     N[Pu+][Pu]=O                4      2      5      3      3      0
S4 (i)     ClC(Cl)Cl                   5      4      1      0      1      0
S5 (i)     OS(O)(=O)=O                 5      5      0      0      1      1
S6 (i)     OP(O)(O)=O                  1      0      5      4      1      0
S7 (i)     N=[N+]=[N-]                 1      0      3      2      1      0
S8 (i)     F[C](F)(F)=O                0      0      1      1      4      4
S9 (i)     BBB=BBB(BOB=O)B=BB=B/B      1      0      1      0      4      3
S10 (i)    [U]1[U][U][U][U][U]1        1      0      4      3      4      3
S11 (i)    C[I](O)O                    5      3      2      0      3      1
S12 (i)    N[Y]=O                      3      2      2      1      1      0
S13 (i)    [No]1[No][No]1              4      0      4      0      4      0
S14 (i)    [Ir]1[Ir][Ir][Ir]1          4      0      5      1      4      0
S15 (i)    N#[S]                       3      1      2      0      4      2
  • 10. Comparison of the proposed (SRF) with MISMOC and RF_MISMOC
In this paper SRF addresses the discovery of the interesting subgraphs among those found by FSG [6] and the classification of an "unseen" molecule by using relative frequency values. This is done with simple arithmetic, which reduces the time complexity compared with MISMOC. MISMOC [13] does the same thing, discovering the interestingness measure of the FSG [6] subgraphs and classifying the "unseen" molecule, but it uses absolute frequency values and probability calculations, which makes it more complex. Comparing the two methodologies, the results are the same. RF_MISMOC [14] improves the performance of the MISMOC graph classification algorithm by taking the relative frequency of the patterns in each class, selecting an equal number of interesting indicator patterns per class and determining an optimal threshold value for the selection of indicator patterns. This paper considers the relative frequency and also classifies the "unseen" molecule into a class, which is not done in RF_MISMOC. Thus the SRF approach is efficient at discovering the interesting relative frequencies and classifying the "unseen" molecule.
IV. EXPERIMENTAL RESULTS
The interestingness of subgraphs using SRF and MISMOC for the above example is calculated as follows. Consider the unknown molecule
C[I+](O)O[Mo](N=O)([U]1[U][U][U][U][U]1)([No]1[No][No]1)C(=O)C(F)(F)F Fig. 2(a)
It contains the subgraphs S1, S8, S10, S11 and S13 from Table I.
The calculation using MISMOC is:
Class 1 interestingness = sum of d values for class 1 of the subgraphs S1, S8, S10, S11, S13 found in the unknown molecule = -11.326.
Class 2 interestingness = sum of d values for class 2 of the subgraphs S1, S8, S10, S11, S13 found in the unknown molecule = -9.104.
Class 3 interestingness = sum of d values for class 3 of the subgraphs S1, S8, S10, S11, S13 found in the unknown molecule = 2.390.
Based on the interestingness values we can classify the unknown molecule. Classification = Class 3.
The calculation using SRF is:
Class 1 interestingness = 2+0+0+0+0 = 2 (2 for the S1 relative value in class 1, 0 for the S8 relative value in class 1, 0 for the S10 relative value in class 1, 0 for the S11 relative value in class 1, 0 for the S13 relative value in class 1).
Class 2 interestingness = 1+1+1+3+0 = 6.
Class 3 interestingness = 3+0+4+3+0 = 10.
Based on the interestingness values we classify the unknown molecule. Classification = Class 3.
  • 11. Consider the unknown molecule
C[Pt](C)C([Co]SC#N)[Co++](N=[N+]=[NH2+])([Pu](N)[Pu]=O)[Y](N)=O Fig. 2(b)
It contains the subgraphs S2, S3, S10, S7, S12 and S15 from Table I.
The calculation using MISMOC is:
Class 1 interestingness = sum of d values for class 1 of the subgraphs S2, S3, S10, S7, S12, S15 found in the unknown molecule = -8.034.
Class 2 interestingness = sum of d values for class 2 of the subgraphs S2, S3, S10, S7, S12, S15 found in the unknown molecule = -0.719.
Class 3 interestingness = sum of d values for class 3 of the subgraphs S2, S3, S10, S7, S12, S15 found in the unknown molecule = -16.704.
Based on the interestingness values we classify the unknown molecule. Classification = Class 2.
The calculation using SRF is:
Class 1 interestingness = sum of relative values for class 1 of the subgraphs S2, S3, S10, S7, S12, S15 found in the unknown molecule = 4.
Class 2 interestingness = sum of relative values for class 2 of the subgraphs S2, S3, S10, S7, S12, S15 found in the unknown molecule = 8.
Class 3 interestingness = sum of relative values for class 3 of the subgraphs S2, S3, S10, S7, S12, S15 found in the unknown molecule = 6.
Based on the interestingness values we classify the unknown molecule. Classification = Class 2. Classification into class 1 is performed in the same way.
V. CONCLUSION
In this paper, we introduced a new graph-mining technique called SRF (subgraph relative frequency) to discover the subgraphs of unknown molecules from graph databases. SRF considers the relative frequency value of a subgraph instead of its absolute frequency. The SRF methodology is an effective way to discover the interesting frequent subgraphs for a class and to classify unseen molecules by their substructures. The algorithm has lower time complexity than previous methods.
VI. REFERENCES
[1] J. A.
Bondy, Graph Theory With Applications. New York: Elsevier, 1976.
[2] D. Conklin, S. Fortier, and J. Glasgow, "Knowledge discovery in molecular databases," IEEE Trans. Knowl. Data Eng., vol. 5, no. 6, pp. 985–987, Dec. 1993.
[3] Y. Yoshida, Y. Ohta, K. Kobayashi, and N. Yugami, "Mining interesting patterns using estimated frequencies from subpatterns and superpatterns," Lecture Notes in Computer Science, vol. 2843, pp. 494–501, 2003.
  • 12. [4] L. B. Holder, D. J. Cook, and S. Djoko, "Substructure discovery in the SUBDUE system," in Proc. AAAI Workshop Knowl. Discov. Databases, 1994, pp. 169–180.
[5] R. D. King, A. Srinivasan, and L. Dehaspe, "Warmr: A data mining tool for chemical data," J. Comput.-Aided Mol. Des., vol. 15, no. 2, pp. 173–181, 2001.
[6] M. Kuramochi and G. Karypis, "Frequent sub-graph discovery," in Proc. 1st IEEE Int. Conf. Data Mining (ICDM), 2001, pp. 313–320.
[7] K. C. C. Chan, A. K. C. Wong, and D. K. Y. Chiu, "Learning sequential patterns for probabilistic inductive prediction," IEEE Trans. Syst., Man Cybern., vol. 24, no. 10, pp. 1532–1547, Oct. 1994.
[8] K. C. C. Chan and A. K. C. Wong, "APACS: A system for automated pattern analysis and classification," Comput. Intell.: Int. J., vol. 6, pp. 119–131, 1990.
[9] X. Yan and J. Han, "gSpan: Graph-based substructure pattern mining," in Proc. IEEE Int. Conf. Data Mining, 2002, pp. 721–724.
[10] I. Fischer and T. Meinl, "Graph-based molecular data mining – An overview," in Proc. IEEE Int. Conf. Syst., Man Cybern., 2004, vol. 5, pp. 4578–4582.
[11] M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis, "Frequent substructure-based approaches for classifying chemical compounds," IEEE Trans. Knowl. Data Eng., vol. 17, no. 8, pp. 1036–1050, Aug. 2005.
[12] K. Lakshmi and T. Meyyappan, "Frequent Subgraph Mining Algorithms - A Survey and Framework for Classification."
[13] Winnie W. M. Lam and Keith C. C. Chan, "Discovering Interesting Molecular Substructures for Molecular Classification," IEEE Transactions on Nanobioscience, vol. 9, no. 2, June 2010.
[14] Maryam Kohzadi and Mohammad Reza Keyvanpour, "RF_MISMOC: Improvement of MISMOC graph based classification algorithm," 1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2011.08.1012.
[15] http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
[16] M. Siva Parvathi and B. Maheswari, "Minimal Dominating Functions of Corona Product Graph of a Cycle with a Complete Graph," International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 248-256, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[17] László Lengyel, "The Role of Graph Transformations in Validating Domain-Specific Properties," International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 406-425, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[18] Rinal H. Doshi, Harshad B. Bhadka and Richa Mehta, "Development of Pattern Knowledge Discovery Framework using Clustering Data Mining Algorithm," International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 101-112, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
