Published on

IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.458-463 An Automated Weighted Support Approach Based Associative Classification With Analytical Study For Health Disease Prediction *Mrs. Suwarna Gothane, **Dr. G.R.Bamnote *Asst Prof (CSE)ATRI, Uppal. Hyderabad **HOD(CSE)P.R.M.I.T&R, Badnera, AmravatiAbstract Association mining has seen its system throughput the doctor may decide thedevelopment right through data mining ,which medication [6].reveal implicit relationship among data attributesIt is the research topic in data mining and use for 2. ASSOCIATION CLASSIFICATION:applications such as business, science, society and This method of mining is an algorithm basedeveryone .Association rule helps in marketing, technique with support of some associationdecision making, business management ,cross constraints to repeat the data. The data available ismarketing ,catalog designing ,clustering disjointed into 2 of which is 70% of total,classification etc. Finding small database sets data and is taken as training data and the rest taken ascould be done with the help of predictive analysis another part is used for testing purpose. Thismethod. The paper enlightens the combinational technique of data mining is done on data base setscategorization of association and classification with a record of mining. For fields like medicine where a lot  Step1: with the help of training set of data createseveral patients consult doctor, but every patient an association rule set.has got different personal details not necessarily  Step2: eliminate all those rules that may causemay suffer with same disease. So the doctor may over fitting.look for a classifier, which could present all details  Step3: finally we predict the data and check forabout every patient and in future necessary accuracy and this is said to be the classificationmedications can be provided. However there have phase.been many other classification methods like One such example of data base set is:CMAR, CPAR MCAR and MMA and CBA.Some advance associative classifiers have also seendevelopment very recently with smallamendments in terms of support and confidence,thereby the accuracy. In this paper ,we proposed a Figure 1: Process Flow diagram of AssociativeHITS algorithm based on automated weight Classifiercalculation approach for weighted associative Transaction Items inclassifier. ID TransactionKeywords: classifier, Association rules, data 1 ABCDEmining, healthcare, Associative Classifiers, CBA, 2 ACECMAR, CPAR, MCAR 3 BD 4 ADE1. INTRODUCTION: A set of steps followed to take out data from 5 ABCDEthe related pattern is termed as data mining. From ahaphazard data set it is possible to obtain new data. Table 1: Transactional DatabaseThe analytical modeling approach that simply An association rule is an implication of thecombines association and classification mining form A  B , where A, B  I , where I is set oftogether [3] shows better accuracy [10]. The all items, and A  B   . The rule A  B has aclassification techniques CBA [10], CMAR [9],CPAR [8] out beat the usual classifiers C4.5, FOIL, support s in the transaction set D if s% of theRIPPER which are faster but not accurate. Associate transactions in D contain A  B .The rule A  Bclassifiers are fit to those model applications which holds in the transaction set D with confidence c if c%provide support domains in the decisions. However of transactions in D that contain A also contain B . ifthe most matched example for this is medical field the threshold point is crossed in terms of confidencewhere in the data for each patient is required to be then the association rules could be determined. Thusstored, with the help of which the system predicts thediseases likely to be upsetting the patient. With the 458 | P a g e
  2. 2. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.458-463the determined rules form a confidence frame with efficient and friendly for end user. Whenever anythe help of high strength rules. amendments need to be done in a tree they can be On a particular set of domains the AC is made without disturbing the other attributes.performed. A tuple is a collection of m attributes a1 , a2 ,...., am and consists of another special 3. ADVANCEMENTS IN CAR RULEpredictive Attribute (Class Label). However these GENERATION:attributes can be categorized depending action The accuracy of the classification howevermethods. Association rule mining is quite different depends on the rules disguised in the classification.tofrom AC but still undergo following similarities: overcome CARs rules inaccuracy in some cases, ai. Attribute which is pair (attribute, value), is used new advanced ARM in association with classifiers in place of Item. For example (BMI, 40) is an has been developed. This new advanced technique attribute in Table 2 provides high accuracy and also improves forecastii. Attribute set is equivalent to Itemset for example capabilities. ((Age, old), (BP, high)). 3.1. AN ASSOCIATIVE CLASSIFIER BASEDiii. Support count of Attribute ( Ai , vi ) is number of ON POSITIVE AND NEGATIVE rows that matches Attribute in database. APPROACH:iv. Support count of Attribute set Negative association rule mining and ( Ai , vi ),...., ( Am , vm ) is number of rows that associative classifiers are two relatively new domains of research, as the new associative amplifiers that match Attribute set in data base. take advantage of it. The positive association rule ofv. An Attribute ( Ai , vi ) passes the minsup entry if the form X  Y to X  Y , X  Y and support count ( Ai , vi )  min sup . X  Y with the meaning X is for presence and Heart X is for absence. Based on correlation analysis the Recor Ag Smok Hypertensi Disea algorithm uses support confidence in its place of d ID e es on BMI se using support-confidence framework in the 1 42 YES YES 40 YES association rule generation. Correlation coefficient measure is added to support confidence framework as 2 62 YES NO 28 NO it measures the strength of linear relationship 3 55 NO YES 40 YES between a pair of two variables. For two variables X con( X , Y ) and Y it is given by   4 62 YES YES 50 YES , where 5 45 NO YES 30 NO  x,  y con( X , Y ) represents the covariance of two variables and  x stand for standard difference. Table 2: Sample Database for heart An Attribute set (( Ai , vi )....( Am , vm )) passes The range of values for  is between -1 to +1,when the min sup entry if support count it is +1 the variables are completely correlated, if it is -1 the variables are completely independent then  (( Ai , vi ), ( Ai 1 , vi 1 )....( Aj , v j ))  min sup . equals to 0.when positive and negative rules are usedvii. CAR Rules are of form for classification in UCI data sets hopeful results will (( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ))  c where c obtain. Negative association rules are efficient to take out hidden knowledge. And if they are only used for  Class-Label. Where Left hand side is itemset classification, accuracy decreases. and right hand sigh is class. And set of all attribute and class label together ie ((Ai, vi), 3.2. TEMPORAL ASSOCIATIVE …,(Aj, vj),c) is called rule attribute. CLASSIFIERS:viii. Support count of rule attribute As data is not always static in nature, it (( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ), c) is number changes with time, so adopting chronological of rows that matches item in database. dimension to this will give more practical approach Rule attribute and yields much better results as the purpose is to provide the pattern or relationship among the items in (( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ), c) passes the time domain. For example rather than the basic min sup entry if support count of association rule of bread  butter mining (( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ), c)  min sup from the chronological data we can get that the An important subset of rules called classassociation rules (CARs) are used for classification support of bread  butter raises to 50%purpose since their right hand side is used for during 7 pm to 10 pm everyday [3],as These rules areattributes. Its simplicity and accuracy makes it more informative they are used to make a strategic 459 | P a g e
  3. 3. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.458-463decision making. Time is an important aspect in “Income [20-50k$]  Age [20 -30]”, and so ontemporal database. [ZC08]. As the record belongs to only one of the set1. Scientific, medical, dynamic systems, computer results in sharp boundary problem which gives rise to network traffic, web logs, markets, sales, the notion of fuzzy association rules (FAR).The transactions, machine/device performance, semantics of a fuzzy association rule is richer and weather/climate, telephone calls are examples of natural language nature, which are deemed attractive. time ordered data. For example, “low-quantity Fruit  normal -2. The volume of dynamic time-tagged data is consumption Meat” and “medium Income  therefore growing, and continuing to grow, young Age” are fuzzy association rules, where X’s3. as the monitoring and tracking of real-world and Y’s are fuzzy sets with linguistic terms (i.e., low, events frequently require frequent normal, medium, and young).An associative measurements.3.Data mining methods require classification based on fuzzy association rules some alteration to handle special temporal (namely CFAR) is proposed to overcome the “sharp relationships (“before”, “after”, “during”, “in boundary” problem for quantitative domains. Fuzzy summer”, “whenever X happens”). rules are found to be useful for prediction modeling4. Time-ordered data lend themselves to prediction system in medical domain as most of the attributes – what is the likelihood of an event, given the are quantitative in nature hence fuzzy logic is used to previous history of events? (e.g., hurricane deal with sharp boundary problems. tracking, disease epidemics) 3.3.1 Defining Support and confidence measure: New5. Time-ordered data often link certain events to formulae of support and confidence for fuzzy specific patterns of temporal behavior (e.g., classification rule F  C are as follows: network intrusion breaks INS).The new type of Sum of membership values of antecedent with class label C sup port ( F  C )  AC called Temporal Associative Classifier is Total No. of Records in the Database being proposed in [3] to deal with above such state. CBA, CMAR and CPAR are customized Sum of membership values of antecedent with class label C confidence( F  C )  Sum of membership values of antecedent for all class label with temporal measurement and proposed TCBA, TCMAR and TCPAR. To compare the classifying accuracy and implementation time of 3.4 WEIGHTED ASSOCIATIVE CLASSIFIERS the three algorithms using temporally Based on the unlike features weights are customized data set of UCI machine learning selected based on this classifier. Every attribute data sets an experiment has performed and varies in terms of importance. it also important to conclusions are: know that with the capabilities of predicting thei. TCPAR performs better than TCMAR and weights may be altered. Weighted associative TCBA as it is time consuming for smaller classifiers consist of training dataset support values but improves in run- time T  r1,r2, r3. ri with set of weight associated performance as the support increases. with each attribute, attribute value pair. Each withii. Using data set the accuracy is considered for record is a set of attribute value and a weight wi each algorithm. The average accuracy of TCPAR attached to each attribute of tuple / record. Aweighted is found little better than TCMAR. framework has record as a triple {ai, vi, wi} whereiii. The temporal complement of all the three attribute ai is having value vi and weight wi, associative classifiers has shown better 0<wj<=1. Thus with the help of weights one can classification accuracy as compare to the non- easily find out its predicting ability. With this temporal associative classifier. Time-ordered weighted rules like “medium Income young Age” , data lend themselves to prediction like what is “{(Age,”>62”), (BMI,“45”), (Boold_pressur,“95- the likelihood of an event e.g., (hurricane 135”)},  Heart Disease,(Income[20,000- tracking, disease epidemics). The temporal data 30,000]Age[20 -30]) could become the criteria of is useful in predicting the disease in different age determination. Weights of data as per table 2 are group. recorded in table 4 with the help of weights mentioned in the table 5 for predicting attributes.3.3. ASSOCIATIVE CLASSIFIER USING FUZZY: Recor Association Rule: The quantitative attributes dare one of preprocessing step in classification. for the Recor Ag Smok Hypertensi BM Weigdata which is associated with quantitative domains d ID e es on I htsuch as income, age, price, etc., in order to apply the 1 42 YES YES 40 0.6Apriori-type method association rule mining 2 62 YES NO 28 0.42requirements to separation the domains. Thus, adiscovered rule X  Y reflects association 3 55 NO YES 40 0.52between interval values of data items. Examples of 4 62 YES YES 50 0.67such rules are “Fruit [1-5kg]  Meat [520$]”, - 5 45 NO YES 30 0.45 460 | P a g e
  4. 4. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.458-463Table 4: Relational Database with record weight weights represent the potential of transactions toHere we measure the record weight using fallowing contain high-value items. A transaction with few |I | items may still be a good hub if all component items w i are top ranked. Conversely, a transaction with manyrecordWeight  i 1 ordinary items may have a low hub weight. |I|Here W-support - A New Measurement: | I | is total number of items in record Item set estimation by support in classical i is an item in record association rule mining [1] is based on counting. In wi is weight of the item i this section, we will introduce a link-based measure called w-support and formulate association rule3.4.1 Measuring weights using HITS algorithm mining in terms of this new concept. Ranking Transactions with HITS The previous section has established the applicationA database of transactions can be depicted as a of the HITS algorithm [3] to the ranking of thebipartite graph without loss of information. Let transactions. As the iteration converges, the authorityD  {T1 , T2 ,...Tm } be a list of transactions and weight auth(i )   hub(T ) represents theI  {i1 , i2 , } be the equivalent set of items. T :iT “significance” of an item I, accordingly, weThen, clearly D is equivalent to the bipartite graph generalize the formula of auth(i ) to depict theG  ( D, I , E ) where significance of an arbitrary item set, as the followingE  {(T , i) : i  T , T  D, i  I } definition shows: Definition 1: The w-support of an item set X is defined as  hub(T ) w sup p( X )  T : X T T D ....(2)  hub(T ) T :T D Where hub(T ) is the hub weight ofFig1: The bipartite graph illustration of a database (a)Database (b) Bipartite graph transaction T. An item set is said to be significant if its w-support is larger than a user specified valueExample 1: Consider the database shown in Fig. 1a. Observe that replacing all hub(T ) with 1 on theIt can be consistently represented as a bipartite graph, right hand side of (2) gives supp(X). Therefore, w-as shown in Fig. 1b. The graph representation of the support can be regarded as a generalization oftransaction database is inspiring. It gives us the idea support, which takes the weights of transactions intoof applying link-based ranking models to the account. These weights are not determined byestimation of transactions. In this bipartite graph, the assigning values to items but the global link structuresupport of an item i is proportional to its degree, of the database. This is why we call w-support linkwhich shows again that the classical support does not based. Moreover, we claim that w-support is moreconsider the variation between transactions. reasonable than counting-based measurement.However, it is critical to have different weights fordifferent transactions in order to return their different ID Symptoms Weightimportance. The assessment of item sets should be 1 Age  40 0.1derivative from these weights. Here comes thequestion of how to acquire weights in a database with 2 40  Age  58 0.2only binary attributes. Intuitively, a good transaction, 3 Age  58 0.3which is highly weighted, should contain many gooditems; at the same time, a good item should be 4 Smokes  yes 0.8contained by many good transactions. The 5 Smokes  no 0.7reinforcing relationship of transactions and items is 6 Hypertension  yes 0.6just like the relationship between hubs and authoritiesin the HITS model [3]. Regarding the transactions as 7 Hypertension  no 0.5“pure” hubs and the items as “pure” authorities, we 8 BMI  25 0.1can apply HITS to this bipartite graph. The followingequations are used in iterations: 9 26  BMI  30 0.3 auth(i )   hub(T ), hub(T )   auth(i )....(1) 10 31  BMI  40 0.5 T :iT i:iT 11 BMI  40 0.8 Table 5: weights calculated using anticipatedWhen the HITS model finally converges, the hubweights of all transactions are obtained. These algorithm 461 | P a g e
  5. 5. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.458-4633.4.2 Defining Support and confidence measure: CONCLUSION: New formulae of support and confidence for This advanced AC method could be appliedclassification rule X  class _ Label , where X is in real time scenario to get more accurate results.set of weighted items, is as follows: Weighted This needs lot of prediction to be done based on itsSupport: Weighted support WSP of rule capabilities which could be improved.It find it major application in the field of medical where every data X  class _ Label , where X is set of non empty has an associated weight. The proposed HITssubsets of attribute value set, is part of weight of the algorithm based weight measurement model isrecord that contain above attribute-value set significantly improving the quality of classifier.comparative to the weight of all transactions. |X |v1   weight (ri ) REFERENCES: i 1 [1]. SunitaSoni ,JyothiPillai,O.P. Vyas An |n| Associative Classifier Using Weightedv 2   weight (ri ) Association Rule2009 International k 1 Symposium on Innovations in natural v1 Computing, World Congress on Nature &WSP( X  Class _ Label )  v2 BiologicallyThe classification accuracy development for [2]. Zuoliang Chen, Guoqing Chen BUILDINGWeighted associative classifier with proposed weight AN ASSOCIATIVE CLASSIFIER BASEDmeasurement approach can be observable in fig 2 and ON FUZZY ASSOCIATION RULESfig 3. International Journal of ComputationalFig 2: A line chart demonstration of classification Intelligence Systems, Vol.1, No. 3 (August,accuracy differences between Traditional pre 2008), 262 - 273assigned weighted associative classifier(PAWS- [3]. RanjanaVyas, Lokesh Kumar Sharma, OmAssociative classifier) and proposed automated Prakashvyas, Simon ScheiderAssocitiveweighted support associative classifier(AWS- Classifiers for Predictive analytics:Associative Classifier) Comparative Performance Study, second UKSIM European Symposium on Computer Modeling and Simulation 2008. [4]. E.RamarajN.VenkatesanPositive and Negative Association Rule Analysis in Health Care Database ,IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.10, October 2008,325- 330. [5]. Khan, M.S. Muyeba, M. Coenen, F A Weighted Utility Framework for Mining Association Rules, Symposium Computer Modeling and Simulation, 2008. EMS 08. Second UKSIM European , page(s): 87-92. [6]. FadiThabtah, A review of associative4. REFINING SUPPORT AND classification mining, The Knowledge Engineering Review, Volume 22, Issue 1 CONFIDENCE MEASURES TO (March 2007), Pages 37-65, 2007. VALIDATE DOWNWARD CLOSURE [7]. LuizaAntonie, University of Alberta, PROPERTY: Advancing Associative Classifiers - The downward closure property is the key Challenges and Solutions, Workshop onpart of Apriori states that any super set Machine Learning, Theory, Applications,can’t be frequent unless and until its itemset isn’t Experiences 2007frequent. The itemsets that are already found to be [8]. Feng Tao, FionnMurtagh and Mohsen Farid.frequent are added with new items based on the Weighted Association Rule Mining usingalgorithm. However changes in support and Weighted Support and Significanceconfidence shall not show its result on this property Framework Proceedings of the ninth ACMand also AC associated with advanced rule SIGKDD international conference ondeveloper. The terms support and confidence are to Knowledge discovery and data mining 2003,be replaced with weighted support and weighted Pages:661-666 Year of Publication: 2003confidence respectively in WAC which elicits that [9]. Yin, X. & Han, J. CPAR: Classificationweighted support helps maintain weighted closure based on predictive association rule. Inproperty. Proceedings of the SIAM International 462 | P a g e
  6. 6. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.458-463 Conference on Data Mining. San Francisco, CA: SIAM Press, 2003,pp. 369-376. [10]. W. Li, J. Han, and J. Pei. CMAR: Accurate and efficient classification based on multiple class association rules. In ICDM01, pp. 369-376, San Jose, CA, Nov.2001. [11]. Liu,B Hsu. W. Ma, Integrating Clasification and association rule mining .Proceeding of the KDD, 1998(CBA) pp 80-86. [12]. Cláudia M. Antunes , and Arlindo L. Oliveira Temporal Data Mining: an overview. [13]. Antonie, M. &Zaïane, O. An associative classifier based on positive and negative rules. In Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004,pp 64-69. 463 | P a g e