Weka project - Classification & Association Rule Generation
  • Hi,
    Could you please tell me how to move Class=democrat 200 in rule 6 and Class=democrat 203 in rule 8 from the left-hand side of the rule to the right-hand side (page 11)?

    Thank you
    Regards
    Edwin
    Document Transcript

    • VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR
      Data Mining using Weka
      A Paper on Data Mining Techniques Using Weka Software
      IT FOR BUSINESS INTELLIGENCE – TERM PAPER
      INSTRUCTOR – PROF. PRITHWIS MUKERJEE
      SUBMITTED BY SATHISHWARAN.R, 10BM60079, MBA 2010-2012
    • Data Mining using WEKA
      Table of Contents
      1. INTRODUCTION
      2. CLASSIFICATION
         2.1 DATA
         2.2 SCREENS
         2.3 OUTPUT
         2.4 INTERPRETATION
      3. ASSOCIATION RULES
         3.1 DATA
         3.2 SCREENS
         3.3 OUTPUT
         3.4 INTERPRETATION
      4. REFERENCES
    • 1. INTRODUCTION
      Widespread use of computers has made life easier for business executives. However, it has also led to a proliferation of data that is difficult to make sense of, and the sheer amount of data generated in the world today has made decision making harder. Data mining is one approach that identifies patterns in data and supports decisions by analysing this huge ocean of data. Weka (Waikato Environment for Knowledge Analysis) is free software developed at the University of Waikato in New Zealand and is available under the GNU General Public License. The software can be used for research, education and applications. It has a graphical user interface and a comprehensive set of tools for analysing data. In this paper I have applied data mining techniques using the Weka software.

      2. CLASSIFICATION
      2.1 Data
      The raw data used for this analysis was obtained from the website http://tunedit.org/ and was originally gathered from census data. There are 14 original attributes (features), including age, work class, education, marital status, occupation and native country. The data contains continuous, binary and categorical features. I have used the data for a two-class classification problem. The task is to identify high-revenue people from the census data and to check, using cross-validation, how well the data is classified.
      Link: http://tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff

      2.2 Screens
      Step 1: Launch Weka
    • Step 2: Click Explorer
      Step 3: Click Open file
    • Step 4: The data is loaded into Weka
      Step 5: Select Cross-validation as the test option and Decision Table as the classifier, then click Start
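The run in section 2.3 reports "Test mode: 10-fold cross-validation" on 4147 instances. As a rough illustration of what that test mode does, the sketch below splits instance indices into ten nearly equal folds and uses each fold once as the test set. It is a minimal sketch in plain Python, not Weka's implementation: the function names are hypothetical, and Weka's shuffling and stratification by class are not reproduced here.

```python
def fold_sizes(n_instances: int, k: int) -> list[int]:
    """Return the sizes of k nearly equal folds that cover n_instances items."""
    base, extra = divmod(n_instances, k)
    # The first `extra` folds get one additional instance.
    return [base + 1 if i < extra else base for i in range(k)]


def cross_validation_splits(n_instances: int, k: int):
    """Yield (train_indices, test_indices) once per fold, without shuffling."""
    indices = list(range(n_instances))
    start = 0
    for size in fold_sizes(n_instances, k):
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    # 4147 instances and 10 folds, as in the run information.
    print(fold_sizes(4147, 10))
    for fold, (train, test) in enumerate(cross_validation_splits(4147, 10), 1):
        print(f"fold {fold}: train={len(train)} test={len(test)}")
```

Every instance is used for testing exactly once, which is why the summary in section 2.3 reports results over all 4147 instances.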
    • 2.3 Output
      Cross-validation
      === Run information ===
      Scheme:       weka.classifiers.rules.DecisionTable -X 1 -S "weka.attributeSelection.BestFirst -D 1 -N 5"
      Relation:     ADA_Prior
      Instances:    4147
      Attributes:   15
                    age
                    workclass
                    fnlwgt
                    education
                    educationNum
                    maritalStatus
                    occupation
                    relationship
                    race
                    sex
                    capitalGain
                    capitalLoss
                    hoursPerWeek
                    nativeCountry
                    label
      Test mode:    10-fold cross-validation
      === Classifier model (full training set) ===
      Decision Table:
      Number of training instances: 4147
      Number of Rules: 130
      Non matches covered by Majority class.
      Best first.
      Start set: no attributes
      Search direction: forward
      Stale search after 5 node expansions
      Total number of subsets evaluated: 96
      Merit of best subset found: 83.82
      Evaluation (for feature selection): CV (leave one out)
      Feature set: 5,8,11,12,15
      Time taken to build model: 0.98 seconds
      === Stratified cross-validation ===
    • === Summary ===
      Correctly Classified Instances        3461     83.4579 %
      Incorrectly Classified Instances       686     16.5421 %
      Kappa statistic                          0.5073
      Mean absolute error                      0.2353
      Root mean squared error                  0.339
      Relative absolute error                 63.0518 %
      Root relative squared error             78.4907 %
      Total Number of Instances             4147
      === Detailed Accuracy By Class ===
                     TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                     0.939    0.483    0.855      0.939   0.895      0.873     -1
                     0.517    0.061    0.738      0.517   0.608      0.873     1
      Weighted Avg.  0.835    0.378    0.826      0.835   0.824      0.873
      === Confusion Matrix ===
         a    b   <-- classified as
      2929  189 |  a = -1
       497  532 |  b = 1

      2.4 Interpretation
        • 3461 instances (83.46 %) were classified correctly and 686 instances (16.54 %) were classified incorrectly.
        • The kappa statistic is 0.5073, i.e. moderate agreement between predicted and actual classes after correcting for chance agreement.
        • The mean absolute error is 0.2353 and the root mean squared error is 0.339.

      3. ASSOCIATION RULES
      3.1 Data
      The data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA (Congressional Quarterly Almanac). The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
      Number of Instances: 435 (267 democrats, 168 republicans)
      Number of Attributes: 16 + class name = 17 (all Boolean valued)
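As a cross-check on the classification results in sections 2.3 and 2.4, the accuracy, kappa statistic, precision and recall can be recomputed from the confusion matrix alone. The sketch below is plain Python rather than Weka code. The matrix values are copied from the output, the variable and function names are illustrative, and the kappa formula used is the standard Cohen's kappa, which is assumed to be what Weka reports.

```python
# Confusion matrix from the Weka output: outer key = actual class,
# inner key = predicted class.
MATRIX = {
    "-1": {"-1": 2929, "1": 189},
    "1": {"-1": 497, "1": 532},
}
CLASSES = list(MATRIX)

total = sum(sum(row.values()) for row in MATRIX.values())
correct = sum(MATRIX[c][c] for c in CLASSES)
accuracy = correct / total

# Agreement expected by chance: for each class, (actual total) * (predicted total).
expected = sum(
    sum(MATRIX[c].values()) * sum(MATRIX[r][c] for r in CLASSES)
    for c in CLASSES
) / total ** 2

# Cohen's kappa: observed agreement corrected for chance agreement.
kappa = (accuracy - expected) / (1 - expected)


def precision(cls: str) -> float:
    """Share of instances predicted as cls that really are cls."""
    return MATRIX[cls][cls] / sum(MATRIX[r][cls] for r in CLASSES)


def recall(cls: str) -> float:
    """Share of actual cls instances that were predicted as cls (TP rate)."""
    return MATRIX[cls][cls] / sum(MATRIX[cls].values())


if __name__ == "__main__":
    print(f"instances: {total}, correct: {correct}")
    print(f"accuracy:  {accuracy:.4%}")
    print(f"kappa:     {kappa:.4f}")
    for cls in CLASSES:
        print(f"class {cls}: precision={precision(cls):.3f} recall={recall(cls):.3f}")
```

The recomputed values agree with the Summary and Detailed Accuracy By Class figures to the precision Weka prints. This also shows that the kappa statistic is a different quantity from the percentage of correctly classified instances.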
    • Attribute Information:
        • Class Name: 2 (democrat, republican)
        • handicapped-infants: 2 (y,n)
        • water-project-cost-sharing: 2 (y,n)
        • adoption-of-the-budget-resolution: 2 (y,n)
        • physician-fee-freeze: 2 (y,n)
        • el-salvador-aid: 2 (y,n)
        • religious-groups-in-schools: 2 (y,n)
        • anti-satellite-test-ban: 2 (y,n)
        • aid-to-nicaraguan-contras: 2 (y,n)
        • mx-missile: 2 (y,n)
        • immigration: 2 (y,n)
        • synfuels-corporation-cutback: 2 (y,n)
        • education-spending: 2 (y,n)
        • superfund-right-to-sue: 2 (y,n)
        • crime: 2 (y,n)
        • duty-free-exports: 2 (y,n)
        • export-administration-act-south-africa: 2 (y,n)
      Link: http://tunedit.org/repo/UCI/vote.arff

      3.2 Screens
      Step 1: Launch Weka
    • Step 2: Click Explorer
      Step 3: Click Open file… and choose the respective file
    • Step 4: Click Associate and choose Apriori
      Step 5: Click Start

      3.3 Output
      === Run information ===
      Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
      Relation:     vote
      Instances:    435
      Attributes:   17
                    handicapped-infants
    •               water-project-cost-sharing
                    adoption-of-the-budget-resolution
                    physician-fee-freeze
                    el-salvador-aid
                    religious-groups-in-schools
                    anti-satellite-test-ban
                    aid-to-nicaraguan-contras
                    mx-missile
                    immigration
                    synfuels-corporation-cutback
                    education-spending
                    superfund-right-to-sue
                    crime
                    duty-free-exports
                    export-administration-act-south-africa
                    Class
      === Associator model (full training set) ===
      Apriori
      =======
      Minimum support: 0.45 (196 instances)
      Minimum metric <confidence>: 0.9
      Number of cycles performed: 11
      Generated sets of large itemsets:
      Size of set of large itemsets L(1): 20
      Size of set of large itemsets L(2): 17
      Size of set of large itemsets L(3): 6
      Size of set of large itemsets L(4): 1
      Best rules found:
       1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219    conf:(1)
       2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class=democrat 198    conf:(1)
       3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class=democrat 210    conf:(1)
       4. physician-fee-freeze=n education-spending=n 202 ==> Class=democrat 201    conf:(1)
       5. physician-fee-freeze=n 247 ==> Class=democrat 245    conf:(0.99)
       6. el-salvador-aid=n Class=democrat 200 ==> aid-to-nicaraguan-contras=y 197    conf:(0.99)
       7. el-salvador-aid=n 208 ==> aid-to-nicaraguan-contras=y 204    conf:(0.98)
       8. adoption-of-the-budget-resolution=y aid-to-nicaraguan-contras=y Class=democrat 203 ==> physician-fee-freeze=n 198    conf:(0.98)
       9. el-salvador-aid=n aid-to-nicaraguan-contras=y 204 ==> Class=democrat 197    conf:(0.97)
    • 10. aid-to-nicaraguan-contras=y Class=democrat 218 ==> physician-fee-freeze=n 210    conf:(0.96)

      3.4 Interpretation
      Apriori generated the ten association rules listed in the output, each with a confidence of at least 0.9 and a support of at least 0.45 (196 instances). For example, rule 5 states that 245 of the 247 members who voted n on physician-fee-freeze are democrats, a confidence of 0.99.

      4. REFERENCES
        • Book: Data Mining: Practical Machine Learning Tools and Techniques, Ian H. Witten, Eibe Frank, Mark A. Hall
        • http://www.cs.waikato.ac.nz/ml/weka/
        • http://www.tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff
        • http://tunedit.org/repo/UCI/vote.arff
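The confidence values printed in section 3.3 can be reproduced from the two counts shown in each rule: the number after the left-hand side is how many instances match the antecedent, and the number after the right-hand side is how many of those also match the consequent. The sketch below is plain Python with illustrative names, not Weka code. It also rests on two assumptions that are consistent with this output but are not taken from the paper: that the support threshold is lowered by the delta (-D 0.05) once per cycle starting from the upper bound (-U 1.0), and that the instance count for the minimum support is rounded up.

```python
import math

N_INSTANCES = 435  # "Instances: 435" in the run information

# Selected rules from the output:
# rule number -> (antecedent count, rule count, confidence printed by Weka)
RULES = {
    1: (219, 219, 1.00),
    3: (211, 210, 1.00),
    5: (247, 245, 0.99),
    7: (208, 204, 0.98),
    9: (204, 197, 0.97),
    10: (218, 210, 0.96),
}


def confidence(antecedent_count: int, rule_count: int) -> float:
    """Confidence = instances matching the whole rule / instances matching the LHS."""
    return rule_count / antecedent_count


def min_support_after(cycles: int, upper: float = 1.0, delta: float = 0.05) -> float:
    """Support threshold after lowering it by delta once per cycle (assumed schedule)."""
    return round(upper - cycles * delta, 2)


def support_count(min_support: float, n_instances: int) -> int:
    """Smallest number of instances that satisfies the support threshold (assumed ceiling)."""
    return math.ceil(min_support * n_instances)


if __name__ == "__main__":
    for number, (lhs, both, printed) in RULES.items():
        value = confidence(lhs, both)
        print(f"rule {number}: {both}/{lhs} = {value:.4f} (Weka prints {printed})")
    support = min_support_after(11)  # "Number of cycles performed: 11"
    print(f"minimum support: {support} ({support_count(support, N_INSTANCES)} instances)")
```

Under these assumptions, 11 cycles give a minimum support of 0.45, which corresponds to 196 of the 435 instances, matching the "Minimum support" line in the output.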