VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR




 Data Mining using Weka
A Paper on Data Mining techniques using Weka
                  software



                        MBA 2010-2012


           IT FOR BUSINESS INTELLIGENCE – TERM PAPER

             INSTRUCTOR – PROF. PRITHWIS MUKERJEE




                                                         SUBMITTED BY
                                                       SATHISHWARAN.R
                                                            10BM60079
                                                         MBA 2010-2012
Data Mining using WEKA                      2



Table of Contents
  1. INTRODUCTION
  2. CLASSIFICATION
       2.1 DATA
       2.2 SCREENS
       2.3 OUTPUT
       2.4 INTERPRETATION
  3. ASSOCIATION RULES
       3.1 DATA
       3.2 SCREENS
       3.3 OUTPUT
       3.4 INTERPRETATION
  4. REFERENCES


1. INTRODUCTION

The widespread use of computers has made life easier for business executives. However, it has
also led to a proliferation of data, which makes it difficult to extract meaning from it. The
sheer amount of data generated in the world today makes decision making difficult. Data mining
is one approach that identifies patterns in data and supports decision making by analysing this
ocean of data. Weka (Waikato Environment for Knowledge Analysis) is free software developed at
the University of Waikato in New Zealand and is available under the GNU General Public License.
The software can be used for research, education and applications. It has a graphical user
interface and a comprehensive set of tools for analysing data. In this paper I work through two
data mining techniques, classification and association rule mining, using the Weka software.


2. CLASSIFICATION

2.1 Data

The raw data used for this analysis was obtained from the website http://tunedit.org/ and was
originally gathered from census data. The 14 original attributes (features) include age, work
class, education, marital status, occupation, native country, etc. The data contains
continuous, binary and categorical features. I have used the data for a two-class
classification problem. The task is to discover high-revenue people from the census data and
to check, using cross-validation, how accurately the instances are classified.

Link: http://tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff

2.2 Screens

Step 1: Launch Weka


Step 2: Click Explorer




Step 3: Click Open file


Step 4: The data is loaded into Weka




Step 5: On the Classify tab, choose the DecisionTable classifier and the Cross-validation test option. Click Start
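
The cross-validation option chosen in the last step can be illustrated outside Weka. The following is a minimal Python sketch of how a stratified 10-fold split might be built. It is not Weka's implementation: the function names `stratified_folds` and `train_test_splits`, the seed and the round-robin dealing are assumptions made for illustration. The class counts (3118 instances of class -1 and 1029 of class 1) are taken from the rows of the confusion matrix in the output.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        yield train, test


# Class counts from the confusion matrix rows: 3118 of class -1, 1029 of class 1.
labels = [-1] * 3118 + [1] * 1029
folds = stratified_folds(labels)
for train, test in train_test_splits(folds):
    positives = sum(1 for index in test if labels[index] == 1)
    print(len(train), len(test), positives)
```

Each instance lands in exactly one test fold, so every instance is predicted once by a model that did not see it during training. This is why the summary reports results for all 4147 instances.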


2.3 Output

Cross-validation

       === Run information ===

       Scheme: weka.classifiers.rules.DecisionTable -X 1 -S "weka.attributeSelection.BestFirst -D 1 -N 5"
       Relation: ADA_Prior
       Instances: 4147
       Attributes: 15
              age
              workclass
              fnlwgt
              education
              educationNum
              maritalStatus
              occupation
              relationship
              race
              sex
              capitalGain
              capitalLoss
              hoursPerWeek
              nativeCountry
              label
       Test mode:10-fold cross-validation

       === Classifier model (full training set) ===

       Decision Table:

       Number of training instances: 4147
       Number of Rules: 130
       Non matches covered by Majority class.
              Best first.
              Start set: no attributes
              Search direction: forward
              Stale search after 5 node expansions
              Total number of subsets evaluated: 96
              Merit of best subset found: 83.82
       Evaluation (for feature selection): CV (leave one out)
       Feature set: 5, 8,11,12,15

       Time taken to build model: 0.98 seconds

       === Stratified cross-validation ===

       === Summary ===

       Correctly Classified Instances     3461      83.4579 %
       Incorrectly Classified Instances    686      16.5421 %
       Kappa statistic              0.5073
       Mean absolute error              0.2353
       Root mean squared error             0.339
       Relative absolute error          63.0518 %
       Root relative squared error        78.4907 %
       Total Number of Instances         4147

       === Detailed Accuracy By Class ===

                      TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                      0.939     0.483     0.855       0.939    0.895       0.873      -1
                      0.517     0.061     0.738       0.517    0.608       0.873      1
       Weighted Avg.  0.835     0.378     0.826       0.835    0.824       0.873

       === Confusion Matrix ===

               a     b   <-- classified as
            2929   189 |   a = -1
             497   532 |   b = 1

2.4 Interpretation

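      83.46 % of the instances (3461 of 4147) are classified correctly and 16.54 % (686) are
       classified incorrectly.
      The kappa statistic is 0.5073. It measures agreement between predicted and actual classes
       beyond chance, not classifier accuracy; a value near 0.5 is usually read as moderate
       agreement.
      The mean absolute error is 0.2353 and the root mean squared error is 0.339.
      The confusion matrix shows that class -1 is recognised far better (2929 of 3118 correct)
       than class 1 (532 of 1029 correct).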
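
The summary figures can be recomputed from the confusion matrix alone. The following Python sketch is a worked check, not Weka code; the helper names `accuracy`, `kappa` and `precision_recall_f` are chosen here for illustration. It uses the standard definitions: accuracy is the share of the diagonal, kappa compares observed agreement with the agreement expected by chance from the row and column totals, and the F-measure is the harmonic mean of precision and recall.

```python
# Confusion matrix from the Weka output.
# Rows are actual classes, columns are predicted classes.
matrix = [[2929, 189],   # actual class -1
          [497, 532]]    # actual class 1
classes = ["-1", "1"]


def accuracy(m):
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total


def kappa(m):
    """Cohen's kappa: agreement beyond what the marginal totals give by chance."""
    total = sum(sum(row) for row in m)
    observed = accuracy(m)
    expected = sum(
        sum(m[i]) * sum(row[i] for row in m) for i in range(len(m))
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def precision_recall_f(m, i):
    """Precision, recall and F-measure for the class at index i."""
    true_positives = m[i][i]
    predicted = sum(row[i] for row in m)   # column total
    actual = sum(m[i])                     # row total
    precision = true_positives / predicted
    recall = true_positives / actual
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


print(f"accuracy = {accuracy(matrix):.4f}")
print(f"kappa    = {kappa(matrix):.4f}")
for i, name in enumerate(classes):
    p, r, f = precision_recall_f(matrix, i)
    print(f"class {name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

The recomputed values agree with the Summary and Detailed Accuracy By Class tables, which confirms that the kappa statistic (0.5073) is a separate measure from the 83.46 % accuracy.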

3. ASSOCIATION RULES


3.1 Data

The data set includes the votes of each U.S. House of Representatives Congressman on the 16 key
votes identified by the Congressional Quarterly Almanac (CQA). The CQA lists nine different
types of votes: voted for, paired for, and announced for (these three simplified to yea); voted
against, paired against, and announced against (these three simplified to nay); and voted
present, voted present to avoid conflict of interest, and did not vote or otherwise make a
position known (these three simplified to an unknown disposition).

       Number of Instances: 435 (267 democrats, 168 republicans)
       Number of Attributes: 16 + class name = 17 (all Boolean valued)

Attribute Information:

      Class Name: 2 (democrat, republican)
      handicapped-infants: 2 (y,n)
      water-project-cost-sharing: 2 (y,n)
      adoption-of-the-budget-resolution: 2 (y,n)
      physician-fee-freeze: 2 (y,n)
      el-salvador-aid: 2 (y,n)
      religious-groups-in-schools: 2 (y,n)
      anti-satellite-test-ban: 2 (y,n)
      aid-to-nicaraguan-contras: 2 (y,n)
      mx-missile: 2 (y,n)
      immigration: 2 (y,n)
      synfuels-corporation-cutback: 2 (y,n)
      education-spending: 2 (y,n)
      superfund-right-to-sue: 2 (y,n)
      crime: 2 (y,n)
      duty-free-exports: 2 (y,n)
      export-administration-act-south-africa: 2 (y,n)

Link: http://tunedit.org/repo/UCI/vote.arff

3.2 Screens

Step 1: Launch Weka


Step 2: Click Explorer




Step 3: Click Open file… and choose respective file


Step 4: Click Associate and choose Apriori




Step 5: Click Start
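
To show what the Apriori step computes, the following is a minimal Python sketch of level-wise frequent itemset mining. It is an illustration under stated assumptions, not Weka's implementation: the function name `apriori`, the five toy transactions and the shortened item names such as `fee-freeze=n` are hypothetical and are not taken from vote.arff. Each level keeps only itemsets that meet the minimum support, and a larger candidate is counted only if all of its smaller subsets were frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets meeting the minimum support."""
    min_count = min_support * len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates one item larger and prune
        # any candidate that has an infrequent subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in level for s in combinations(union, size - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


# Hypothetical toy data, for illustration only.
transactions = [
    {"fee-freeze=n", "budget=y", "Class=democrat"},
    {"fee-freeze=n", "budget=y", "Class=democrat"},
    {"fee-freeze=n", "Class=democrat"},
    {"fee-freeze=y", "budget=n", "Class=republican"},
    {"fee-freeze=y", "Class=republican"},
]
frequent = apriori(transactions, min_support=0.4)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The sizes of the levels found this way correspond to the "Size of set of large itemsets L(1), L(2), ..." lines that Weka prints for the real data.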




3.3 Output

=== Run information ===
Scheme:     weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: vote
Instances: 435
Attributes: 17
       handicapped-infants
      water-project-cost-sharing
      adoption-of-the-budget-resolution
      physician-fee-freeze
      el-salvador-aid
      religious-groups-in-schools
      anti-satellite-test-ban
      aid-to-nicaraguan-contras
      mx-missile
      immigration
      synfuels-corporation-cutback
      education-spending
      superfund-right-to-sue
      crime
      duty-free-exports
      export-administration-act-south-africa
      Class
=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.45 (196 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 11

Generated sets of large itemsets:

Size of set of large itemsets L(1): 20
Size of set of large itemsets L(2): 17
Size of set of large itemsets L(3): 6
Size of set of large itemsets L(4): 1

Best rules found:

1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219 conf:(1)
2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class=democrat 198 conf:(1)
3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class=democrat 210 conf:(1)
4. physician-fee-freeze=n education-spending=n 202 ==> Class=democrat 201 conf:(1)
5. physician-fee-freeze=n 247 ==> Class=democrat 245 conf:(0.99)
6. el-salvador-aid=n Class=democrat 200 ==> aid-to-nicaraguan-contras=y 197 conf:(0.99)
7. el-salvador-aid=n 208 ==> aid-to-nicaraguan-contras=y 204 conf:(0.98)
8. adoption-of-the-budget-resolution=y aid-to-nicaraguan-contras=y Class=democrat 203 ==> physician-fee-freeze=n 198 conf:(0.98)
9. el-salvador-aid=n aid-to-nicaraguan-contras=y 204 ==> Class=democrat 197 conf:(0.97)
10. aid-to-nicaraguan-contras=y Class=democrat 218 ==> physician-fee-freeze=n 210 conf:(0.96)

3.4 Interpretation

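Apriori generated the ten best association rules shown in the output, using a minimum support
of 0.45 (196 instances) and a minimum confidence of 0.9. Most of the rules have Class=democrat
as the consequent. For example, rule 1 states that all 219 members who voted yes on
adoption-of-the-budget-resolution and no on physician-fee-freeze are democrats (confidence 1),
and rule 5 shows that a no vote on physician-fee-freeze alone identifies a democrat in 245 of
247 cases (confidence 0.99).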
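
The confidence values in the rule listing follow directly from the two counts printed with each rule: the number of instances matching the premise and the number matching the whole rule. The Python sketch below recomputes a few of them. The dictionary `rules` and the helper `confidence` are names chosen for illustration. Rounding the minimum support count up is an assumption; it is consistent with the 196 instances Weka reports for 0.45 of 435, but the output does not state the rounding rule.

```python
import math

instances = 435
min_support = 0.45

# 0.45 * 435 = 195.75; rounding up (an assumption) gives the reported 196.
min_count = math.ceil(min_support * instances)


def confidence(premise_count, rule_count):
    """Share of instances matching the premise that also match the consequent."""
    return rule_count / premise_count


# (premise count, rule count) copied from the "Best rules found" listing.
rules = {
    1: (219, 219),
    3: (211, 210),
    5: (247, 245),
    7: (208, 204),
    10: (218, 210),
}

print("minimum support count:", min_count)
for number, (premise_count, rule_count) in rules.items():
    print(f"rule {number}: confidence = {confidence(premise_count, rule_count):.4f}")
```

Rule 3 has a confidence of 210/211, slightly below 1, which Weka displays as conf:(1) after rounding to two decimals.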

4. REFERENCES

      Book: Data Mining – Practical Machine Learning Tools and Techniques, Ian H. Witten,
       Eibe Frank, Mark A. Hall

      http://www.cs.waikato.ac.nz/ml/weka/

      http://www.tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff

      http://tunedit.org/repo/UCI/vote.arff

Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Weka project - Classification & Association Rule Generation

  • 1. VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR Data Mining using Weka A Paper on Data Mining techniques using Weka software MBA 2010-2012 IT FOR BUSINESS INTELLIGENCE – TERM PAPER INSTRUCTOR – PROF. PRITHWIS MUKERJEE SUBMITTED BY SATHISHWARAN.R 10BM60079 MBA 2010-2012
  • 2. Data Mining using WEKA
       Table of Contents
       1. INTRODUCTION
       2. CLASSIFICATION
          2.1 DATA
          2.2 SCREENS
          2.3 OUTPUT
          2.4 INTERPRETATION
       3. ASSOCIATION RULES
          3.1 DATA
          3.2 SCREENS
          3.3 OUTPUT
          3.4 INTERPRETATION
       4. REFERENCES
  • 3.
       1. INTRODUCTION
       Widespread use of computers has made life easier for business executives. However, it has also led to a proliferation of data, which makes it difficult to extract meaning from that data, and the sheer volume generated today has made decision making harder. Data mining is one approach that identifies patterns in data and supports decisions by analysing this huge ocean of data. Weka (Waikato Environment for Knowledge Analysis) is free software developed at the University of Waikato in New Zealand and is available under the GNU General Public License. The software can be used for research, education and applications. It has a graphical user interface and a comprehensive set of tools for analysing data. In this paper I have worked on data mining techniques using the Weka software.
       2. CLASSIFICATION
       2.1 Data
       The raw data used for this analysis was obtained from the website http://tunedit.org/ and was originally gathered from census data. The 14 original attributes (features) include age, work class, education, marital status, occupation and native country. The data contains continuous, binary and categorical features. I have used it for a two-class classification problem: the task is to discover high-revenue people from the census data and to check, using cross-validation, how accurately the instances are classified.
       Link: http://tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff
       2.2 Screens
       Step 1: Launch Weka
  • 4.
       Step 2: Click Explorer
       Step 3: Click Open file
  • 5.
       Step 4: The data is loaded into Weka
       Step 5: On the Classify tab, choose the Decision Table classifier and the Cross-validation test option. Click Start
  • 6.
       2.3 Output
       Cross-validation
       === Run information ===
       Scheme:       weka.classifiers.rules.DecisionTable -X 1 -S "weka.attributeSelection.BestFirst -D 1 -N 5"
       Relation:     ADA_Prior
       Instances:    4147
       Attributes:   15
                     age
                     workclass
                     fnlwgt
                     education
                     educationNum
                     maritalStatus
                     occupation
                     relationship
                     race
                     sex
                     capitalGain
                     capitalLoss
                     hoursPerWeek
                     nativeCountry
                     label
       Test mode:    10-fold cross-validation
       === Classifier model (full training set) ===
       Decision Table:
       Number of training instances: 4147
       Number of Rules: 130
       Non matches covered by Majority class.
       Best first.
       Start set: no attributes
       Search direction: forward
       Stale search after 5 node expansions
       Total number of subsets evaluated: 96
       Merit of best subset found: 83.82
       Evaluation (for feature selection): CV (leave one out)
       Feature set: 5,8,11,12,15
       Time taken to build model: 0.98 seconds
       === Stratified cross-validation ===
  • 7.
       === Summary ===
       Correctly Classified Instances        3461      83.4579 %
       Incorrectly Classified Instances       686      16.5421 %
       Kappa statistic                          0.5073
       Mean absolute error                      0.2353
       Root mean squared error                  0.339
       Relative absolute error                 63.0518 %
       Root relative squared error             78.4907 %
       Total Number of Instances             4147
       === Detailed Accuracy By Class ===
                      TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                      0.939    0.483    0.855      0.939   0.895      0.873     -1
                      0.517    0.061    0.738      0.517   0.608      0.873     1
       Weighted Avg.  0.835    0.378    0.826      0.835   0.824      0.873
       === Confusion Matrix ===
          a    b   <-- classified as
       2929  189 |    a = -1
        497  532 |    b = 1
       2.4 Interpretation
         • 83.46 % of the instances are classified correctly and 16.54 % incorrectly under 10-fold cross-validation.
         • The kappa statistic is 0.5073. It measures agreement between predicted and actual classes beyond what chance alone would give, and a value in this range is commonly read as moderate agreement. It is not a percentage accuracy.
         • The mean absolute error is 0.2353 and the root mean squared error is 0.339.
         • In absolute terms, 3461 instances have been classified correctly and 686 instances incorrectly.
       3. ASSOCIATION RULES
       3.1 Data
       The data set includes the votes of each member of the U.S. House of Representatives on the 16 key votes identified by the Congressional Quarterly Almanac (CQA). The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
       Number of Instances: 435 (267 democrats, 168 republicans)
       Number of Attributes: 16 + class name = 17 (all Boolean valued)
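As a cross-check on Section 2.4, the summary figures can be recomputed by hand from the confusion matrix in Section 2.3. The following is a minimal Python sketch and is not part of the Weka run; the variable and function names are illustrative assumptions, and only the four cell counts are taken from the output.

```python
# Confusion matrix from Section 2.3.
# Outer keys are actual classes, inner keys are predicted classes.
confusion = {
    "-1": {"-1": 2929, "1": 189},
    "1": {"-1": 497, "1": 532},
}
classes = list(confusion)


def row_total(actual):
    """Number of instances whose actual class is `actual`."""
    return sum(confusion[actual].values())


def column_total(predicted):
    """Number of instances that were predicted as `predicted`."""
    return sum(confusion[actual][predicted] for actual in classes)


total = sum(row_total(c) for c in classes)
correct = sum(confusion[c][c] for c in classes)
accuracy = correct / total

# Cohen's kappa: observed agreement corrected for the agreement
# expected by chance from the row and column totals.
chance = sum(row_total(c) * column_total(c) for c in classes) / total ** 2
kappa = (accuracy - chance) / (1 - chance)


def precision_recall_f(label):
    """Precision, recall (TP rate) and F-measure for one class."""
    true_positives = confusion[label][label]
    precision = true_positives / column_total(label)
    recall = true_positives / row_total(label)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


print(f"correct: {correct} of {total} ({accuracy:.4%})")
print(f"kappa: {kappa:.4f}")
for label in classes:
    p, r, f = precision_recall_f(label)
    print(f"class {label}: precision {p:.3f}, recall {r:.3f}, F-measure {f:.3f}")
```

The recomputed accuracy and kappa agree with the values Weka reports (83.4579 % and 0.5073), which confirms that kappa is a chance-corrected agreement score and not a second accuracy figure. The per-class lines also show that recall for class 1 is much lower than for class -1.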
  • 8.
       Attribute Information:
         • Class Name: 2 (democrat, republican)
         • handicapped-infants: 2 (y,n)
         • water-project-cost-sharing: 2 (y,n)
         • adoption-of-the-budget-resolution: 2 (y,n)
         • physician-fee-freeze: 2 (y,n)
         • el-salvador-aid: 2 (y,n)
         • religious-groups-in-schools: 2 (y,n)
         • anti-satellite-test-ban: 2 (y,n)
         • aid-to-nicaraguan-contras: 2 (y,n)
         • mx-missile: 2 (y,n)
         • immigration: 2 (y,n)
         • synfuels-corporation-cutback: 2 (y,n)
         • education-spending: 2 (y,n)
         • superfund-right-to-sue: 2 (y,n)
         • crime: 2 (y,n)
         • duty-free-exports: 2 (y,n)
         • export-administration-act-south-africa: 2 (y,n)
       Link: http://tunedit.org/repo/UCI/vote.arff
       3.2 Screens
       Step 1: Launch Weka
  • 9.
       Step 2: Click Explorer
       Step 3: Click Open file… and choose the data file
  • 10.
       Step 4: Click Associate and choose Apriori
       Step 5: Click Start
       3.3 Output
       === Run information ===
       Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
       Relation:     vote
       Instances:    435
       Attributes:   17
                     handicapped-infants
  • 11.
                     water-project-cost-sharing
                     adoption-of-the-budget-resolution
                     physician-fee-freeze
                     el-salvador-aid
                     religious-groups-in-schools
                     anti-satellite-test-ban
                     aid-to-nicaraguan-contras
                     mx-missile
                     immigration
                     synfuels-corporation-cutback
                     education-spending
                     superfund-right-to-sue
                     crime
                     duty-free-exports
                     export-administration-act-south-africa
                     Class
       === Associator model (full training set) ===
       Apriori
       =======
       Minimum support: 0.45 (196 instances)
       Minimum metric <confidence>: 0.9
       Number of cycles performed: 11
       Generated sets of large itemsets:
       Size of set of large itemsets L(1): 20
       Size of set of large itemsets L(2): 17
       Size of set of large itemsets L(3): 6
       Size of set of large itemsets L(4): 1
       Best rules found:
        1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219    conf:(1)
        2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class=democrat 198    conf:(1)
        3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class=democrat 210    conf:(1)
        4. physician-fee-freeze=n education-spending=n 202 ==> Class=democrat 201    conf:(1)
        5. physician-fee-freeze=n 247 ==> Class=democrat 245    conf:(0.99)
        6. el-salvador-aid=n Class=democrat 200 ==> aid-to-nicaraguan-contras=y 197    conf:(0.99)
        7. el-salvador-aid=n 208 ==> aid-to-nicaraguan-contras=y 204    conf:(0.98)
        8. adoption-of-the-budget-resolution=y aid-to-nicaraguan-contras=y Class=democrat 203 ==> physician-fee-freeze=n 198    conf:(0.98)
        9. el-salvador-aid=n aid-to-nicaraguan-contras=y 204 ==> Class=democrat 197    conf:(0.97)
  • 12.
       10. aid-to-nicaraguan-contras=y Class=democrat 218 ==> physician-fee-freeze=n 210    conf:(0.96)
       3.4 Interpretation
       Apriori produced the ten association rules shown in the output, each with a confidence of at least 0.9 at a minimum support of 0.45 (196 of the 435 instances). Most of the rules predict party from votes. For example, rule 5 shows that 245 of the 247 members who voted n on physician-fee-freeze are democrats (confidence 0.99).
       4. REFERENCES
         • Book: Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques
         • http://www.cs.waikato.ac.nz/ml/weka/
         • http://www.tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff
         • http://tunedit.org/repo/UCI/vote.arff
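The confidence values in Section 3.3 follow directly from the two counts printed with each rule: the number of instances matching the left-hand side, and the number matching both sides. The Python sketch below recomputes them; the names are illustrative assumptions and the counts are copied from the rule listing. It also rests on one assumption about Weka's Apriori that the paper does not state: that the minimum support starts at the upper bound (-U 1.0) and is lowered by the delta (-D 0.05) once per cycle until enough rules are found.

```python
import math

NUM_INSTANCES = 435  # "Instances: 435" in the run information

# (left-hand-side count, count for the whole rule, confidence as printed)
# for rules 1 to 10, copied from the "Best rules found" listing.
rules = [
    (219, 219, 1.00),
    (198, 198, 1.00),
    (211, 210, 1.00),
    (202, 201, 1.00),
    (247, 245, 0.99),
    (200, 197, 0.99),
    (208, 204, 0.98),
    (203, 198, 0.98),
    (204, 197, 0.97),
    (218, 210, 0.96),
]


def confidence(premise_count, rule_count):
    """Share of instances matching the left-hand side that also match the right."""
    return rule_count / premise_count


def support(rule_count, n=NUM_INSTANCES):
    """Share of all instances that match both sides of the rule."""
    return rule_count / n


# Assumption: support is lowered from the upper bound by one delta per cycle.
upper_bound, delta, cycles = 1.0, 0.05, 11
min_support = round(upper_bound - cycles * delta, 2)
min_count = math.ceil(min_support * NUM_INSTANCES)

for number, (premise, both, printed) in enumerate(rules, start=1):
    print(f"rule {number:2d}: confidence {confidence(premise, both):.4f} "
          f"(printed {printed}), support {support(both):.3f}")
print(f"minimum support {min_support} -> at least {min_count} instances")
```

Under that assumption, eleven cycles bring the support threshold from 1.0 down to 0.45, or 196 instances, which matches the "Minimum support" and "Number of cycles performed" lines in the output. Every rule in the listing is backed by at least that many instances.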