Data Mining Using WEKA



         Submitted to
    Prof. Prithwis Mukerjee


        Submitted By
       Shikha Jayaswal




        17th April, 2012
Table of Contents

Objective

WEKA

   Running WEKA

Loading Datasets:

Linear Regression

   Model

   Interpreting the Output

Clustering

   Model

   Interpreting the Output

List of Figures:

Figure 1: Weka GUI Chooser

Figure 2: Weka Explorer

Figure 3: Load Dataset

Figure 4: Linear Regression

Figure 5: Clustering

Objective

Demonstrate the use of WEKA for the following data mining tasks:

    •   Linear regression
    •   Clustering



WEKA

Weka is a data mining tool developed at the University of Waikato. It is released under the GNU
General Public License and is freely available. It is implemented in the Java programming language
and has a GUI for loading data, running analyses and producing visualizations.

The software can be downloaded from: http://www.cs.waikato.ac.nz/~ml/weka/
The version used in this analysis is 3.6.6.


Running WEKA


Starting Weka opens the Weka GUI Chooser:




Figure 1: Weka GUI Chooser




The Explorer button opens the Weka Explorer window, in which data can be loaded and then used for
analysis.
Figure 2: Weka Explorer




Loading Datasets:

The supported file types are:

    •   ARFF data files
    •   C4.5 data files
    •   CSV data files
    •   LibSVM data files
    •   SVM-light data files
    •   Binary serialized data files
    •   XRFF data files
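
ARFF is Weka's native format, so a small sketch of its layout may help. The Python snippet below
builds the text of a minimal ARFF file. The attribute names are the ones that appear in the
regression output of this report; the relation name sample_units and the two data rows are
hypothetical and serve only to show the structure. All attributes are declared numeric, which is an
assumption about the study file.

```python
import io

# Attribute names as they appear in the regression output of this report.
ATTRIBUTES = ["Price/Unit", "BTU/Hr", "Weight lbs", "EER",
              "Unit volume", "Region", "Type"]

# Hypothetical rows, NOT the study data; they only illustrate the layout.
ROWS = [
    [34.0, 900, 10.5, 3.4, 180000, 3, 1],
    [31.0, 350, 3.9, 4.5, 92000, 5, 2],
]

def to_arff(relation, attributes, rows):
    """Return the text of a minimal ARFF file with all-numeric attributes."""
    out = io.StringIO()
    out.write(f"@relation {relation}\n\n")
    for name in attributes:
        # Quote each name so spaces and slashes are read as part of the name.
        out.write(f"@attribute '{name}' numeric\n")
    out.write("\n@data\n")
    for row in rows:
        out.write(",".join(str(v) for v in row) + "\n")
    return out.getvalue()

# "sample_units" is a hypothetical relation name.
arff_text = to_arff("sample_units", ATTRIBUTES, ROWS)
print(arff_text)
```

Saving arff_text to a file with the .arff extension should give something the Explorer can open
through "Open file...".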


The data file being used for the study is:
To load a file, click “Open file...”, select the file to be loaded, and open it.




Figure 3: Load Dataset
Linear Regression
Model
Steps for creating the regression model:

   1. Click on the Classify tab.
   2. Click on the Choose button; in the window that opens, expand classifiers, then functions,
      and select LinearRegression.
   3. Click on the LinearRegression text area to open the GenericObjectEditor pop-up. In the
      attributeSelectionMethod dropdown, select No attribute selection, then click OK.
   4. Select Use training set so that the model is evaluated on the loaded dataset.
   5. In the dropdown, select Price/Unit as the dependent variable (the class attribute), then
      click on the Start button.
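
To make concrete what a linear regression fit computes, here is a minimal least-squares sketch in
plain Python. It is not Weka's implementation: it solves the normal equations directly, the small
ridge value is an assumption chosen for the sketch, and the data are hypothetical values generated
from a known line rather than the study data.

```python
def fit_linear_regression(X, y, ridge=1e-8):
    """Least-squares fit of y on X with an intercept, via the normal equations.

    The ridge term (an assumed value) is added to the diagonal for the
    non-intercept coefficients to keep the system solvable.
    """
    p = len(X[0])
    rows = [list(r) + [1.0] for r in X]           # append the intercept column
    k = p + 1
    # Build A = X'X and b = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    for i in range(p):
        A[i][i] += ridge
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - s) / A[i][i]
    return coef[:p], coef[p]                      # (weights, intercept)

# Hypothetical data generated from y = 2*x1 - 1*x2 + 5.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 * x1 - x2 + 5 for x1, x2 in X]
weights, intercept = fit_linear_regression(X, y)
print([round(w, 4) for w in weights], round(intercept, 4))
```

Because the data lie exactly on a line, the recovered weights and intercept should be very close to
2, -1 and 5.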




   Figure 4: Linear Regression




Interpreting the Output


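The fitted model is:

Price/Unit = -0.0012 * BTU/Hr + 0.5806 * Weight lbs + 3.7411 * EER + 0 * Unit volume
             - 1.2524 * Region - 2.1025 * Type + 24.8058

Each coefficient is the change in the predicted Price/Unit for a one-unit increase in that
attribute, with the other attributes held fixed. The coefficient of Unit volume appears as 0 at the
four-decimal precision shown, so its effect per unit of volume is too small to display.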
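
As a worked example, the fitted equation can be applied to a single unit. The coefficients in the
sketch below are copied from the equation in this section; the attribute values of the unit are
hypothetical and were chosen only to show the arithmetic.

```python
# Coefficients copied from the fitted equation in this section.
COEFFICIENTS = {
    "BTU/Hr": -0.0012,
    "Weight lbs": 0.5806,
    "EER": 3.7411,
    "Unit volume": 0.0,
    "Region": -1.2524,
    "Type": -2.1025,
}
INTERCEPT = 24.8058

def predict_price(unit):
    """Predicted Price/Unit: intercept plus coefficient times value, summed."""
    return INTERCEPT + sum(coef * unit[name] for name, coef in COEFFICIENTS.items())

# Hypothetical unit, not a row from the study data.
unit = {"BTU/Hr": 500, "Weight lbs": 5.5, "EER": 4.0,
        "Unit volume": 100000, "Region": 3, "Type": 2}
predicted = predict_price(unit)
print(round(predicted, 4))   # 34.4013
```

The terms are -0.6 + 3.1933 + 14.9644 + 0 - 3.7572 - 4.205, which added to the intercept 24.8058
give 34.4013.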
Clustering
Model
Steps for creating the clustering model:

    1. Click on the Cluster tab.
    2. Click on the Choose button; in the window that opens, expand clusterers and select EM.
    3. Click on the EM text area to open the GenericObjectEditor pop-up, fill in the clustering
       options listed below, and click OK.
            a. -V Verbose output.
            b. -N The number of clusters to generate. If omitted, EM uses cross-validation to
                select the number of clusters automatically.
            c. -I The maximum number of iterations; EM terminates after this many iterations
                if it has not converged.
            d. -S The random number seed.
            e. -M The minimum allowable standard deviation for the normal density calculation.
    4. Select Use training set to cluster the loaded dataset, then click on the Start button.
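
To illustrate what the -N, -I, -S and -M options control, here is a minimal EM sketch for a
one-dimensional Gaussian mixture in plain Python. It is not Weka's implementation: the parameter
names, the default values, the starting rule and the convergence threshold are all assumptions made
for the sketch; the number of clusters must be given explicitly (no cross-validation); and the
prices are hypothetical.

```python
import math
import random

def em_1d(values, n_clusters, max_iterations=100, seed=100, min_std_dev=1e-6):
    """Fit a one-dimensional Gaussian mixture by EM.

    n_clusters ~ -N, max_iterations ~ -I, seed ~ -S, min_std_dev ~ -M.
    Returns (priors, means, std_devs), one entry per cluster.
    """
    rng = random.Random(seed)                  # the seed makes the start repeatable
    means = rng.sample(values, n_clusters)     # random data points as starting means
    spread = (max(values) - min(values)) or 1.0
    std_devs = [max(spread / (4 * n_clusters), min_std_dev)] * n_clusters
    priors = [1.0 / n_clusters] * n_clusters
    previous = None
    for _ in range(max_iterations):            # hard cap on the number of iterations
        # E-step: probability that each cluster produced each value.
        resp, log_likelihood = [], 0.0
        for x in values:
            dens = [p * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                    for p, m, s in zip(priors, means, std_devs)]
            total = sum(dens)
            if total > 0.0:
                resp.append([d / total for d in dens])
                log_likelihood += math.log(total)
            else:                              # densities underflowed: split evenly
                resp.append([1.0 / n_clusters] * n_clusters)
        # M-step: re-estimate each cluster from its weighted values.
        for k in range(n_clusters):
            weight = sum(r[k] for r in resp)
            if weight <= 0.0:
                continue                       # empty cluster: keep old parameters
            priors[k] = weight / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / weight
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / weight
            std_devs[k] = max(math.sqrt(var), min_std_dev)   # floor on the std. dev.
        # Stop early once the log-likelihood no longer changes.
        if previous is not None and abs(log_likelihood - previous) < 1e-6:
            break
        previous = log_likelihood
    return priors, means, std_devs

# Hypothetical prices, not the study data.
prices = [30.1, 30.9, 31.4, 32.0, 38.2, 38.9, 39.5, 40.1]
priors, means, std_devs = em_1d(prices, n_clusters=2)
print(priors, means, std_devs)
```

The result depends on the starting means, which is why a seed is exposed: the same seed reproduces
the same clustering, while a different seed may settle on a different solution.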




Figure 5: Clustering
Interpreting the Output


The clustered instances are:

   Cluster      Instances
      0          7 (16%)
      1         14 (31%)
      2         10 (22%)
      3          3 (7%)
      4         11 (24%)
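
Each percentage is the cluster's share of all clustered instances. The short check below recomputes
the shares from the instance counts listed above; the counts sum to 45, and the share for cluster 3
works out to 7%.

```python
# Instance counts per cluster, copied from the list above.
counts = {0: 7, 1: 14, 2: 10, 3: 3, 4: 11}

total = sum(counts.values())
# Share of each cluster, rounded to a whole percentage.
shares = {cluster: round(100 * n / total) for cluster, n in counts.items()}
print(total, shares)   # 45 {0: 16, 1: 31, 2: 22, 3: 7, 4: 24}
```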


The attributes of the clusters are shown below. The first row gives the prior probability of each
cluster; the remaining rows give the mean and standard deviation of each attribute within each
cluster.

Attribute      Statistic   Cluster 0  Cluster 1  Cluster 2  Cluster 3  Cluster 4
Prior probability               0.16        0.3        0.2       0.07       0.27
Price/Unit     mean          34.1022    32.5883    39.1963    38.0867    30.9768
               std. dev.      4.1176     1.2413     2.2264     1.0193     2.8369
BTU/Hr         mean         912.8122   499.9553   496.4343   856.6667   347.0964
               std. dev.    105.4301   159.6201   178.5667    57.9272   140.3392
Weight lbs.    mean          10.4966     5.6066     5.6444     9.5967     3.9301
               std. dev.      1.3785      1.848     2.0181     0.7312      1.559
EER            mean           3.3643     3.9673     4.9873     4.8533     4.4754
               std. dev.      0.2773     0.3885     0.3347     0.1586     0.3313
Unit Volume    mean         180985.9   129223.9   71417.94      74000   92473.04
               std. dev.    239037.4   135545.2   45108.85    44639.3   85150.53
Region         mean                3     3.1226          4          5     4.8882
               std. dev.      0.8848     0.4794          0     0.8848      0.365
Type           mean           1.1427          2          2     1.3333          2
               std. dev.      0.3497     0.3866     0.3866     0.4714     0.3866
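
One way to read this table is to ask which cluster a given unit most likely belongs to. The sketch
below does this for a hypothetical unit using the prior probabilities and the mean and standard
deviation of three attributes copied from the table. It assumes each attribute is normally
distributed and independent within a cluster, and it ignores the remaining attributes, so it is a
simplification and not a reproduction of Weka's own assignment.

```python
import math

# Prior probabilities and (mean, std. dev.) per cluster 0..4, copied from the table.
PRIORS = [0.16, 0.3, 0.2, 0.07, 0.27]
PARAMS = {
    "Price/Unit": [(34.1022, 4.1176), (32.5883, 1.2413), (39.1963, 2.2264),
                   (38.0867, 1.0193), (30.9768, 2.8369)],
    "BTU/Hr": [(912.8122, 105.4301), (499.9553, 159.6201), (496.4343, 178.5667),
               (856.6667, 57.9272), (347.0964, 140.3392)],
    "EER": [(3.3643, 0.2773), (3.9673, 0.3885), (4.9873, 0.3347),
            (4.8533, 0.1586), (4.4754, 0.3313)],
}

def log_normal_pdf(x, mean, std):
    """Log of the normal density at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def cluster_probabilities(unit):
    """Probability of each cluster for one unit, from prior times densities."""
    log_scores = []
    for k, prior in enumerate(PRIORS):
        score = math.log(prior)
        for name, per_cluster in PARAMS.items():
            mean, std = per_cluster[k]
            score += log_normal_pdf(unit[name], mean, std)
        log_scores.append(score)
    top = max(log_scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical unit, not a row from the study data.
unit = {"Price/Unit": 34.0, "BTU/Hr": 900.0, "EER": 3.4}
probs = cluster_probabilities(unit)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

For this unit the values sit close to the means of cluster 0 (high BTU/Hr, low EER), so cluster 0
receives the highest probability.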
