Weka

Data Mining Using WEKA

Submitted to
Prof. Prithwis Mukerjee

Submitted By
Shikha Jayaswal

19th April, 2012

Table of Contents

Objective
WEKA
    Running WEKA
Loading Datasets
Linear Regression
    Model
    Interpreting the Output
Clustering
    Model
    Interpreting the Output

List of Figures:

Figure 1: Weka GUI Chooser
Figure 2: Weka Explorer
Figure 3: Load Dataset
Figure 4: Linear Regression

Objective

Exhibit the use of WEKA in performing the following data mining tasks:
    •   Linear Regression
    •   Clustering

WEKA

Weka is a data mining tool developed at the University of Waikato. It is released under the GNU General Public License and is freely available. It is implemented in the Java programming language and has a GUI for loading data, running analyses, and producing visualizations.

The software can be downloaded from: http://www.cs.waikato.ac.nz/~ml/weka/

The version used in this analysis is 3.6.6.

Running WEKA

The Weka GUI Chooser appears when Weka is started:

Figure 1: Weka GUI Chooser

The Explorer button opens the Weka Explorer window, through which data can be loaded and then used for analysis.

Figure 2: Weka Explorer

Loading Datasets:

The file types supported are:
    •   ARFF data files
    •   C4.5 data files
    •   CSV data files
    •   LibSVM data files
    •   SVM light data files
    •   Binary serialized data files
    •   XRFF data files

The data file being used for the study is:

Click “Open file...”, select the file to be loaded, and open it.

Figure 3: Load Dataset
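
Since ARFF is the first file type listed above, a small sketch of what such a file looks like may help. The Python helper below writes a numeric-only ARFF text from a list of rows. The attribute names are taken from the regression output in this report; the relation name study_data and the two data rows are made up for illustration and are not the study's data. The helper also assumes that names contain no apostrophes.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text for a table whose columns are all numeric."""
    # Every name is quoted so that names containing spaces stay valid.
    lines = ["@relation '{}'".format(relation), ""]
    for name in attributes:
        lines.append("@attribute '{}' numeric".format(name))
    lines.append("")
    lines.append("@data")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Attribute names follow the report; the rows are hypothetical values.
ATTRIBUTES = ["Price/Unit", "BTU/Hr", "Weight lbs", "EER",
              "Unit volume", "Region", "Type"]
SAMPLE_ROWS = [
    [34.5, 500, 5.6, 4.0, 100000, 3, 2],
    [39.0, 900, 10.2, 3.4, 150000, 4, 1],
]

arff_text = to_arff("study_data", ATTRIBUTES, SAMPLE_ROWS)
print(arff_text)
```

The resulting text can be saved with an .arff extension and opened through “Open file...” in the Explorer.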

Linear Regression

Model

Steps for creating the regression model:
   1. Click on the Classify tab.
   2. Click on the Choose button. In the window that opens, expand classifiers and then functions, and select LinearRegression.
   3. Click on the LinearRegression text area to open the GenericObjectEditor pop-up. In the attributeSelectionMethod dropdown, select No Attribute Selection, then click on OK.
   4. Check Use Training Set to use the loaded dataset.
   5. In the dropdown, select Price/Unit as the dependent variable and click on the Start button.

Figure 4: Linear Regression

Interpreting the Output

Price/Unit = -0.0012 * BTU/Hr
             + 0.5806 * Weight lbs
             + 3.7411 * EER
             + 0 * Unit volume
             - 1.2524 * Region
             - 2.1025 * Type
             + 24.8058
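
To show how the fitted equation is used, the following Python sketch evaluates it for one instance. The coefficients and intercept are copied from the output above; the input values in example are hypothetical and chosen only to demonstrate the arithmetic. Region and Type are treated as plain numbers, as they are in the equation.

```python
# Coefficients and intercept as reported in the regression output.
COEFFICIENTS = {
    "BTU/Hr": -0.0012,
    "Weight lbs": 0.5806,
    "EER": 3.7411,
    "Unit volume": 0.0,
    "Region": -1.2524,
    "Type": -2.1025,
}
INTERCEPT = 24.8058


def predict_price_per_unit(instance):
    """Apply the linear model to a dict of attribute values."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * instance[name] for name in COEFFICIENTS
    )


# Hypothetical instance, not taken from the study's dataset.
example = {
    "BTU/Hr": 500,
    "Weight lbs": 5.6,
    "EER": 4.0,
    "Unit volume": 100000,
    "Region": 3,
    "Type": 2,
}

print(round(predict_price_per_unit(example), 4))
```

Because the coefficient of Unit volume is 0, changing that attribute has no effect on the predicted Price/Unit.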

Clustering

Model

Steps for creating the clustering model:
    1. Click on the Cluster tab.
    2. Click on the Choose button. In the window that opens, expand clusterers and select EM.
    3. Click on the EM text area to open the GenericObjectEditor pop-up, fill in the cluster attributes, and click on OK. The corresponding options are:
        a. -V Verbose.
        b. -N The number of clusters to generate. If omitted, EM will use cross validation to select the number of clusters automatically.
        c. -I Terminate after this many iterations if EM has not converged.
        d. -S Specify random number seed.
        e. -M Set the minimum allowable standard deviation for normal density calculation.
    4. Check Use Training Set to use the loaded dataset and click on the Start button.
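
The same options can also be passed to EM on the command line. The Python sketch below only assembles such a command as a list of arguments; it does not run it. It assumes that weka.jar is in the current directory and that the dataset has been saved as data.arff (both names are hypothetical), and the default values for iterations, seed, and minimum standard deviation are chosen here for illustration. The -t option names the training file.

```python
def em_command(arff_path, num_clusters=None, max_iterations=100,
               seed=100, min_std_dev=1e-6, weka_jar="weka.jar"):
    """Assemble a command line that runs weka.clusterers.EM."""
    command = ["java", "-cp", weka_jar, "weka.clusterers.EM",
               "-t", arff_path]
    # -N is left out when no cluster count is given, so that EM
    # selects the number of clusters by cross validation.
    if num_clusters is not None:
        command += ["-N", str(num_clusters)]
    command += ["-I", str(max_iterations),
                "-S", str(seed),
                "-M", str(min_std_dev)]
    return command


# Hypothetical file name; five clusters requested explicitly.
print(" ".join(em_command("data.arff", num_clusters=5)))
```

The list form can be handed to subprocess.run if the command is to be executed from Python.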

Interpreting the Output

The Clustered Instances:

   Cluster      Instances
      0           7 (16%)
      1          14 (31%)
      2          10 (22%)
      3           3 (7%)
      4          11 (24%)

The attributes of the clusters are:

                               Cluster
   Attribute                         0            1            2            3            4
                                (0.16)        (0.3)        (0.2)       (0.07)       (0.27)
   ---------------------------------------------------------------------------------------
   Price/Unit     mean         34.1022      32.5883      39.1963      38.0867      30.9768
                  std. dev.     4.1176       1.2413       2.2264       1.0193       2.8369
   BTU/Hr         mean        912.8122     499.9553     496.4343     856.6667     347.0964
                  std. dev.   105.4301     159.6201     178.5667      57.9272     140.3392
   Weight lbs.    mean         10.4966       5.6066       5.6444       9.5967       3.9301
                  std. dev.     1.3785       1.848        2.0181       0.7312       1.559
   EER            mean          3.3643       3.9673       4.9873       4.8533       4.4754
                  std. dev.     0.2773       0.3885       0.3347       0.1586       0.3313
   Unit Volume    mean        180985.9     129223.9     71417.94        74000     92473.04
                  std. dev.   239037.4     135545.2     45108.85      44639.3     85150.53
   Region         mean               3       3.1226            4            5       4.8882
                  std. dev.     0.8848       0.4794            0       0.8848       0.365
   Type           mean          1.1427            2            2       1.3333            2
                  std. dev.     0.3497       0.3866       0.3866       0.4714       0.3866
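
The percentages in the Clustered Instances listing are each cluster's share of all instances, rounded to a whole number. A short Python sketch using the instance counts reported above makes the arithmetic explicit; the variable names are chosen here for illustration.

```python
# Instance counts per cluster, as listed under Clustered Instances.
cluster_counts = {0: 7, 1: 14, 2: 10, 3: 3, 4: 11}

total_instances = sum(cluster_counts.values())

# Share of each cluster as a whole-number percentage of all instances.
cluster_percent = {
    cluster: round(100 * count / total_instances)
    for cluster, count in cluster_counts.items()
}

print(total_instances)
print(cluster_percent)
```

The counts add up to 45 instances, and the shares for clusters 0, 1, 2, and 4 reproduce the 16%, 31%, 22%, and 24% shown in the listing.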
