Data Mining Final Presentation




1. CIS 435 DL, Summer 2011: Analysis of Wine Data (Cheri Krampert)
2. Wine Data
   • Wine data has been provided for analysis
     – All wine attributes are numeric except for the type, which is nominal and is the dependent variable
     – There are 178 cases classified into 3 types of wine
     – There are no missing values
     – A total of 13 attributes are available to classify the wine
     – Six data sets with variations in attributes were provided
     – The smallest data sets use only 4 attributes to identify the wine class
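The counts above match the classic UCI Wine dataset, which ships with scikit-learn; assuming that is the same data, a short sketch can verify the figures quoted on the slide:

```python
# Hypothetical reconstruction: load the UCI Wine dataset bundled with
# scikit-learn and confirm the slide's counts (178 cases, 13 numeric
# attributes, 3 wine types, no missing values).
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target

print(X.shape)            # (178, 13) -> 178 cases, 13 attributes
print(np.unique(y).size)  # 3 -> three types of wine
print(np.isnan(X).any())  # False -> no missing values
```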
3. Descriptive Output from PSPP
4. Analysis
   • PSPP was used to run a factor analysis with Principal Component extraction on the covariance matrix.
     – 5 components had eigenvalues over 1
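A minimal sketch of this step, assuming Principal Component analysis of the covariance matrix with the Kaiser rule (retain components whose eigenvalue exceeds 1), rebuilt here in NumPy rather than PSPP:

```python
# Illustrative analogue of the PSPP factor analysis: eigendecompose the
# covariance matrix of the wine attributes and count components whose
# eigenvalue exceeds 1 (Kaiser criterion).  The retained count depends
# on the exact settings, so no specific number is asserted here.
import numpy as np
from sklearn.datasets import load_wine

X = load_wine().data
cov = np.cov(X, rowvar=False)                # 13x13 covariance matrix
eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # sorted, largest first
retained = int((eigenvalues > 1).sum())
print(retained, "components have eigenvalues over 1")
```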
5. Analysis
   • A knowledge flow was created in Weka to test the subsets of wine attributes provided.
   • The classifiers used were Naïve Bayes Simple, J48, Logistic, and Multilayer Perceptron.
   • 10-fold cross-validation was used.
   • Each set of attributes was run through the knowledge flow to determine the best set of predictive attributes and the most accurate classifier for predicting wine type.
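The knowledge flow above can be sketched outside Weka with scikit-learn analogues; this is an illustration under stated assumptions, not the deck's actual setup: GaussianNB stands in for Naïve Bayes Simple, DecisionTreeClassifier for J48, and MLPClassifier for the Multilayer Perceptron, each scored with 10-fold cross-validation on the full attribute set:

```python
# Hedged analogue of the Weka knowledge flow: compare four classifiers
# on the wine data with 10-fold cross-validation.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_wine(return_X_y=True)
classifiers = {
    "Naive Bayes (analogue)": GaussianNB(),
    "Decision tree (J48 analogue)": DecisionTreeClassifier(random_state=0),
    "Logistic": make_pipeline(StandardScaler(),
                              LogisticRegression(max_iter=1000)),
    "Multilayer Perceptron": make_pipeline(StandardScaler(),
                                           MLPClassifier(max_iter=2000,
                                                         random_state=0)),
}

results = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.4f} mean accuracy")
```

Exact percentages will differ from the Weka output, since the underlying implementations and fold splits are not identical.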
6. Analysis: Weka Knowledge Flow
7. Analysis: Knowledge Flow Output
8. Analysis: Knowledge Flow Output
9. Conclusion
   • In general, the data sets containing more attributes had a higher rate of correct classification.
   • There was little variation between the Wine- all data and Wine- 3 dimension sets:
     – Naïve Bayes and J48 had identical performance on these data sets.
     – The J48 pruned tree used the same attributes: Flavanoids, Color Intensity, and Proline.
     – Notable exception: the Wine- 3 dimension analysis with the Multilayer Perceptron model provided the most accurate classification of all models and data sets, with 99.44% correctly classified (only 1 incorrectly classified instance).
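An illustrative sketch of the 3-attribute result, assuming the Wine- 3 dimension subset holds the three attributes the pruned J48 tree used (Flavanoids, Color Intensity, Proline); column names follow scikit-learn's copy of the dataset, and the 99.44% figure from Weka is not guaranteed to reproduce here:

```python
# Restrict the wine data to the three attributes named on the slide and
# cross-validate a Multilayer Perceptron analogue on just those columns.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

wine = load_wine()
keep = [wine.feature_names.index(f)
        for f in ("flavanoids", "color_intensity", "proline")]
X3, y = wine.data[:, keep], wine.target

mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(max_iter=2000, random_state=0))
scores = cross_val_score(mlp, X3, y, cv=10)
print(f"{scores.mean():.4f} mean accuracy with 3 attributes")
```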
10. Conclusion
   • Proanthocyanins were not included in the data subsets.
   • Nonflavanoid Phenols were included only in the Wine- No Correlation subset, where correctly classified instances ranged from 88.76% to 91.57%.
   • Given how little performance differed between the Wine- all data and Wine- 3 dimension sets, it does not appear that either attribute contributes significantly to the wine classification.
11. Conclusion
   • Naïve Bayes Simple and Multilayer Perceptron provided the best classification performance.
     – Correct classifications were identical on 4 data sets.
     – Multilayer Perceptron was more accurate on the remaining 2 data sets.
   • To predict wine type from attributes, I would recommend using the attributes in the Wine- 3 dimension set with the Multilayer Perceptron classifier.
12. Conclusion
   • Weka Explorer provides a tool for examining specific attributes, including descriptive statistics and visualization of the data related to the classification.
   • Weka knowledge flow allows a consistent model to be created to efficiently compare classification outcomes across different data dimensions and classification tools to determine the best model.