Continue on your data mining adventure by doing some classifications
1. Continue on your data mining adventure by doing some
classifications
1. Continue with your original dataset. Make any changes that
were suggested and link or add to the original document. All
information should be accessible from previous portions of the
project.
2. Using technology, classify on one of your categorical
variables.
(a) Use a simple decision tree to classify your data to a
categorical variable. Create a visualization of the decision tree.
Make sure to produce a confusion matrix.
(b) Repeat your decision tree but use a cross validation
technique to test the accuracy. Examine variable importance and
be certain to comment on the most important variables.
(c) Examine the importance of each feature using a
chi-square statistic or gain ratio. Create a visualization. Does this
agree with what your decision trees showed?
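As an illustration only, the three steps in (a) through (c) might be sketched as follows in Python with scikit-learn. The iris data, the depth limit of 3, the 70/30 split, and the 5 folds are all assumptions chosen for the example; substitute your own dataset, target variable, and settings. Any other software that produces the same outputs is equally acceptable.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data (assumption): replace with your own features X and categorical target y.
data = load_iris()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# (a) Simple decision tree, evaluated on a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Text rendering of the tree; sklearn.tree.plot_tree draws a figure instead.
print(export_text(tree, feature_names=feature_names))

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, tree.predict(X_test))
print(cm)

# (b) Same tree settings, accuracy estimated with 5-fold cross-validation.
cv_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5
)
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# Impurity-based variable importance from the fitted tree (values sum to 1).
importances = dict(zip(feature_names, tree.feature_importances_))

# (c) Chi-square statistic per feature; requires non-negative feature values.
chi2_stats, p_values = chi2(X, y)

for name, stat in zip(feature_names, chi2_stats):
    print("%-20s importance=%.3f  chi2=%.1f" % (name, importances[name], stat))
```

The final loop prints the tree-based importance next to the chi-square statistic for each feature, which is one way to see whether the two rankings agree. Note that scikit-learn's `chi2` treats features as non-negative counts or frequencies, so continuous variables are often binned first; whether that is appropriate depends on your data.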
3. Write your report!
(a) Include all items requested above. Include graphs and
text about each.
(b) Discuss the cross validation process chosen. Discuss
whether each model is overfit and how you might tell.
(c) Discuss the confusion matrix and what it might mean
for making certain predictions in your project.
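One possible way to look for overfitting, sketched here under assumptions, is to compare accuracy on the training folds with accuracy on the held-out folds: a large gap suggests the tree has memorized the training data. The wine dataset, the candidate depths, and the helper name train_vs_validation are hypothetical choices for illustration, not part of the assignment. The last lines show how per-class recall can be read off a cross-validated confusion matrix.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with your own features and target.
X, y = load_wine(return_X_y=True)


def train_vs_validation(max_depth):
    """Return mean training and validation accuracy over 5 folds."""
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    result = cross_validate(model, X, y, cv=5, return_train_score=True)
    return result["train_score"].mean(), result["test_score"].mean()


# max_depth=None lets the tree grow until every leaf is pure.
for depth in (1, 3, None):
    train_acc, val_acc = train_vs_validation(depth)
    print("max_depth=%s  train=%.3f  validation=%.3f  gap=%.3f"
          % (depth, train_acc, val_acc, train_acc - val_acc))

# Confusion matrix built from out-of-fold predictions only.
predictions = cross_val_predict(
    DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5
)
cm = confusion_matrix(y, predictions)

# Recall per class: correctly predicted cases divided by actual cases in that class.
recall_per_class = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(recall_per_class)
```

A class with low recall is one the model often misses, which matters if errors on that class are costly in the context of your project.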
2. The report will be graded by the following criteria:
Statistical analysis - 30 points. The statistical tests are all
provided.
Graphical Representations - 30 points. The requested graphical
displays are made and included in the report.
Continuation - 15 points. The report is a continuation of the
previous report. This may include links to Part 1 or simply be an
addition to it. In any case the introduction to your data should be
available and any necessary fixes made.
Interpretations - 15 points. The results of the statistical
analysis are clearly explained and interpreted in the context of
the problem. The conclusions accurately reflect the analysis and
are well supported.
Writing quality - 10 points. The paper is readable and clearly
written. There are few, if any, grammatical or spelling errors
and they do not interfere with the clarity of the paper.
Numbering on this document is not used in the report in any way.