Data Mining Techniques  Sherif Elfayoumy, Ph.D. Associate Professor of Computer Science, UNF [email_address]
Transcript of "Data Mining I.ppt"

  1. Data Mining Techniques
     Sherif Elfayoumy, Ph.D., Associate Professor of Computer Science, UNF [email_address]
  2. About Myself
     - Based out of Jacksonville, FL
     - Doctorate in AI and supercomputing
     - Taught data mining for 8 years
     - Authored more than 20 peer-reviewed papers
     - Sr. director of architecture and R&D at IxReveal (text mining software vendor)
     - Co-inventor on two patents in text mining and the semantic web (pending USPTO approval)
     - Senior member of IEEE and chair of the IEEE Computer Society, Jacksonville chapter
  3. Agenda
     - Data Mining Process
     - Major Data Mining Types
     - Association Mining
     - Classification/Prediction
     - Clustering
  4. Data Mining Process
     [Diagram. Labels: Source Data (Database, Data warehouse); Identify project data; Project Data; Data preprocessing and attribute selection; Relevant Data; Training data; Testing and validation data; DM Engine; Model]
  5. Data Mining Process
     [Diagram. Labels: New data; Model; Predictions]
  6. Major Data Mining Types
     - Association
     - Classification/Prediction
     - Clustering
     - Text Mining
  7. Association Mining
     - Market basket analysis
     - Brute force is intractable
     - Apriori Algorithm
       - Iterative
       - Huge candidate sets and frequent table scans
       - FP-Tree
     - Interestingness measures
       - Support
       - Confidence
       - Random probability and negative association
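
The measures named on the association mining slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the baskets are hypothetical, the function names are invented here, and `lift` is one common way to compare a rule's confidence with the "random" baseline probability of its consequent (a lift below 1 is usually read as a negative association). The `frequent_itemsets` function follows the level-wise Apriori idea: candidates of size k are built only from frequent itemsets of size k-1, which avoids brute-force enumeration but still rescans the transactions at every level.

```python
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support([i], transactions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        # Each level rescans the transactions to count candidate support.
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent

frequent = frequent_itemsets(transactions, min_support=0.6)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
rule_lift = lift({"diapers"}, {"beer"}, transactions)
```

On this toy data the rule diapers → beer has a lift above 1, which suggests a positive association; the numbers mean nothing beyond the invented baskets.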
  8. Classification/Prediction
     - Decision Trees
       - Iterative, recursive
       - Attribute Relevance and Node Selection
       - Interpretable; easily integrated
       - Non-adaptability
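
The slide does not say which attribute relevance measure is used for node selection. One common choice is information gain (as in ID3-style trees), and the sketch below assumes it. The class counts are read off the fractions on the deck's Naïve Bayes example slide (9 instances of class p and 5 of class n); the count for humidity = normal in class n is taken as 1 so that the n column sums to 5. The function and variable names are invented for this illustration.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class-count tuple such as (p, n)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the weighted entropy of its children."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

# (p, n) class counts; assumption: derived from the Naive Bayes example fractions.
parent = (9, 5)
splits = {
    "outlook": [(2, 3), (4, 0), (3, 2)],      # sunny, overcast, rain
    "temperature": [(2, 2), (4, 2), (3, 1)],  # hot, mild, cool
    "humidity": [(3, 4), (6, 1)],             # high, normal
    "windy": [(3, 3), (6, 2)],                # true, false
}

gains = {attr: information_gain(parent, children) for attr, children in splits.items()}
root_attribute = max(gains, key=gains.get)  # attribute chosen for the root node
```

Under these assumptions the attribute with the highest gain becomes the root, and the same selection is then repeated recursively on each branch.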
  9. Classification/Prediction
     - Naïve Bayes
       - Bayes' theorem
       - Attribute independence assumption
       - Calculates the probability of every state of each input column, given each possible state of the class column
       - Can be accurate, interpretable, adaptable
  10. Naïve Bayes Example
      Conditional probability of each attribute value given the class (p or n):

      Attribute     Value      P(value|p)   P(value|n)
      outlook       sunny      2/9          3/5
      outlook       overcast   4/9          0
      outlook       rain       3/9          2/5
      temperature   hot        2/9          2/5
      temperature   mild       4/9          2/5
      temperature   cool       3/9          1/5
      humidity      high       3/9          4/5
      humidity      normal     6/9          1/5
      windy         true       3/9          3/5
      windy         false      6/9          2/5
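
To show how these conditional probabilities turn into a prediction, here is a minimal sketch. It rests on assumptions not stated on the slide: the class priors P(p) = 9/14 and P(n) = 5/14 are inferred from the denominators 9 and 5, P(normal|n) is taken as 1/5 so that the humidity values for class n sum to 1, and the test instance is hypothetical. Under the attribute independence assumption, the score of a class is its prior times the product of the matching conditional probabilities, and normalizing the scores gives the posterior from Bayes' theorem.

```python
from fractions import Fraction as F

# Assumption: priors inferred from the denominators (9 p-instances, 5 n-instances).
priors = {"p": F(9, 14), "n": F(5, 14)}

# P(value | class), copied from the slide's table.
likelihood = {
    "outlook": {
        "sunny": {"p": F(2, 9), "n": F(3, 5)},
        "overcast": {"p": F(4, 9), "n": F(0)},
        "rain": {"p": F(3, 9), "n": F(2, 5)},
    },
    "temperature": {
        "hot": {"p": F(2, 9), "n": F(2, 5)},
        "mild": {"p": F(4, 9), "n": F(2, 5)},
        "cool": {"p": F(3, 9), "n": F(1, 5)},
    },
    "humidity": {
        "high": {"p": F(3, 9), "n": F(4, 5)},
        # Assumption: 1/5, so that high + normal sums to 1 for class n.
        "normal": {"p": F(6, 9), "n": F(1, 5)},
    },
    "windy": {
        "true": {"p": F(3, 9), "n": F(3, 5)},
        "false": {"p": F(6, 9), "n": F(2, 5)},
    },
}

def posterior(instance):
    """Return P(class | instance) under the attribute independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for attribute, value in instance.items():
            score *= likelihood[attribute][value][cls]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}

# Hypothetical unseen instance.
instance = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "windy": "true"}
result = posterior(instance)
prediction = max(result, key=result.get)
```

Exact fractions are used so the arithmetic can be checked by hand. Note the zero for P(overcast|n): any overcast instance gets a score of exactly 0 for class n, which is why practical implementations usually add smoothing.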
  11. Classification/Prediction
      - Neural Networks
        - Feed-forward, back-propagation model (aka Multilayer Perceptron)
        - Accurate and fast predictions; noise tolerance
        - Non-interpretable; slow training; non-adaptable
      [Diagram. Labels: Input vector; Input nodes; Hidden nodes; Output nodes; Output vector]
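
The feed-forward half of the model can be sketched as follows: an input vector passes through the hidden nodes to the output nodes, with each node applying an activation to a weighted sum of the previous layer. This is only an illustration under stated assumptions: the sigmoid activation, the layer sizes, and every weight below are invented, and back-propagation (the training step that adjusts the weights) is not shown.

```python
import math

def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(input_vector, layers):
    """Propagate an input vector through a list of (weights, biases) layers.

    weights holds one row per node in the layer, one column per input to it.
    """
    activation = input_vector
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

# Hypothetical network: 2 input nodes -> 2 hidden nodes -> 1 output node.
hidden_layer = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output_layer = ([[1.2, -0.7]], [0.05])

output_vector = forward([1.0, 0.0], [hidden_layer, output_layer])
```

Prediction is a fixed sequence of multiply-and-add steps, which is consistent with the slide's point that predictions are fast even though training is slow.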
  12. Neural Network Example
      [Diagram of a network with numbered nodes. Labels: INPUT NODES; HIDDEN NODES; OUTPUT NODES]
  13. Classification/Prediction
      - Model Evaluation
        - Confusion matrix
        - Accuracy: % of correct predictions
        - Recall/Sensitivity: true positive rate
        - Precision: % of correct positive predictions
        - Specificity: true negative rate
      - Bagging and Boosting

      Confusion matrix:

                    Predicted P   Predicted N
      Actual P      TP            FN
      Actual N      FP            TN
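
The four measures follow directly from the confusion matrix cells. The sketch below uses the standard definitions; the function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def evaluate(tp, fn, fp, tn):
    """Compute evaluation measures from the four confusion matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all predictions that are correct
        "recall": tp / (tp + fn),       # true positive rate (sensitivity)
        "precision": tp / (tp + fp),    # share of positive predictions that are correct
        "specificity": tn / (tn + fp),  # true negative rate
    }

# Hypothetical counts: 50 actual positives, 50 actual negatives.
metrics = evaluate(tp=40, fn=10, fp=5, tn=45)
```

With these counts, recall is 40/50 and precision is 40/45, so the two can differ even on the same model; accuracy alone hides that difference.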
  14. Clustering
      - Algorithms
        - Segmentation
        - Hierarchical
        - Density
      - K-Means
      - Expectation Maximization (EM)
      - Classification!
      [Diagram. Point labels: p, q, o]
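
K-Means is the most compact of the listed algorithms to illustrate. The sketch below is a plain version of the usual assign-then-update loop, under assumptions of my own: the points are hypothetical, and the first k points are used as starting centroids so the run is deterministic (real implementations normally use random or smarter initialization).

```python
from math import dist

def k_means(points, k, iterations=100):
    """Cluster points into k groups; returns (centroids, assignment)."""
    # Assumption: naive initialization from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
```

Each point ends up with a cluster label, which may be what the slide's "Classification!" bullet hints at: once clusters exist, cluster membership can be treated as a class.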