PPT Format



  1. 1. Introduction to Data Mining <ul><li>Group Members: </li></ul><ul><li>Karim C. El-Khazen </li></ul><ul><li>Pascal Suria </li></ul><ul><li>Lin Gui </li></ul><ul><li>Philsou Lee </li></ul><ul><li>Xiaoting Niu </li></ul>
  2. 2. <ul><ul><li>Definition </li></ul></ul><ul><ul><li>General Concept </li></ul></ul><ul><ul><ul><ul><li>Foundations </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Evolution </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Applications </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Challenges </li></ul></ul></ul></ul><ul><ul><li>Algorithms </li></ul></ul><ul><ul><ul><ul><li>Classical </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Next Generation </li></ul></ul></ul></ul>Introduction to Data Mining
  3. 3. <ul><li>What is Data Mining? </li></ul><ul><ul><li>Data mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data stored in repositories, using pattern recognition technologies as well as statistical and mathematical methods. </li></ul></ul>Introduction to Data Mining
  4. 4. Introduction to Data Mining <ul><li>Foundations </li></ul><ul><ul><li>Massive data collection </li></ul></ul><ul><ul><li>Powerful multiprocessor computers </li></ul></ul><ul><ul><li>Data mining algorithms </li></ul></ul>
  5. 5. Introduction to Data Mining <ul><li>Evolution </li></ul>
  6. 6. Introduction to Data Mining <ul><li>Applications </li></ul><ul><ul><li>Industry </li></ul></ul><ul><ul><ul><li>Retail </li></ul></ul></ul><ul><ul><ul><li>Health maintenance groups </li></ul></ul></ul><ul><ul><ul><li>Telecommunications </li></ul></ul></ul><ul><ul><ul><li>Credit cards </li></ul></ul></ul><ul><ul><li>Web mining </li></ul></ul><ul><ul><li>Sports and entertainment solutions </li></ul></ul>
  7. 7. Introduction to Data Mining <ul><li>Challenges </li></ul><ul><ul><li>Ability to handle different types of data </li></ul></ul><ul><ul><li>Graceful degradation of data mining algorithms </li></ul></ul><ul><ul><li>Valuable data mining results </li></ul></ul><ul><ul><li>Representation of data mining requests and results </li></ul></ul><ul><ul><li>Mining at different abstraction levels </li></ul></ul><ul><ul><li>Mining information from different sources of data </li></ul></ul><ul><ul><li>Protection of privacy and data security </li></ul></ul>
  8. 8. Introduction to Data Mining <ul><li>Hierarchy of Choices and Decisions </li></ul><ul><ul><li>Business goal </li></ul></ul><ul><ul><li>Collecting, cleaning and preparing data </li></ul></ul><ul><ul><li>Prediction </li></ul></ul><ul><ul><li>Model type and algorithms </li></ul></ul>
  9. 9. Introduction to Data Mining <ul><li>Data Description </li></ul><ul><ul><li>Descriptions of data characteristics in elementary and aggregated form </li></ul></ul><ul><ul><ul><li>Summarization </li></ul></ul></ul><ul><ul><ul><li>Visualization </li></ul></ul></ul>
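
As a rough illustration of describing data in elementary and aggregated form, the sketch below summarizes a small set of records. The `sales` records and the region names are hypothetical, invented only for this example.

```python
import statistics
from collections import defaultdict

# Hypothetical elementary records: (region, sale amount)
sales = [("north", 120), ("north", 80), ("south", 200),
         ("south", 100), ("south", 150)]

amounts = [amount for _, amount in sales]

# Summarization over all records
summary = {
    "count": len(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "min": min(amounts),
    "max": max(amounts),
}

# Aggregated form: total sales per region
totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount

print(summary)
print(dict(totals))
```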
  10. 10. Introduction to Data Mining <ul><li>Predictive Data Mining </li></ul><ul><ul><li>Predictive modeling is a term used to describe the process of mathematically or mentally representing a phenomenon or occurrence with a series of equations or relationships. </li></ul></ul>
  11. 11. Introduction to Data Mining <ul><li>Prediction: Classification </li></ul><ul><ul><li>Classification predicts class membership </li></ul></ul><ul><ul><ul><li>Pre-classify (using classification algorithms) </li></ul></ul></ul><ul><ul><ul><li>Test to determine the quality of the model </li></ul></ul></ul><ul><ul><ul><li>Predict (using effective classifier) </li></ul></ul></ul>
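
The three steps above (pre-classify, test, predict) can be sketched as follows. This is a minimal illustration, assuming a single numeric feature and a simple threshold classifier; the records, labels, and function names are hypothetical and not taken from the slides.

```python
# Hypothetical pre-classified records: (income in thousands, known class)
labeled = [(20, "reject"), (25, "reject"), (30, "reject"), (45, "approve"),
           (50, "approve"), (60, "approve"), (28, "reject"), (55, "approve")]

# Hold out the last two records to test the quality of the model
train, test = labeled[:6], labeled[6:]

def fit_threshold(records):
    """Learn a cut-off halfway between the two class means."""
    by_class = {}
    for value, label in records:
        by_class.setdefault(label, []).append(value)
    means = {label: sum(v) / len(v) for label, v in by_class.items()}
    low, high = sorted(means, key=means.get)
    return (means[low] + means[high]) / 2, low, high

def predict(model, value):
    """Assign the class on the matching side of the learned cut-off."""
    threshold, low, high = model
    return high if value > threshold else low

model = fit_threshold(train)                       # step 1: build the classifier
accuracy = sum(predict(model, v) == y
               for v, y in test) / len(test)       # step 2: test its quality
new_case = predict(model, 40)                      # step 3: predict a new case
print(accuracy, new_case)
```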
  12. 12. Introduction to Data Mining <ul><li>Prediction: Regression </li></ul><ul><ul><li>Regression takes a numerical dataset and develops a mathematical formula that fits the data. </li></ul></ul><ul><ul><li>To predict future behavior, plug new data into the fitted formula to obtain a prediction. </li></ul></ul>
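
One common way to fit such a formula is ordinary least squares on a straight line. The sketch below assumes a single predictor and made-up data points; the slides do not specify which regression method is meant.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical numerical dataset lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)

# Plug new data into the fitted formula to get a prediction
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```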
  13. 13. Introduction to Data Mining <ul><li>Algorithms </li></ul><ul><ul><li>Classical Techniques </li></ul></ul><ul><ul><ul><li>Statistics </li></ul></ul></ul><ul><ul><ul><li>Neighborhoods </li></ul></ul></ul><ul><ul><ul><li>Clustering </li></ul></ul></ul><ul><ul><li>Next Generation </li></ul></ul><ul><ul><ul><li>Decision Tree </li></ul></ul></ul><ul><ul><ul><li>Neural Network </li></ul></ul></ul><ul><ul><ul><li>Rule Induction </li></ul></ul></ul>
  14. 14. Introduction to Data Mining <ul><li>Statistics </li></ul><ul><ul><li>Classical Statistics: </li></ul></ul><ul><ul><ul><li>Concerned with the collection and description of data </li></ul></ul></ul><ul><ul><ul><li>Assumption: the data follow an underlying distribution pattern </li></ul></ul></ul><ul><ul><ul><li>Objective: find the best guess </li></ul></ul></ul><ul><ul><li>Data Mining: </li></ul></ul><ul><ul><ul><li>Employs statistical methods </li></ul></ul></ul><ul><ul><ul><li>Needs to analyze huge amounts of data </li></ul></ul></ul><ul><ul><ul><li>Goes beyond traditional statistics </li></ul></ul></ul>
  15. 15. Introduction to Data Mining <ul><li>Neighborhoods </li></ul><ul><ul><li>Basic idea: </li></ul></ul><ul><ul><ul><li>For a new problem, look for similar problems (neighbors) that have already been solved </li></ul></ul></ul><ul><ul><li>Key point: find the neighborhood </li></ul></ul><ul><ul><ul><li>Calculate the distance: how close must a case be to count as a neighbor? </li></ul></ul></ul><ul><ul><ul><li>Which class does the new problem belong to? </li></ul></ul></ul><ul><ul><li>Large computational load: </li></ul></ul><ul><ul><ul><li>New calculation for each new case </li></ul></ul></ul>
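
The neighborhood idea is commonly implemented as k-nearest-neighbor classification. The sketch below assumes Euclidean distance, a majority vote among the `k` closest solved cases, and hypothetical two-dimensional data; none of these choices are prescribed by the slides.

```python
import math
from collections import Counter

def knn_predict(solved_cases, new_point, k=3):
    """Classify new_point by majority vote among its k nearest solved cases."""
    # Distances are recomputed for every new case, hence the computational load
    by_distance = sorted(solved_cases,
                         key=lambda case: math.dist(case[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical solved cases: (feature vector, known class)
solved = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
          ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]

print(knn_predict(solved, (1.2, 1.4)))
print(knn_predict(solved, (6.4, 6.1)))
```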
  16. 16. Introduction to Data Mining <ul><li>Clustering </li></ul><ul><ul><li>Elements are grouped together according to their characteristics </li></ul></ul><ul><ul><ul><li>Elements within a cluster share similar values (homogeneous) </li></ul></ul></ul><ul><ul><li>Problem: controlling the number of clusters </li></ul></ul><ul><ul><ul><li>Hierarchical clustering: flexible </li></ul></ul></ul><ul><ul><ul><li>Non-hierarchical clustering: number given by the user </li></ul></ul></ul><ul><ul><li>Used most frequently for: </li></ul></ul><ul><ul><ul><li>Consolidating data into a high-level view </li></ul></ul></ul><ul><ul><ul><li>Grouping records by likely behavior </li></ul></ul></ul>
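
For the non-hierarchical case, where the user supplies the number of clusters, k-means is a typical choice. The following is a minimal one-dimensional sketch, assuming the user passes the initial centers (and therefore the number of clusters); the data values are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Basic k-means on numbers; len(centers) is the user-chosen cluster count."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means_1d(values, centers=[1, 12])
print(centers, clusters)
```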
  17. 17. Introduction to Data Mining <ul><li>Decision Tree </li></ul><ul><ul><li>A way of representing a series of rules that lead to a class or value </li></ul></ul><ul><ul><li>Structure: </li></ul></ul><ul><ul><ul><li>Decision nodes, branches, leaves </li></ul></ul></ul><ul><ul><li>Example: A loan officer wants to determine the creditworthiness of applicants </li></ul></ul>
  18. 18. Introduction to Data Mining <ul><li>Decision Tree (continued) </li></ul><ul><ul><li>Algorithms help to induce the tree and its rules to make predictions </li></ul></ul>
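
To make the structure of decision nodes, branches, and leaves concrete, the sketch below hand-builds a small tree for the loan example and walks it for one applicant. The attributes, thresholds, and class names are hypothetical, and the tree is written by hand rather than induced from data.

```python
# Hypothetical tree: each decision node tests "attribute > threshold";
# the "yes" and "no" entries are its branches, and leaves hold the class.
tree = {
    "question": ("income", 40_000),
    "yes": {"leaf": "good risk"},
    "no": {
        "question": ("years_employed", 5),
        "yes": {"leaf": "good risk"},
        "no": {"leaf": "bad risk"},
    },
}

def classify(node, applicant):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        attribute, threshold = node["question"]
        node = node["yes"] if applicant[attribute] > threshold else node["no"]
    return node["leaf"]

print(classify(tree, {"income": 30_000, "years_employed": 8}))
```

Each root-to-leaf path reads as one rule, for example "income is at most 40,000 and years employed is more than 5, so good risk".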
  19. 19. Introduction to Data Mining <ul><li>Neural Networks </li></ul><ul><ul><li>Efficiently model large and complex problems with hundreds of predictor variables </li></ul></ul><ul><ul><li>Structure: </li></ul></ul><ul><ul><ul><li>Input layer, hidden layer, output layer </li></ul></ul></ul><ul><ul><ul><li>Activation function between nodes </li></ul></ul></ul><ul><ul><ul><li>Requires training and testing of relations </li></ul></ul></ul>
  20. 20. Introduction to Data Mining <ul><li>Neural Networks (continued) </li></ul><ul><ul><li>Example: </li></ul></ul>
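
As a stand-in illustration of the layered structure, the sketch below runs a forward pass through a tiny network with two inputs, two hidden nodes, and one output, using a sigmoid activation. The weights and biases are arbitrary hypothetical values; a real network would learn them during training, which is not shown here.

```python
import math

def sigmoid(x):
    """Activation function applied at each node."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical, untrained parameters
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]   # 2 inputs -> 2 hidden nodes
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]               # 2 hidden nodes -> 1 output
output_biases = [0.05]

def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)[0]

score = forward([1.0, 0.5])
print(score)
```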
  21. 21. Introduction to Data Mining <ul><li>Rule Induction </li></ul><ul><ul><li>A method to derive a set of rules to classify cases </li></ul></ul><ul><ul><ul><li>For example, rule induction can be used to discover patterns relating to decisions (e.g., credit card applications) </li></ul></ul></ul><ul><ul><li>Rules may not cover all possible situations </li></ul></ul>
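
A very simple form of rule induction can be sketched as follows: for each attribute value, emit a rule when one class clearly dominates among the known cases. The confidence threshold, attribute names, and credit card cases are hypothetical, and this is only one of many possible induction schemes.

```python
from collections import Counter, defaultdict

def induce_rules(cases, min_confidence=0.8):
    """Emit a rule (attribute, value) -> class when one class dominates."""
    counts = defaultdict(Counter)
    for attributes, label in cases:
        for pair in attributes.items():
            counts[pair][label] += 1
    rules = {}
    for pair, labels in counts.items():
        label, hits = labels.most_common(1)[0]
        if hits / sum(labels.values()) >= min_confidence:
            rules[pair] = label
    return rules

def apply_rules(rules, attributes):
    """Return the class of the first matching rule, or None if none applies."""
    for pair in attributes.items():
        if pair in rules:
            return rules[pair]
    return None

# Hypothetical credit card application decisions
cases = [
    ({"employed": "yes", "owns_home": "yes"}, "approve"),
    ({"employed": "yes", "owns_home": "no"}, "approve"),
    ({"employed": "no", "owns_home": "no"}, "reject"),
    ({"employed": "no", "owns_home": "yes"}, "reject"),
    ({"employed": "yes", "owns_home": "no"}, "approve"),
]

rules = induce_rules(cases)
print(rules)
print(apply_rules(rules, {"owns_home": "yes"}))
```

The last call returns `None` because no induced rule mentions home ownership, which illustrates that rules may not cover all possible situations.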
  22. 22. Introduction to Data Mining Questions