Different from OLAP: aimed at discovering information without a previously formulated hypothesis.
Previously unknown: discovery of patterns that are not intuitive, or are even counterintuitive. The further a pattern is from the obvious, the greater its value. Classic example of beer and diapers: a retail store analyzing customer buying patterns discovered a strong association between sales of diapers and beer, especially on Friday evenings (male shoppers).
Valid: if you look long enough at a large collection, you are bound to find something of interest sooner or later, so the possibility of spurious relations is high. Post-mining validation and sanity checking are needed.
Actionable: must be able to translate into some business advantage. E.g., place beer and diapers close together, do not discount both at the same time, etc.
Prediction: from past cases with known answers, project to new cases. E.g., fraud detection, healthcare outcomes analysis (treatments are considered cost-effective if they fit patterns in previously successful patients), target marketing, investments.
Knowledge discovery: undirected. The stage prior to prediction.
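The strength of an association such as beer and diapers is commonly summarized with support, confidence, and lift. The sketch below is only an illustration: the baskets are invented toy data, and the function names `support` and `rule_metrics` are hypothetical, not taken from any particular tool.

```python
# Hypothetical toy baskets, invented purely for illustration.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
    {"diapers", "beer", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_both = support(set(antecedent) | set(consequent), baskets)
    confidence = s_both / s_a   # estimate of P(consequent | antecedent)
    lift = confidence / s_c     # > 1 means the items co-occur more than independence predicts
    return s_both, confidence, lift


if __name__ == "__main__":
    s, c, l = rule_metrics({"diapers"}, {"beer"}, transactions)
    print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

On this toy data the rule diapers -> beer has support 0.50, confidence 0.80, and lift 1.28. A lift above 1 on a small sample does not establish that the pattern is valid; it still needs the post-mining validation and sanity checking noted above, for example by re-checking the rule on held-out transactions.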
Knowledge Discovery & Data Mining
Process of extracting previously unknown, valid, and actionable (understandable) information from large databases
Data mining is a step in the KDD (knowledge discovery in databases) process that consists of applying data analysis and discovery algorithms
Related fields: machine learning, pattern recognition, statistics, databases, data visualization.