Data that has relevance for managerial decisions is accumulating at an incredible rate due to a host of technological advances.
Electronic data capture has become inexpensive and ubiquitous as a by-product of innovations such as the internet, e-commerce, electronic banking, point-of-sale devices, bar-code readers, and intelligent machines.
Such data is often stored in data warehouses and data marts specifically intended for management decision support.
Data mining is a rapidly growing field concerned with developing techniques that help managers make intelligent use of these repositories.
Applications include credit rating, fraud detection, database marketing, customer relationship management, and stock market investments.
This course takes an applications perspective, examining methods from statistics and machine learning that have proven valuable for recognizing patterns and making predictions. We will survey applications and provide hands-on experimentation with data mining algorithms, using easy-to-use software and cases.
To provide an introduction to knowledge discovery in databases and complex data repositories, to present basic concepts relevant to real data mining applications, and to reveal important research issues germane to knowledge discovery and advanced mining applications.
Students will understand the fundamental concepts underlying knowledge discovery in databases and gain hands-on experience implementing some data mining algorithms and applying them to real-world cases.
A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
Objective vs. subjective interestingness measures
Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
Subjective: based on the user's beliefs about the data, e.g., unexpectedness, novelty, etc.
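The two objective measures named above, support and confidence, reduce to simple counting. The following is a minimal sketch, assuming transactions are represented as Python sets of item names; the basket data and the function names `count`, `support`, and `confidence` are hypothetical, chosen only for illustration.

```python
from typing import List, Set

# Hypothetical market-basket transactions, made up purely for illustration.
TRANSACTIONS: List[Set[str]] = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """Confidence of the rule antecedent => consequent, i.e.
    support(antecedent | consequent) / support(antecedent), from raw counts."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0  # rule never applies; returning 0 is a convention assumed here
    return count(antecedent | consequent, transactions) / base


if __name__ == "__main__":
    print(support({"bread", "butter"}, TRANSACTIONS))       # 3 of 5 baskets -> 0.6
    print(confidence({"bread"}, {"butter"}, TRANSACTIONS))  # 3 of 4 bread baskets -> 0.75
```

Confidence is computed from counts rather than by dividing two supports, which gives the same value while avoiding an extra floating-point rounding step.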
Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high-performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and many application fields, such as business, economics, and bioinformatics.
Define each of the following data mining functionalities: association and correlation analysis, classification, prediction, clustering, and evolution analysis. Give an example of each data mining functionality, using a real-life database with which you are familiar.
Association analysis: discovering association rules showing attribute-value conditions that occur frequently together in a given set of data
Classification: finding a set of models that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown
Cluster analysis: analyzing data objects without consulting a known class label
Outlier analysis: finding data objects that do not comply with the general behavior or model of the data
Evolution analysis: describing and modeling regularities or trends for objects whose behavior changes over time
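Two of the functionalities above, grouping objects without consulting class labels (cluster analysis) and finding objects that do not comply with the general behavior of the data (outlier analysis), can be illustrated on one-dimensional numbers. This is a minimal sketch, not a production method: it uses plain k-means with an assumed deterministic initialization, and a simple z-score rule with an assumed threshold of 2 standard deviations. The function names and sample values are hypothetical.

```python
import statistics
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Plain k-means on 1-D data; no class labels are consulted."""
    # Assumed deterministic initialization: k evenly spaced points from the
    # sorted data (practical implementations often use k-means++ instead).
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(statistics.fmean(members) if members else centroids[c])
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, labels


def zscore_outliers(values: List[float], threshold: float = 2.0) -> List[float]:
    """Values lying more than `threshold` population standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical values, invented for illustration.
    centroids, labels = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], k=2)
    print(centroids, labels)  # [2.0, 11.0] [0, 0, 0, 1, 1, 1]
    print(zscore_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 50]))  # [50]
```

Note that a single extreme value inflates the standard deviation it is measured against, so z-score rules can miss outliers in small samples; robust alternatives (e.g. median-based scores) are often preferred.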
What is the difference between data mining (DM) and pattern recognition (PR)?
Both aim to find useful relationships in data
In PR, we typically deal with data sets of moderate size, whereas a typical DM application involves data sets that are large in terms of both dimensionality and number of records
PR techniques are an important tool used in DM
Data mining involves an integration of techniques from multiple disciplines
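Architecture of a typical data mining system (figure), from the user down to the data sources:
Graphical user interface
Pattern evaluation
Data mining engine
Database or data warehouse server
Data cleaning, integration, and selection
Data sources: database, data warehouse, World-Wide Web, other information repositories
Knowledge base, consulted by the data mining engine and pattern evaluation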
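The layered architecture of a typical data mining system can be sketched as a chain of small functions, one per layer. This is an illustrative toy under stated assumptions, not a reference design: records are assumed to be dictionaries with hypothetical `tid` and `items` fields, the "mining engine" simply counts co-occurring item pairs, and the hypothetical `KNOWLEDGE_BASE` holds a single minimum-support threshold used during pattern evaluation.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge base: domain knowledge used to judge interestingness.
KNOWLEDGE_BASE = {"min_support": 0.5}


def clean(records: List[Dict]) -> List[Dict]:
    """Data cleaning: drop records with no items."""
    return [r for r in records if r.get("items")]


def integrate(*sources: List[Dict]) -> List[Dict]:
    """Data integration: merge sources, de-duplicating on the transaction id."""
    merged: Dict[str, Dict] = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["tid"], r)  # first occurrence wins
    return list(merged.values())


def select(records: List[Dict]) -> List[Set[str]]:
    """Data selection: keep only the attribute relevant to the mining task."""
    return [set(r["items"]) for r in records]


def mine_pairs(transactions: List[Set[str]]) -> Counter:
    """Mining engine: count how often each pair of items occurs together."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts


def evaluate(counts: Counter, n: int, kb: Dict) -> Dict[Tuple[str, str], float]:
    """Pattern evaluation: keep pairs whose support meets the knowledge-base threshold."""
    return {pair: c / n for pair, c in counts.items() if c / n >= kb["min_support"]}


def run_pipeline(*sources: List[Dict], kb: Dict = KNOWLEDGE_BASE) -> Dict[Tuple[str, str], float]:
    records = integrate(*(clean(s) for s in sources))
    transactions = select(records)
    return evaluate(mine_pairs(transactions), len(transactions), kb)


if __name__ == "__main__":
    # Two hypothetical data sources (e.g. a store database and a web log).
    store = [{"tid": "t1", "items": ["milk", "bread"]},
             {"tid": "t2", "items": ["bread", "butter"]},
             {"tid": "t3", "items": []}]                      # removed by cleaning
    web = [{"tid": "t2", "items": ["bread", "butter"]},      # duplicate of t2
           {"tid": "t4", "items": ["bread", "butter", "milk"]}]
    # Stand-in for the graphical user interface: print the surviving patterns.
    for pair, sup in sorted(run_pipeline(store, web).items()):
        print(pair, round(sup, 3))
```

Keeping each layer a separate function mirrors the figure: a source can be added, or the mining engine swapped for a different algorithm, without touching the other layers.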