visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
Data Mining and Business Intelligence
(Figure: layered pyramid; the potential to support business decisions increases toward the top. Roles shown alongside the layers, top to bottom: End User, Business Analyst, Data Analyst, DBA.)
Making Decisions
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Interestingness measures: A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
Objective vs. subjective interestingness measures:
Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
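As a small illustration of the objective measures named above, the following sketch computes support and confidence for an association rule X -> Y. It uses the usual definitions: support is the fraction of transactions that contain all items of an itemset, and confidence is the fraction of transactions containing X that also contain Y. The function names and the toy market-basket transactions are made up for this example and do not come from the course material.

```python
from typing import FrozenSet, Iterable, List


def count_containing(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    n_antecedent = count_containing(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0  # rule never applies; report 0 by convention (an assumption of this sketch)
    n_both = count_containing(frozenset(antecedent) | frozenset(consequent), transactions)
    return n_both / n_antecedent


# Hypothetical toy data: five market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
    frozenset({"bread", "milk", "beer"}),
]

print(support({"bread", "milk"}, transactions))       # 3 of 5 transactions -> 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 3 of the 4 bread transactions -> 0.75
```

Subjective measures such as unexpectedness or actionability depend on what the user already believes, so they cannot be computed from the transactions alone in this way.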
Data Mining: Confluence of Multiple Disciplines
(Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines.)
Introduction; Data; Exploratory Data Analysis; Classification; Similarity Assessment; Clustering; OLAP and Data Warehousing; Association Analysis; Preprocessing; More on Clustering; Sequence and Graph Mining; Spatial Data Mining; More on Classification; Mining Data Streams; Summary
Also: Software Design for Data Mining (likely second half of September)
2009 Assignments
Assignment 1: Getting Familiar with Cougar^2 (please attend the lab classes on September 10 and 17)
Assignment 2: Exploratory Data Analysis
Assignment 3: Making Sense of Data using Traditional and Clustering with Plug-in Fitness Functions
Assignment 4: Review for Midterm Exam (contains paper and pencil questions covering classification, clustering, and association analysis)
Assignment 5: TBDL (will require programming)
Assignment 6: Preparation for the Final Exam (contains paper and pencil questions covering preprocessing, outlier detection, advanced clustering and classification, and sequence mining)
The first 8 weeks will give a basic introduction to data mining and follow the textbook somewhat closely.
Read the sections of the textbook before you come to the lecture; if you work continuously for the class you will do better and lectures will be more enjoyable. Starting to review the material that is covered in this class 1 week before the next exam is not a good idea.
Do not be afraid to ask questions! I really like interactions with students in the lectures… If you do not understand something at all, send me an e-mail before the next lecture!
If you have a serious problem, talk to me before the problem gets out of hand.
I also suggest taking at least one, preferably two, of the following courses: Pattern Classification (COSC 6343), Artificial Intelligence (COSC 6368), and Machine Learning (COSC 6342).
Moreover, having basic knowledge in data structures, software design, and databases is important when conducting data mining projects; therefore, taking COSC 6320, COSC 6318 and COSC 6340 is a good choice.
Taking a course that teaches high performance computing is also a good choice, because data mining algorithms can be very time-consuming.
Because a lot of data mining projects have to deal with images, I suggest taking at least one of the many biomedical image processing courses that are offered in our curriculum.
Finally, having knowledge in evolutionary computing, software engineering, data visualization, statistics, solving optimization problems, and GIS (geographical information systems) is a plus!