
  • 1. MIS 542 Data Warehousing and Data Mining
    Spring 2006

    Instructor: Bertan Badur, Ph.D.
    Office: HKB 226
    Phone: (212) 359 70 27
    E-mail:
    Course Hours: Lectures: Mondays 6, 7, 8 (14:00-16:50)
    URL:

    Course Description:
    This course consists of three parts. The first part covers the basic concepts and methodologies of knowledge discovery from large databases and warehouses. Basic data mining functionalities such as concept description, association, classification, prediction, and clustering are introduced, and data warehousing and OLAP are presented. The second part discusses in detail the various algorithms that achieve these basic data mining functionalities. Applications of these concepts and techniques to real-world problems are discussed with the aid of data mining software tools. The third part introduces advanced topics such as text mining, web mining, and mining spatial or temporal data.

    Motivation:
    As huge volumes of data accumulate in business, scientific, and engineering databases, developing reliable and scalable analysis procedures is essential for extracting hidden rules or useful patterns from these large databases. Data mining is an emerging interdisciplinary science that aims to develop automatic or semiautomatic techniques for discovering the knowledge hidden in these databases, so that decision-making processes in business and other environments become much faster and more efficient. Hence, the use of data mining in the finance, marketing, and telecommunication industries has increased dramatically in recent years.

    Text Book:
    • Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 2001

    Recommended:
    • Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, by Ian H. Witten and Eibe Frank, Morgan Kaufmann Publishers, 2005
    • Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Pearson Education Inc., 2003
    • Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic, IEEE Press / Wiley-Interscience, 2003
  • 2. Supplementary Text Books:

    Technical Books
    • Data Mining: A Tutorial-Based Primer, by Richard J. Roiger and Michael W. Geatz, Addison Wesley, 2003
    • Machine Learning, by Tom M. Mitchell, McGraw-Hill International Editions, 1997
    • Predictive Data Mining, by S. M. Weiss and N. Indurkhya, Morgan Kaufmann Publishers, 1998
    • Principles of Data Mining, by D. Hand, H. Mannila, and P. Smyth, MIT Press, 2001
    • Discovering Knowledge in Data: An Introduction to Data Mining, by D. T. Larose, Wiley-Interscience, 2005

    Business-Oriented Books
    • Mastering Data Mining: The Art and Science of Customer Relationship Management, by Michael J. A. Berry and Gordon Linoff, Wiley Computer Publishing, 2000
    • Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, by Michael J. A. Berry and Gordon Linoff, Wiley Computer Publishing, 2004
    • Data Mining Cookbook: Modeling Data for Marketing, Risk, and CRM, by O. P. Rud, John Wiley & Sons Inc., 2001
    • The Data Warehouse Lifecycle Toolkit, by R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite, Wiley, 1998

    Course Outline:
    • Introduction (1 Week)
      • Motivation and Preliminary Definitions
      • Methodology of Knowledge Discovery in Databases
      • Architectures of Data Mining Systems
      • Descriptive/Predictive Data Mining or Supervised and Unsupervised Learning
      • Data Mining Functionalities
      • Business Applications
    • Basic Data Mining Techniques (1 Week)
      • Decision Trees
      • ID3 Algorithm
      • Association Rules
      • Apriori Algorithm
      • Clustering
      • k-Means Algorithm
    • Methodology of Knowledge Discovery in Databases (1 Week)
      • KDD Process Model
      • Data Preprocessing
      • Handling Missing Data
      • Data Transformation
      • Discretization
      • Sampling
    • Data Warehouses and OLAP (1 Week)
  • 3.
      • Basic Concepts of Data Warehousing
      • A Multidimensional Data Model
      • Architectures of Data Warehousing Systems
      • Computation of OLAP Cubes
    • Frequent Pattern Mining (2 Weeks)
      • Single-Dimensional Association Rules
      • Multilevel Association Rules
      • Multidimensional Association Rules
      • Constraint-Based Association Mining
      • Sequential Pattern Mining
    • Midterm
    • Classification and Prediction (3 Weeks)
      • Decision Trees
      • C4.5 Algorithm
      • CART
      • Bayesian Classification
      • Naïve Bayesian Classification
      • Bayesian Belief Networks
      • Classification by Backpropagation
      • Bayesian Classification
      • k-Nearest Neighbor Classification
      • Combining Classifiers
      • Classification Accuracy
    • Cluster Analysis (2 Weeks)
      • Types of Data in Cluster Analysis
      • Partitioning Methods
      • K-medoids
      • CLARA
      • Hierarchical Methods
      • BIRCH
      • Density-Based Methods
      • DBSCAN
      • EM Algorithm
      • Model-Based Methods
      • Self-Organizing Maps
    • Case Studies (1 Week)

    Grading:
    • Homework: 20%
    • Paper reviews and presentations: 10%
    • Project: 20%
    • Midterm: 25%
    • Final Exam: 25%

    Project:
  • 4. Each student or group of students (at most two) is required to develop a term project. Acceptable term projects include an implementation of selected data mining algorithms, an application of the studied techniques to a real-world problem, or a performance study of selected data mining algorithms.

    Paper Reviews and Presentations:
    Each student is expected to write a short critical review of a recent paper on an application of data mining. A short in-class presentation of the reviewed paper is required as well.

    Homework:
    There will be 5 or 6 homework sets. These may include discussion questions, numerical problems, and data mining problems using real-world or artificially generated data.

    Software:
    • DBMiner: DBMiner 2.0 Educational Version, developed by J. Han (an author of the book “Data Mining: Concepts and Techniques”) and his team; compatible with the textbook; performs association, classification, and cluster analysis
    • SPSS
    • Neural Connection: performs neural network modeling for classification and prediction
    • Answer Tree: decision tree analysis
    • Microsoft SQL Server Analysis Services
    • MATLAB

    Data Sources:
    • FoodMart or WareMart database of Microsoft Analysis Services
    • Data sources from the internet
    • UCI KDD Archive
    • UCI Machine Learning Repository
    • Financial/macroeconomic data from IMKB or TCMB
    • Textbook datasets

    Schedule of Some Events:
    • Project Proposals: 10.04.2006
    • Paper Presentations: 22.05.2006
    • Midterm: 03.04.2006
    • Project Final Report: After finals
    • Project Presentations: After finals

    Late Submission Policy:
    20% cut for each late school day.
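The grading weights and the late submission policy above amount to simple arithmetic, which the following Python sketch illustrates. It is not an official grade calculator. The names `WEIGHTS`, `late_penalized`, and `course_grade` are hypothetical, every component is assumed to be scored on a 0-100 scale, and the late policy is assumed to mean a cut of 20% of the original score per late school day, floored at zero. The syllabus does not spell out either detail.

```python
# Component weights in percent, as listed under Grading in the syllabus.
WEIGHTS = {
    "homework": 20,
    "paper_review": 10,
    "project": 20,
    "midterm": 25,
    "final_exam": 25,
}


def late_penalized(score, late_days, cut_per_day=20):
    """Return a submission score after the late cut.

    ASSUMPTION: each late school day removes cut_per_day percent of the
    original score, and the result never drops below zero.
    """
    if late_days < 0:
        raise ValueError("late_days must be non-negative")
    remaining_percent = max(0, 100 - cut_per_day * late_days)
    return score * remaining_percent / 100


def course_grade(scores):
    """Return the weighted course grade from component scores (0-100 each)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

Under these assumptions, component scores of 85 (homework), 90 (paper review), 80 (project), 70 (midterm), and 75 (final exam) give a course grade of 78.25, and a submission scored 90 that arrives two school days late counts as 54.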