MIS 542 Syllabus 08.doc

MIS 542 Data Mining Concepts and Techniques
Spring 2008

Instructor: Bertan Badur, Ph.D.
Office: HKB 226
Phone: (212) 359 70 27
E-mail: badur@boun.edu.tr
Course Hours: Lectures: Fridays 2, 3, 4 (10:00-12:50)
URL: www.mis.boun.edu.tr/badur/MIS542

Course Assistant: Ümit Topaçan
Office: HKB 229
Phone: (212) 359 71 13
E-mail: topacan@boun.edu.tr

Course Description:
This course consists of three parts. The first part covers the basic concepts and methodologies of knowledge discovery from large databases and data warehouses. Basic data mining functionalities such as concept description, association, classification, prediction, and clustering are introduced, and data warehousing and OLAP are presented. The second part is a detailed discussion of the algorithms that implement these basic data mining functionalities. Applications of these concepts and techniques to real-world problems are discussed with the aid of data mining software tools. The third part introduces advanced topics such as text mining, web mining, and mining spatial or temporal data.

Motivation:
As huge volumes of data accumulate in business, scientific, and engineering databases, reliable and scalable analysis procedures are essential for extracting hidden rules or useful patterns from them. Data mining is an emerging interdisciplinary science that aims to develop automatic or semiautomatic techniques for discovering the knowledge hidden in these databases, so that decision making in business and other environments becomes faster and more efficient. Hence, the use of data mining in the finance, marketing, and telecommunication industries has increased dramatically in recent years.

Text Book:
  • Introduction to Data Mining, by P. N. Tan, M. Steinbach, V. Kumar, Pearson Addison Wesley, 2006

Recommended:
  • Data Mining: Concepts and Techniques, 2nd ed., by Jiawei Han, M. Kamber, Morgan Kaufmann Publishers, 2005
  • Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., by Ian H. Witten, E. Frank, Morgan Kaufmann Publishers, 2005
  • Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Pearson Education Inc., 2003
  • Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic, IEEE Press / Wiley-Interscience, 2003
Supplementary Text Books:

Technical Books
  • Data Mining: A Tutorial-Based Primer, by Richard J. Roiger, Michael W. Geatz, Addison Wesley, 2003
  • Machine Learning, by Tom M. Mitchell, McGraw-Hill International Editions, 1997
  • Predictive Data Mining, by S. M. Weiss, N. Indurkhya, Morgan Kaufmann Publishers, 1998
  • Principles of Data Mining, by D. Hand, H. Mannila, P. Smyth, MIT Press, 2001
  • Discovering Knowledge in Data: An Introduction to Data Mining, by D. T. Larose, Wiley-Interscience, 2005
  • Handbook of Data Mining and Knowledge Discovery, by Willi Klösgen, J. M. Zytkow, Oxford University Press, 2002

Business Oriented Books
  • Mastering Data Mining: The Art and Science of Customer Relationship Management, by Michael J. A. Berry, Gordon Linoff, Wiley Computer Publishing, 2000
  • Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, by Michael J. A. Berry, Gordon Linoff, Wiley Computer Publishing, 2004
  • Data Mining Cookbook: Modeling Data for Marketing, Risk, and CRM, by O. P. Rud, John Wiley & Sons Inc., 2001

Course Outline:
  • Introduction (1 Week)
    • Motivation and Preliminary Definitions
    • Methodology of Knowledge Discovery in Databases
    • Architectures of Data Mining Systems
    • Descriptive/Predictive Data Mining or Supervised and Unsupervised Learning
    • Data Mining Functionalities
    • Business Applications
  • Basic Data Mining Techniques (1 Week)
    • Decision Trees
    • ID3 Algorithm
    • Association Rules
    • Apriori Algorithm
    • Clustering
    • k-Means Algorithm
  • Methodology of Knowledge Discovery in Databases (1 Week)
    • KDD Process Model
    • Data Preprocessing
    • Handling Missing Data
    • Data Transformation
    • Discretization
    • Sampling
  • Data Warehouses and OLAP (1 Week)
    • Basic Concepts of Data Warehousing
    • A Multidimensional Data Model
    • Architectures of Data Warehousing Systems
    • Computation of OLAP Cubes
  • Cluster Analysis (2 Weeks)
    • Types of Data in Cluster Analysis
    • Partitioning Methods
    • K-medoids
    • CLARA
    • Hierarchical Methods
    • BIRCH
    • Density Based Methods
    • DBSCAN
    • EM Algorithm
    • Model Based Methods
    • Self Organizing Maps
  • Classification and Prediction (3 Weeks)
    • Decision Trees
    • C4.5 Algorithm
    • CART
    • Bayesian Classification
    • Naïve Bayesian Classification
    • Bayesian Belief Networks
    • Classification by Backpropagation
    • Bayesian Classification
    • k-Nearest Neighbor Classification
    • Combining Classifiers
    • Classification Accuracy
  • Midterm
  • Frequent Pattern Mining (2 Weeks)
    • Single Dimensional Association Rules
    • Multilevel Association Rules
    • Multidimensional Association Rules
    • Constraint Based Association Mining
    • Sequential Pattern Mining
  • Case Studies (1 Week)

Grading:
  • Homework: 20%
  • Paper reviews and presentations: 5%
  • Project: 20%
  • Midterm: 25%
  • Final Exam: 30%

Project:
Each student or group of students (at most two) is required to develop a term project. Acceptable term projects include an implementation of selected data mining algorithms, an application of the studied techniques to a real-world problem, or a performance study of selected data mining algorithms.

Paper Reviews and Presentations:
Each student is expected to write a short critical review of a recent paper on an application of data mining. A short in-class presentation of the reviewed paper is required as well.

Homework:
There are 4 or 5 sets of homework. These may include discussion questions, numerical problems, and data mining problems using real-world or artificially generated data.

Software:
  • DBMiner 2.0 Educational Version: developed by J. Han (author of the book “Data Mining: Concepts and Techniques”) and his team; compatible with that book; performs association, classification, and cluster analysis
  • SPSS
    • Neural Connection: performs neural network modeling for classification and prediction
    • Answer Tree: decision tree analysis
  • Microsoft SQL Server Analysis Services
  • MATLAB

Data Sources:
  • FoodMart or WareMart Database of Microsoft Analysis Services
  • Data sources from the internet
    • UCI KDD Archive
    • UCI Machine Learning Repository
  • Financial/macroeconomic data from IMKB or TCMB
  • Text book’s datasets

Schedule of Some Events:
  • Project Proposals: 04.04.2008
  • Paper Presentations: 23.05.2007
  • Midterm: 25.04.2008
  • Project Final Report: after finals
  • Project Presentations: after finals

Late Submission Policy:
25% cut for each late school day.