MIS 542 Data Warehousing and Data Mining
                              Spring 2006

Instructor:     Bertan Badur, Ph.D.
Office:         HKB 226
Phone:          (212) 359 70 27
E-mail:         badur@boun.edu.tr

Course Hours: Lectures: Mondays 6,7,8 (14:00-16:50)
URL:          www.mis.boun.edu.tr/badur/MIS542

Course Description:

This course consists of three parts. The first part covers the basic concepts and
methodologies of knowledge discovery from large databases and warehouses. Basic data
mining functionalities such as concept description, association, classification, prediction
and clustering are introduced. Data warehousing and OLAP are presented. The second part
of the course is a detailed discussion of various algorithms that achieve the basic data
mining functionalities. Applications of these concepts and techniques to real-world
problems are discussed with the aid of data mining software tools. The third part introduces
advanced topics such as text mining, web mining, and mining spatial or temporal data.

Motivation:

As huge volumes of data accumulate in business, scientific and engineering databases,
the development of reliable and scalable analysis procedures is essential to extract hidden
rules or useful patterns from these large databases. Data mining is an emerging
interdisciplinary science that aims to develop automatic or semiautomatic techniques to
discover knowledge hidden in these databases, so that decision-making processes in
business and in other environments become faster and more efficient. Hence, the use of
data mining in the finance, marketing, and telecommunication industries has increased
dramatically in recent years.

Text Book:

•   Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan
    Kaufmann Publishers, 2001

Recommended:

•   Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, by Ian
    H. Witten and Eibe Frank, Morgan Kaufmann Publishers, 2005
•   Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Pearson
    Education Inc., 2003
•   Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic,
    IEEE Press / Wiley-Interscience, 2003

Supplementary Text Books:

   Technical Books
   • Data Mining: A Tutorial-Based Primer, by Richard J. Roiger and Michael W. Geatz,
      Addison Wesley, 2003
   • Machine Learning, by Tom M. Mitchell, McGraw-Hill International Editions,
      1997
   • Predictive Data Mining, by S. M. Weiss and N. Indurkhya, Morgan Kaufmann
      Publishers, 1998
   • Principles of Data Mining, by D. Hand, H. Mannila and P. Smyth, MIT Press, 2001
   • Discovering Knowledge in Data: An Introduction to Data Mining, by D. T. Larose,
      Wiley-Interscience, 2005
   Business Oriented Books
   • Mastering Data Mining: The Art and Science of Customer Relationship
      Management, by Michael J. A. Berry and Gordon Linoff, Wiley Computer
      Publishing, 2000
   • Data Mining Techniques: For Marketing, Sales, and Customer Relationship
      Management, by Michael J. A. Berry and Gordon Linoff, Wiley Computer
      Publishing, 2004
   • Data Mining Cookbook: Modeling Data for Marketing, Risk, and CRM, by O. P. Rud,
      John Wiley & Sons Inc., 2001
   • The Data Warehouse Lifecycle Toolkit, by R. Kimball, L. Reeves, M. Ross and
      W. Thornthwaite, Wiley, 1998

Course Outline:

•   Introduction (1 Week)
    • Motivation and Preliminary Definitions
    • Methodology of Knowledge Discovery in Databases
    • Architectures of Data Mining Systems
    • Descriptive/Predictive Data Mining or Supervised and Unsupervised Learning
    • Data Mining Functionalities
    • Business Applications
•   Basic Data Mining Techniques (1 Week)
    • Decision Trees
        • ID3 Algorithm
    • Association Rules
        • Apriori Algorithm
    • Clustering
        • k-Means Algorithm
•   Methodology of Knowledge Discovery in Databases (1 Week)
    • KDD Process Model
    • Data Preprocessing
    • Handling Missing Data
    • Data Transformation
    • Discretization
    • Sampling
•   Data Warehouses and OLAP (1 Week)
    • Basic Concepts of Data Warehousing
    • A Multidimensional Data Model
    • Architectures of Data Warehousing Systems
    • Computation of OLAP Cubes
•   Frequent Pattern Mining (2 Weeks)
    • Single Dimensional Association Rules
    • Multilevel Association Rules
    • Multidimensional Association Rules
    • Constraint Based Association Mining
    • Sequential Pattern Mining
•   Midterm
•   Classification and Prediction (3 Weeks)
    • Decision Trees
       • C4.5 Algorithm
       • CART
    • Bayesian Classification
       • Naïve Bayesian Classification
       • Bayesian Belief Networks
    • Classification by Backpropagation
    • k-Nearest Neighbor Classification
    • Combining Classifiers
    • Classification Accuracy
•   Cluster Analysis (2 Weeks)
    • Types of Data in Cluster Analysis
    • Partitioning Methods
       • K-medoids
       • CLARA
    • Hierarchical Methods
       • BIRCH
    • Density Based Methods
       • DBSCAN
    • Model Based Methods
       • EM Algorithm
       • Self Organizing Maps
•   Case Studies (1 Week)

Grading:

Homework                          20%
Paper reviews and presentations   10%
Project                           20%
Midterm                           25%
Final Exam                        25%

Project:

Each student or group of students (at most two) is required to develop a term project.
Acceptable term projects include an implementation of selected data mining algorithms,
an application of the techniques studied to a real-world problem, or a performance study
of selected data mining algorithms.

Paper Reviews and Presentations:

Each student is expected to write a short critical review of a recent paper on an
application of data mining. A short in-class presentation of the reviewed paper is
required as well.

Homework:

There will be 5 or 6 homework sets. These may include discussion questions, numerical
problems and data mining problems using real-world or artificially generated data.

Software:

•   DBMiner 2.0 Educational Version: developed by J. Han, co-author of the textbook
    “Data Mining: Concepts and Techniques”, and his team; compatible with the textbook;
    performs association, classification and cluster analysis.
•   SPSS
    • Neural Connection: Performs neural network modeling for classification and
        prediction
    • Answer Tree: Decision tree analysis
•   Microsoft SQL Server Analysis Services
•   MATLAB

Data Sources:

•   FoodMart or WareMart Database of Microsoft Analysis Services
•   Data sources from the Internet
    • UCI KDD Archive
    • UCI Machine Learning Repository
•   Financial/Macroeconomic data from IMKB or TCMB
•   Textbook datasets

Schedule of Some Events:

Project Proposals: 10.04.2006
Paper Presentations: 22.05.2006
Midterm: 03.04.2006
Project Final Report: After finals
Project Presentations: After finals

Late Submission Policy:

20% deduction for each late school day
