  1. Data Mining TCSS 555A Autumn 2007

     Instructor: Isabelle Bichindaritz, Ph.D.
     Class: PNK 104, M/W 7:30 – 9:35 P.M.
     E-mail:
     Office: CP 216
     Lab: CP 206M
     Office hours: M/W 4:30 P.M. – 7:00 P.M. by appointment, and always by e-mail
     Class Web-site:

     OBJECTIVE/DESCRIPTION
     TCSS 555 is a Data Mining course. Some of the objectives for this course include:
     o Understand the underlying algorithms and methods of data mining.
     o Develop data mining programs and applications.
     o Program using available data mining tools and general-purpose languages.
     o Understand visualization and navigation of data mining results.
     o Gain familiarity with machine learning, concept learning, case-based reasoning, data analysis, cluster analysis, multivariate regression, neural networks, decision trees, relational methods, and belief networks.
     o Learn how to use data mining tools such as SPSS, Clementine, Microsoft Analysis Services, and Weka.

     DESCRIPTION
     The data mining course presents methods and systems for mining varied data and discovering knowledge from data. After detailing a data mining system architecture and tasks, the course examines and compares specific methods in data mining, such as concept learning, decision trees, Bayesian and belief networks, neural networks, case-based reasoning, statistical methods such as cluster analysis and multidimensional analysis, and text and multimedia mining. Several applications are detailed, and tools to build new applications are provided. The task of knowledge discovery is then outlined as a higher-level goal of data mining.

     PREREQUISITE
     Graduate students: having completed the core

     TEXTBOOK
  2. Data Mining: Concepts and Techniques, Second Edition, Jiawei Han & Micheline Kamber, Morgan Kaufmann Publishers, ISBN: 1558609016, 2006.
     Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd edition, Ian H. Witten, Eibe Frank, Morgan Kaufmann Publishers, 2005.

     RECOMMENDED
     o Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth, MIT Press, ISBN: 0-262-08290-X, 2001.
     o Machine Learning, Tom M. Mitchell, McGraw-Hill Science/Engineering/Math, ISBN: 0070428077, 1997.
     o Data Mining and Knowledge Discovery with Evolutionary Algorithms, Alex A. Freitas, Springer Verlag, ISBN: 3540433317, 2002.
     o Information Visualization in Data Mining and Knowledge Discovery, Usama Fayyad, Georges G. Grinstein, Andreas Wierse, Morgan Kaufmann Publishers, ISBN: 1558606890, 2001.
     o Advances in Knowledge Discovery and Data Mining, Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy, MIT Press, ISBN: 0262560976, 1996.
     o The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, R. Tibshirani, J.H. Friedman, Springer Series in Statistics, ISBN: 0387952845, 2001.
     o Introduction to Data Mining, P.-N. Tan, M. Steinbach, and V. Kumar, Addison Wesley, 2005.

     CLASS WORK AND EVALUATION
     There will be assignments in the form of case studies to solve, a midterm, a final, and a project leading to a research paper and a presentation in class. Projects can be worked on either in teams of two students or alone.

     GRADING
     Assignments: 30%
     Midterm: 15%
     Final project: 35%
     Final: 15%
     Participation: 5%

     BONUS
  3. I encourage, and reward, individual efforts to build a community of active learners. Efforts to participate in class will be awarded bonus points, up to 5%. These efforts, which I will monitor, are:
     o Active and constructive participation in the online discussion forum found on the Web-site.
     o Proposing solutions for exercises in class when I give you a chance.
     o Submitting answers to online intermediate course evaluations.

     CODE OF CONDUCT
     The assignments, quizzes, and exams must be done individually. Copying another student's work, even if changes are subsequently made, is inappropriate, and such work will not be accepted. The University has very clear guidelines for academic misconduct, and they will be enforced in this class.

     COURSE CHANGES
     The schedule and procedures for this course are subject to change. Changes will be announced in class, and it is the student's responsibility to learn of and adjust to changes.

     IMPORTANT
     If you would like to request academic accommodations due to a permanent or temporary physical, emotional, or mental disability, please contact Lisa Tice, the manager of Disability Support Services (DSS). An appointment can be made through the front desk of Student Affairs (692-4400), by phoning Lisa directly at 692-4493 (voice) or 692-4413 (TTY), or by e-mail. Appropriate accommodations are arranged after you have presented the required documentation of your disability to DSS and conferred with the DSS manager. More information is available on the DSS Web site at oessa/dss/.
  4. TENTATIVE SCHEDULE

     Day  Date   Subject                                                          Pre-reading
     W    9/26   Introduction to class & data mining                              1.1 – 1.4
     M    10/1   Main concepts in data mining                                     1.5 – 1.10
     W    10/3   Data preprocessing: data cleaning, integration & transformation  2.1 – 2.4
     M    10/8   Data preprocessing: data reduction & discretization              2.5 – 2.6
     W    10/10  Introduction to data warehousing                                 3.1 – 3.2
     M    10/15  Rule mining: introduction                                        5.1 – 5.2
     W    10/17  Rule mining: association rules & correlation analysis            5.3 – 5.4
     M    10/22  Classification and prediction                                    6.1 – 6.2
     W    10/24  Classification: decision trees                                   6.3, 6.5
     M    10/29  Classification: Bayesian networks & neural networks              6.4, 6.6
     W    10/31  Classification: support-vector machines                          6.7 – 6.8
     M    11/5   Classification: lazy learners & other methods                    6.9 – 6.10
     W    11/7   Classification: prediction, ensemble methods & accuracy          6.11, 6.13, 6.14
                 Midterm in PNK 104, 7:30pm – 8:30pm
     M    11/12  Veterans Day Holiday - NO CLASS
     W    11/14  Cluster analysis: introduction                                   7.1 – 7.3
     M    11/19  Cluster analysis: partitioning & hierarchical methods            7.4 – 7.5
     W    11/21  Cluster analysis: density-based methods & grid-based methods     7.6 – 7.7
     M    11/26  Image mining                                                     10.1 – 10.3
     W    11/28  Text mining                                                      10.4 – 10.5
     M    12/3   Advances and trends in data mining                               11.1, 11.2, 11.4, 11.5
     W    12/5   Final exam in PNK 104, 7:30pm – 8:30pm
                 Final presentations
     M    12/10  Final presentations in PNK 104: 7:30pm – 8:30pm