Data Mining TCSS 555A
Autumn 2007
Instructor: Isabelle Bichindaritz, Ph.D.
Class: PNK 104 M/W 7:30 – 9:35 P.M.
E-mail: ibichind@u.washington.edu
Office: CP 216
Lab: CP 206M
Office hours: M/W 4:30 P.M. – 7:00 P.M., or by appointment; always available by e-mail
Class Web-site: http://courses.washington.edu/tcss555
OBJECTIVES
TCSS 555 is a course in data mining. Its objectives include:
o Understand the underlying algorithms and methods of data mining.
o Develop data mining programs and applications.
o Program using available data mining tools and general-purpose languages.
o Understand visualization and navigation of data mining results.
o Gain familiarity with machine learning, concept learning, case-based reasoning,
data analysis, cluster analysis, multivariate regression, neural networks, decision
trees, relational methods, and belief networks.
o Learn how to use data mining tools such as SPSS, Clementine, Microsoft
Analysis Services, and Weka.
DESCRIPTION
The data mining course presents methods and systems for mining varied data and
discovering knowledge from data. After detailing a data mining system architecture
and tasks, the course examines and compares specific methods in data mining, such
as concept learning, decision trees, Bayesian and belief networks, neural networks,
case-based reasoning, statistical methods such as cluster analysis and
multidimensional analysis, and text and multimedia mining. Several applications are
detailed, and tools to build new applications are provided. The task of knowledge
discovery is then outlined as a higher-level goal of data mining.
PREREQUISITE
Graduate students: completion of the core courses
TEXTBOOKS
o Data Mining: Concepts and Techniques, Second Edition, Jiawei Han & Micheline Kamber,
Morgan Kaufmann Publishers, ISBN: 1558609016, 2006.
o Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd
edition, Ian H. Witten, Eibe Frank, Morgan Kaufmann Publishers, 2005.
RECOMMENDED
o Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth, MIT
Press, ISBN: 0-262-08290-X, 2001.
o Machine Learning, Tom M. Mitchell, McGraw-Hill Science/Engineering/Math, ISBN:
0070428077, 1997.
o Data Mining and Knowledge Discovery with Evolutionary Algorithms, Alex A. Freitas,
Springer-Verlag, ISBN: 3540433317, 2002.
o Information Visualization in Data Mining and Knowledge Discovery, Usama Fayyad,
Georges G. Grinstein, Andreas Wierse, Morgan Kaufmann Publishers, ISBN:
1558606890, 2001.
o Advances in Knowledge Discovery and Data Mining, Usama M. Fayyad, Gregory
Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy, MIT Press, ISBN:
0262560976, 1996.
o The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie,
R. Tibshirani, J.H. Friedman, Springer Series in Statistics, ISBN: 0387952845, 2001.
o Introduction to Data Mining, P.-N. Tan, M. Steinbach, and V. Kumar, Addison-Wesley,
2005.
CLASS WORK AND EVALUATION
There will be assignments in the form of case studies, a midterm, a final, and a project
leading to a research paper and an in-class presentation. Projects may be done alone or in
teams of two students.
GRADING
Assignments: 30%
Midterm: 15%
Final project: 35%
Final: 15%
Participation: 5%
BONUS
I encourage, and reward, individual efforts to build a community of active learners. Efforts to
participate in class will earn bonus points, up to 5%. The efforts I will monitor are:
o Active and constructive participation in the online discussion forum on the class Web-site.
o Proposing solutions to exercises in class when given the opportunity.
o Submitting answers to the online intermediate course evaluations.
CODE OF CONDUCT
Assignments, quizzes, and exams must be completed individually. Copying another student's
work, even if changes are subsequently made, is inappropriate, and such work will not be
accepted. The University has very clear guidelines on academic misconduct, and they will be
enforced in this class.
COURSE CHANGES
The schedule and procedures for this course are subject to change. Changes will be announced in
class, and it is the student's responsibility to learn of and adjust to them.
IMPORTANT
If you would like to request academic accommodations due to a permanent or temporary
physical, emotional, or mental disability, please contact Lisa Tice, the manager of Disability
Support Services (DSS). An appointment can be made through the front desk of Student Affairs
(692-4400), by phoning Lisa directly at 692-4493 (voice), 692-4413 (TTY), or by e-mail
(dssuwt@u.washington.edu). Appropriate accommodations are arranged after you've presented
the required documentation of your disability to DSS, and you've conferred with the DSS
manager. More information is available on the DSS Web site at
http://www.tacoma.washington.edu/oessa/dss/.
TENTATIVE SCHEDULE
Day Date Subject Pre-reading
W 9/26 Introduction to class & data mining 1.1 – 1.4
M 10/1 Main concepts in data mining 1.5 – 1.10
W 10/3 Data preprocessing: data cleaning, integration & transformation 2.1 – 2.4
M 10/8 Data preprocessing: data reduction & discretization 2.5 – 2.6
W 10/10 Introduction to data warehousing 3.1 – 3.2
M 10/15 Rule mining: introduction 5.1 – 5.2
W 10/17 Rule mining: association rules & correlation analysis 5.3 – 5.4
M 10/22 Classification and prediction 6.1 – 6.2
W 10/24 Classification: decision trees 6.3, 6.5
M 10/29 Classification: Bayesian networks & neural networks 6.4, 6.6
W 10/31 Classification: support-vector machines 6.7 – 6.8
M 11/5 Classification: lazy learners & other methods 6.9 – 6.10
W 11/7 Classification: prediction, ensemble methods & accuracy 6.11, 6.13, 6.14
Midterm in PNK 104 7:30pm – 8:30pm
M 11/12 Veterans Day Holiday - NO CLASS
W 11/14 Cluster analysis: introduction 7.1 – 7.3
M 11/19 Cluster analysis: partitioning & hierarchical methods 7.4 – 7.5
W 11/21 Cluster analysis: density-based methods & grid-based methods 7.6 – 7.7
M 11/26 Image mining 10.1 – 10.3
W 11/28 Text mining 10.4 – 10.5
M 12/3 Advances and trends in data mining 11.1, 11.2, 11.4, 11.5
W 12/5 Final exam in PNK 104 7:30pm – 8:30pm
Final presentations
M 12/10 Final presentations in PNK 104 7:30pm – 8:30pm