ICE0403: Data & Text Mining
Instructor Dr. Ji-Ae Shin
Email firstname.lastname@example.org (the best way to reach me is by email)
Office F612 (x 6172)
Objective of the Course
To study machine learning techniques that are commonly used to extract
information and patterns from (textual) data, such as Inductive Logic Programming,
Clustering, Statistical Logic Networks, and Decision Trees.
Algorithms (A-); Artificial Intelligence (A-); Discrete Mathematics I;
Probability & Statistics; Scripting Languages
1. (Textbook) I. Witten & E. Frank, Data Mining (2nd ed.), Elsevier, 2005
2. (Textbook) T. Mitchell, Machine Learning, McGraw-Hill, 1997
3. D. J. Hand, H. Mannila and P. Smyth. Principles of Data Mining (Adaptive
Computation and Machine Learning), MIT Press, 2001
4. T. Hastie, R. Tibshirani, & J. H. Friedman, The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Springer Verlag, 2001.
5. R. O. Duda, P. E. Hart, & D. G. Stork, Pattern Classification Wiley-
6. P. Langley, Elements of Machine Learning, Morgan Kaufmann Publishers, San
Francisco, CA, 1995.
7. S. M. Weiss & C. A. Kulikowski, Computer Systems that Learn, Morgan
Kaufmann Publishers, San Francisco, CA, 1991.
8. J. W. Shavlik & T. G. Dietterich (Eds.), Readings in Machine Learning, Morgan
Kaufmann Publishers, San Francisco, CA, 1990.
Homework (3: Theory & Practice) 36%
Term Project 24%
Class Participation 10%
Final Exam 30%
Topics in Consideration
1. Introduction
Definition of learning systems. Goals and applications of machine learning.
Aspects of developing a learning system: training data, concept representation,
2. Inductive Classification
The concept learning task. Concept learning as search through a hypothesis
space. General-to-specific ordering of hypotheses. Finding maximally specific
hypotheses. Version spaces and the candidate elimination algorithm. Learning
conjunctive concepts. The importance of inductive bias.
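Illustration (not taken from the course materials): a minimal Python sketch of finding a maximally specific conjunctive hypothesis, in the spirit of the Find-S procedure. The function name, the "?" wildcard convention, and the toy weather data are assumptions made for this example.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positives.

    Each example is (attribute_tuple, is_positive). "?" means "any value";
    None means no positive example has been seen yet.
    """
    hypothesis = None
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # this procedure ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive
        else:
            # Generalize only the attributes that disagree with the example.
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis


examples = [
    (("sunny", "warm", "normal"), True),
    (("sunny", "warm", "high"), True),
    (("rainy", "cold", "high"), False),
]
print(find_s(examples))  # ['sunny', 'warm', '?']
```

The candidate elimination algorithm additionally maintains the most general consistent hypotheses and uses the negative examples; this sketch covers only the specific boundary.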
3. Decision Tree Learning
Representing concepts as decision trees. Recursive induction of decision trees.
Picking the best splitting attribute: entropy and information gain. Searching for
simple trees and computational complexity. Occam's razor. Overfitting, noisy
data, and pruning.
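Illustration (not taken from the course materials): a short Python sketch of entropy and information gain as a criterion for picking a splitting attribute. The function names and the toy data are assumptions; rows are tuples of categorical attribute values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, labels, attribute_index):
    """Reduction in entropy obtained by splitting on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    # Expected entropy after the split, weighted by partition size.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
gains = [information_gain(rows, labels, i) for i in range(2)]
best_attribute = max(range(2), key=lambda i: gains[i])
```

In this toy data the first attribute separates the classes perfectly (gain of 1 bit) and the second tells nothing (gain of 0), so a greedy tree learner would split on the first.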
4. Experimental Evaluation of Learning Algorithms
Measuring the accuracy of learned hypotheses. Comparing learning algorithms:
cross-validation, learning curves, and statistical hypothesis testing.
5. Rule Learning: Propositional and First-Order
Translating decision trees into rules. Heuristic rule induction using separate and
conquer and information gain. First-order Horn-clause induction (Inductive
Logic Programming) and Foil. Learning recursive rules. Inverse resolution,
Golem, and Progol.
6. Artificial Neural Networks (if time permits)
Neurons and biological motivation. Linear threshold units. Perceptrons:
representational limitation and gradient descent training. Multilayer networks
and backpropagation. Hidden layers and constructing intermediate, distributed
representations. Overfitting, learning network structure, recurrent networks.
7. Support Vector Machines
Maximum margin linear separators. Quadratic programming solution to finding
maximum margin separators. Kernels for learning non-linear functions.
8. Bayesian Learning
Probability theory and Bayes rule. Naive Bayes learning algorithm. Parameter
smoothing. Bayes nets and Markov nets for representing dependencies.
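Illustration (not taken from the course materials): a minimal Python sketch of Naive Bayes with add-alpha (Laplace) parameter smoothing. It assumes a multinomial model over word tokens; the function names, the `alpha` parameter, and the toy documents are hypothetical choices for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels, alpha=1.0):
    """Collect the counts needed for a smoothed multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary, alpha


def predict(model, words):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, word_counts, vocabulary, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        denominator = sum(word_counts[label].values()) + alpha * len(vocabulary)
        for word in words:
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + alpha) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


documents = [["cheap", "pills"], ["cheap", "offer"],
             ["meeting", "agenda"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(documents, labels)
print(predict(model, ["cheap", "meeting", "offer"]))  # spam
```

Without smoothing, the word "meeting" (never seen in a spam document) would give the spam class probability zero regardless of the other words.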
9. Instance-Based Learning
Constructing explicit generalizations versus comparing to past specific
examples. k-Nearest-neighbor algorithm. Case-based learning.
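Illustration (not taken from the course materials): a minimal Python sketch of k-nearest-neighbor classification by majority vote. Euclidean distance, the function name, and the toy points are assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # a
```

No explicit generalization is built at training time; all the work happens when a query is compared against the stored examples.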
10. Text Classification (if time permits)
Bag of words representation. Vector space model and cosine similarity.
Relevance feedback and Rocchio algorithm. Versions of nearest neighbor and
Naive Bayes for text.
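Illustration (not taken from the course materials): a short Python sketch of the bag-of-words representation and cosine similarity in the vector space model. Whitespace tokenization, raw term counts (no TF-IDF weighting), and the function names are simplifying assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to term counts, ignoring word order."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(bag_of_words("data mining"),
                               bag_of_words("text mining"))
```

The two toy documents share one of their two terms, so their similarity is 0.5; identical documents score 1 and documents with no shared terms score 0.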
11. Clustering and Unsupervised Learning
Learning from unclassified data. Clustering. Hierarchical Agglomerative
Clustering. k-means partitional clustering. Expectation maximization (EM) for
soft clustering. Semi-supervised learning with EM using labeled and unlabeled data.
12. Language Learning (if time permits)
Classification problems in language: word-sense disambiguation, sequence
labeling. Hidden Markov models (HMMs). Viterbi algorithm for determining
most-probable state sequences. Forward-backward EM algorithm for training the
parameters of HMMs. Use of HMMs for speech recognition, part-of-speech
tagging, and information extraction. Conditional random fields (CRFs).
Probabilistic context-free grammars (PCFG). Parsing and learning with PCFGs.
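Illustration (not taken from the course materials): a minimal Python sketch of the Viterbi algorithm for the most probable state sequence of an HMM. The dictionary-based parameter layout, the function name, and the toy two-state model are assumptions; plain probabilities are used for readability, whereas long sequences would call for log probabilities.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
print(viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p))
# ['Healthy', 'Healthy', 'Fever']
```

The same dynamic-programming table, with sums in place of maxima, gives the forward probabilities used by forward-backward training.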
13. Using Prior Knowledge in Learning (if time permits)
Chapter 11, Chapter 12. Explanation-based learning. Learning in planning and
problem-solving. Knowledge-based learning and theory refinement. Transfer