Data Mining
Practical Machine Learning Tools and Techniques
The Morgan Kaufmann Series in Data Management Systems
Series Editor: Jim Gray, Microsoft Research
Data Mining: Practical Machine Learning
Tools and Techniques, Second Edition
Ian H. Witten and Eibe Frank
Fuzzy Modeling and Genetic Algorithms for
Data Mining and Exploration
Earl Cox
Data Modeling Essentials, Third Edition
Graeme C. Simsion and Graham C. Witt
Location-Based Services
Jochen Schiller and Agnès Voisard
Database Modeling with Microsoft® Visio for
Enterprise Architects
Terry Halpin, Ken Evans, Patrick Hallock,
and Bill Maclean
Designing Data-Intensive Web Applications
Stefano Ceri, Piero Fraternali, Aldo Bongio,
Marco Brambilla, Sara Comai, and
Maristella Matera
Mining the Web: Discovering Knowledge
from Hypertext Data
Soumen Chakrabarti
Advanced SQL: 1999—Understanding
Object-Relational and Other Advanced
Features
Jim Melton
Database Tuning: Principles, Experiments,
and Troubleshooting Techniques
Dennis Shasha and Philippe Bonnet
SQL: 1999—Understanding Relational
Language Components
Jim Melton and Alan R. Simon
Information Visualization in Data Mining
and Knowledge Discovery
Edited by Usama Fayyad, Georges G.
Grinstein, and Andreas Wierse
Transactional Information Systems: Theory,
Algorithms, and the Practice of Concurrency
Control and Recovery
Gerhard Weikum and Gottfried Vossen
Spatial Databases: With Application to GIS
Philippe Rigaux, Michel Scholl, and Agnès
Voisard
Information Modeling and Relational
Databases: From Conceptual Analysis to
Logical Design
Terry Halpin
Component Database Systems
Edited by Klaus R. Dittrich and Andreas
Geppert
Managing Reference Data in Enterprise
Databases: Binding Corporate Data to the
Wider World
Malcolm Chisholm
Data Mining: Concepts and Techniques
Jiawei Han and Micheline Kamber
Understanding SQL and Java Together: A
Guide to SQLJ, JDBC, and Related
Technologies
Jim Melton and Andrew Eisenberg
Database: Principles, Programming, and
Performance, Second Edition
Patrick O’Neil and Elizabeth O’Neil
The Object Data Standard: ODMG 3.0
Edited by R. G. G. Cattell, Douglas K.
Barry, Mark Berler, Jeff Eastman, David
Jordan, Craig Russell, Olaf Schadow,
Torsten Stanienda, and Fernando Velez
Data on the Web: From Relations to
Semistructured Data and XML
Serge Abiteboul, Peter Buneman, and Dan
Suciu
Data Mining: Practical Machine Learning
Tools and Techniques with Java
Implementations
Ian H. Witten and Eibe Frank
Joe Celko’s SQL for Smarties: Advanced SQL
Programming, Second Edition
Joe Celko
Joe Celko’s Data and Databases: Concepts in
Practice
Joe Celko
Developing Time-Oriented Database
Applications in SQL
Richard T. Snodgrass
Web Farming for the Data Warehouse
Richard D. Hackathorn
Database Modeling & Design, Third Edition
Toby J. Teorey
Management of Heterogeneous and
Autonomous Database Systems
Edited by Ahmed Elmagarmid, Marek
Rusinkiewicz, and Amit Sheth
Object-Relational DBMSs: Tracking the Next
Great Wave, Second Edition
Michael Stonebraker and Paul Brown, with
Dorothy Moore
A Complete Guide to DB2 Universal
Database
Don Chamberlin
Universal Database Management: A Guide
to Object/Relational Technology
Cynthia Maro Saracco
Readings in Database Systems, Third Edition
Edited by Michael Stonebraker and Joseph
M. Hellerstein
Understanding SQL’s Stored Procedures: A
Complete Guide to SQL/PSM
Jim Melton
Principles of Multimedia Database Systems
V. S. Subrahmanian
Principles of Database Query Processing for
Advanced Applications
Clement T. Yu and Weiyi Meng
Advanced Database Systems
Carlo Zaniolo, Stefano Ceri, Christos
Faloutsos, Richard T. Snodgrass, V. S.
Subrahmanian, and Roberto Zicari
Principles of Transaction Processing for the
Systems Professional
Philip A. Bernstein and Eric Newcomer
Using the New DB2: IBM’s Object-Relational
Database System
Don Chamberlin
Distributed Algorithms
Nancy A. Lynch
Active Database Systems: Triggers and Rules
For Advanced Database Processing
Edited by Jennifer Widom and Stefano Ceri
Migrating Legacy Systems: Gateways,
Interfaces & the Incremental Approach
Michael L. Brodie and Michael Stonebraker
Atomic Transactions
Nancy Lynch, Michael Merritt, William
Weihl, and Alan Fekete
Query Processing For Advanced Database
Systems
Edited by Johann Christoph Freytag, David
Maier, and Gottfried Vossen
Transaction Processing: Concepts and
Techniques
Jim Gray and Andreas Reuter
Building an Object-Oriented Database
System: The Story of O2
Edited by François Bancilhon, Claude
Delobel, and Paris Kanellakis
Database Transaction Models For Advanced
Applications
Edited by Ahmed K. Elmagarmid
A Guide to Developing Client/Server SQL
Applications
Setrag Khoshafian, Arvola Chan, Anna
Wong, and Harry K. T. Wong
The Benchmark Handbook For Database
and Transaction Processing Systems, Second
Edition
Edited by Jim Gray
Camelot and Avalon: A Distributed
Transaction Facility
Edited by Jeffrey L. Eppinger, Lily B.
Mummert, and Alfred Z. Spector
Readings in Object-Oriented Database
Systems
Edited by Stanley B. Zdonik and David
Maier
Data Mining
Practical Machine Learning Tools and Techniques,
Second Edition
Ian H. Witten
Department of Computer Science
University of Waikato
Eibe Frank
Department of Computer Science
University of Waikato
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER
Publisher: Diane Cerra
Publishing Services Manager: Simon Crump
Project Manager: Brandy Lilly
Editorial Assistant: Asma Stephan
Cover Design: Yvo Riezebos Design
Cover Image: Getty Images
Composition: SNP Best-set Typesetter Ltd., Hong Kong
Technical Illustration: Dartmouth Publishing, Inc.
Copyeditor: Graphic World Inc.
Proofreader: Graphic World Inc.
Indexer: Graphic World Inc.
Interior printer: The Maple-Vail Book Manufacturing Group
Cover printer: Phoenix Color Corp
Morgan Kaufmann Publishers is an imprint of Elsevier.
500 Sansome Street, Suite 400, San Francisco, CA 94111
This book is printed on acid-free paper.
© 2005 by Elsevier Inc. All rights reserved.
Designations used by companies to distinguish their products are often claimed as trademarks
or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a
claim, the product names appear in initial capital or all capital letters. Readers, however, should
contact the appropriate companies for more complete information regarding trademarks and
registration.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—
without prior written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in
Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
permissions@elsevier.com.uk. You may also complete your request on-line via the Elsevier
homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining
Permissions.”
Library of Congress Cataloging-in-Publication Data
Witten, I. H. (Ian H.)
Data mining : practical machine learning tools and techniques / Ian H. Witten, Eibe
Frank. – 2nd ed.
p. cm. – (Morgan Kaufmann series in data management systems)
Includes bibliographical references and index.
ISBN: 0-12-088407-0
1. Data mining. I. Frank, Eibe. II. Title. III. Series.
QA76.9.D343W58 2005
006.3–dc22 2005043385
For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com or www.books.elsevier.com
Printed in the United States of America
05 06 07 08 09 5 4 3 2 1
Working together to grow
libraries in developing countries
www.elsevier.com | www.bookaid.org | www.sabre.org
Foreword
Jim Gray, Series Editor
Microsoft Research
Technology now allows us to capture and store vast quantities of data. Finding
patterns, trends, and anomalies in these datasets, and summarizing them
with simple quantitative models, is one of the grand challenges of the infor-
mation age—turning data into information and turning information into
knowledge.
There has been stunning progress in data mining and machine learning. The
synthesis of statistics, machine learning, information theory, and computing has
created a solid science, with a firm mathematical base, and with very powerful
tools. Witten and Frank present much of this progress in this book and in the
companion implementation of the key algorithms. As such, this is a milestone
in the synthesis of data mining, data analysis, information theory, and machine
learning. If you have not been following this field for the last decade, this is a
great way to catch up on this exciting progress. If you have, then Witten and
Frank’s presentation and the companion open-source workbench, called Weka,
will be a useful addition to your toolkit.
They present the basic theory of automatically extracting models from data,
and then validating those models. The book does an excellent job of explaining
the various models (decision trees, association rules, linear models, clustering,
Bayes nets, neural nets) and how to apply them in practice. With this basis, they
then walk through the steps and pitfalls of various approaches. They describe
how to safely scrub datasets, how to build models, and how to evaluate a model’s
predictive quality. Most of the book is tutorial, but Part II broadly describes how
commercial systems work and gives a tour of the publicly available data mining
workbench that the authors provide through a website. This Weka workbench
has a graphical user interface that leads you through data mining tasks and has
excellent data visualization tools that help you understand the models. It is a great
companion to the text and a useful and popular tool in its own right.
This book presents this new discipline in a very accessible form: as a text
both to train the next generation of practitioners and researchers and to inform
lifelong learners like myself. Witten and Frank have a passion for simple and
elegant solutions. They approach each topic with this mindset, grounding all
concepts in concrete examples, and urging the reader to consider the simple
techniques first, and then progress to the more sophisticated ones if the simple
ones prove inadequate.
If you are interested in databases, and have not been following the machine
learning field, this book is a great way to catch up on this exciting progress. If
you have data that you want to analyze and understand, this book and the asso-
ciated Weka toolkit are an excellent way to start.
Contents
Foreword v
Preface xxiii
Updated and revised content xxvii
Acknowledgments xxix
Part I Machine learning tools and techniques 1
1 What’s it all about? 3
1.1 Data mining and machine learning 4
Describing structural patterns 6
Machine learning 7
Data mining 9
1.2 Simple examples: The weather problem and others 9
The weather problem 10
Contact lenses: An idealized problem 13
Irises: A classic numeric dataset 15
CPU performance: Introducing numeric prediction 16
Labor negotiations: A more realistic example 17
Soybean classification: A classic machine learning success 18
1.3 Fielded applications 22
Decisions involving judgment 22
Screening images 23
Load forecasting 24
Diagnosis 25
Marketing and sales 26
Other applications 28
1.4 Machine learning and statistics 29
1.5 Generalization as search 30
Enumerating the concept space 31
Bias 32
1.6 Data mining and ethics 35
1.7 Further reading 37
2 Input: Concepts, instances, and attributes 41
2.1 What’s a concept? 42
2.2 What’s in an example? 45
2.3 What’s in an attribute? 49
2.4 Preparing the input 52
Gathering the data together 52
ARFF format 53
Sparse data 55
Attribute types 56
Missing values 58
Inaccurate values 59
Getting to know your data 60
2.5 Further reading 60
3 Output: Knowledge representation 61
3.1 Decision tables 62
3.2 Decision trees 62
3.3 Classification rules 65
3.4 Association rules 69
3.5 Rules with exceptions 70
3.6 Rules involving relations 73
3.7 Trees for numeric prediction 76
3.8 Instance-based representation 76
3.9 Clusters 81
3.10 Further reading 82
4 Algorithms: The basic methods 83
4.1 Inferring rudimentary rules 84
Missing values and numeric attributes 86
Discussion 88
4.2 Statistical modeling 88
Missing values and numeric attributes 92
Bayesian models for document classification 94
Discussion 96
4.3 Divide-and-conquer: Constructing decision trees 97
Calculating information 100
Highly branching attributes 102
Discussion 105
4.4 Covering algorithms: Constructing rules 105
Rules versus trees 107
A simple covering algorithm 107
Rules versus decision lists 111
4.5 Mining association rules 112
Item sets 113
Association rules 113
Generating rules efficiently 117
Discussion 118
4.6 Linear models 119
Numeric prediction: Linear regression 119
Linear classification: Logistic regression 121
Linear classification using the perceptron 124
Linear classification using Winnow 126
4.7 Instance-based learning 128
The distance function 128
Finding nearest neighbors efficiently 129
Discussion 135
4.8 Clustering 136
Iterative distance-based clustering 137
Faster distance calculations 138
Discussion 139
4.9 Further reading 139
5 Credibility: Evaluating what’s been learned 143
5.1 Training and testing 144
5.2 Predicting performance 146
5.3 Cross-validation 149
5.4 Other estimates 151
Leave-one-out 151
The bootstrap 152
5.5 Comparing data mining methods 153
5.6 Predicting probabilities 157
Quadratic loss function 158
Informational loss function 159
Discussion 160
5.7 Counting the cost 161
Cost-sensitive classification 164
Cost-sensitive learning 165
Lift charts 166
ROC curves 168
Recall–precision curves 171
Discussion 172
Cost curves 173
5.8 Evaluating numeric prediction 176
5.9 The minimum description length principle 179
5.10 Applying the MDL principle to clustering 183
5.11 Further reading 184
6 Implementations: Real machine learning schemes 187
6.1 Decision trees 189
Numeric attributes 189
Missing values 191
Pruning 192
Estimating error rates 193
Complexity of decision tree induction 196
From trees to rules 198
C4.5: Choices and options 198
Discussion 199
6.2 Classification rules 200
Criteria for choosing tests 200
Missing values, numeric attributes 201
Generating good rules 202
Using global optimization 205
Obtaining rules from partial decision trees 207
Rules with exceptions 210
Discussion 213
6.3 Extending linear models 214
The maximum margin hyperplane 215
Nonlinear class boundaries 217
Support vector regression 219
The kernel perceptron 222
Multilayer perceptrons 223
Discussion 235
6.4 Instance-based learning 235
Reducing the number of exemplars 236
Pruning noisy exemplars 236
Weighting attributes 237
Generalizing exemplars 238
Distance functions for generalized exemplars 239
Generalized distance functions 241
Discussion 242
6.5 Numeric prediction 243
Model trees 244
Building the tree 245
Pruning the tree 245
Nominal attributes 246
Missing values 246
Pseudocode for model tree induction 247
Rules from model trees 250
Locally weighted linear regression 251
Discussion 253
6.6 Clustering 254
Choosing the number of clusters 254
Incremental clustering 255
Category utility 260
Probability-based clustering 262
The EM algorithm 265
Extending the mixture model 266
Bayesian clustering 268
Discussion 270
6.7 Bayesian networks 271
Making predictions 272
Learning Bayesian networks 276
Specific algorithms 278
Data structures for fast learning 280
Discussion 283
7 Transformations: Engineering the input and output 285
7.1 Attribute selection 288
Scheme-independent selection 290
Searching the attribute space 292
Scheme-specific selection 294
7.2 Discretizing numeric attributes 296
Unsupervised discretization 297
Entropy-based discretization 298
Other discretization methods 302
Entropy-based versus error-based discretization 302
Converting discrete to numeric attributes 304
7.3 Some useful transformations 305
Principal components analysis 306
Random projections 309
Text to attribute vectors 309
Time series 311
7.4 Automatic data cleansing 312
Improving decision trees 312
Robust regression 313
Detecting anomalies 314
7.5 Combining multiple models 315
Bagging 316
Bagging with costs 319
Randomization 320
Boosting 321
Additive regression 325
Additive logistic regression 327
Option trees 328
Logistic model trees 331
Stacking 332
Error-correcting output codes 334
7.6 Using unlabeled data 337
Clustering for classification 337
Co-training 339
EM and co-training 340
7.7 Further reading 341
8 Moving on: Extensions and applications 345
8.1 Learning from massive datasets 346
8.2 Incorporating domain knowledge 349
8.3 Text and Web mining 351
8.4 Adversarial situations 356
8.5 Ubiquitous data mining 358
8.6 Further reading 361
Part II The Weka machine learning workbench 363
9 Introduction to Weka 365
9.1 What’s in Weka? 366
9.2 How do you use it? 367
9.3 What else can you do? 368
9.4 How do you get it? 368
10 The Explorer 369
10.1 Getting started 369
Preparing the data 370
Loading the data into the Explorer 370
Building a decision tree 373
Examining the output 373
Doing it again 377
Working with models 377
When things go wrong 378
10.2 Exploring the Explorer 380
Loading and filtering files 380
Training and testing learning schemes 384
Do it yourself: The User Classifier 388
Using a metalearner 389
Clustering and association rules 391
Attribute selection 392
Visualization 393
10.3 Filtering algorithms 393
Unsupervised attribute filters 395
Unsupervised instance filters 400
Supervised filters 401
10.4 Learning algorithms 403
Bayesian classifiers 403
Trees 406
Rules 408
Functions 409
Lazy classifiers 413
Miscellaneous classifiers 414
10.5 Metalearning algorithms 414
Bagging and randomization 414
Boosting 416
Combining classifiers 417
Cost-sensitive learning 417
Optimizing performance 417
Retargeting classifiers for different tasks 418
10.6 Clustering algorithms 418
10.7 Association-rule learners 419
10.8 Attribute selection 420
Attribute subset evaluators 422
Single-attribute evaluators 422
Search methods 423
11 The Knowledge Flow interface 427
11.1 Getting started 427
11.2 The Knowledge Flow components 430
11.3 Configuring and connecting the components 431
11.4 Incremental learning 433
12 The Experimenter 437
12.1 Getting started 438
Running an experiment 439
Analyzing the results 440
12.2 Simple setup 441
12.3 Advanced setup 442
12.4 The Analyze panel 443
12.5 Distributing processing over several machines 445
13 The command-line interface 449
13.1 Getting started 449
13.2 The structure of Weka 450
Classes, instances, and packages 450
The weka.core package 451
The weka.classifiers package 453
Other packages 455
Javadoc indices 456
13.3 Command-line options 456
Generic options 456
Scheme-specific options 458
14 Embedded machine learning 461
14.1 A simple data mining application 461
14.2 Going through the code 462
main() 462
MessageClassifier() 462
updateData() 468
classifyMessage() 468
15 Writing new learning schemes 471
15.1 An example classifier 471
buildClassifier() 472
makeTree() 472
computeInfoGain() 480
classifyInstance() 480
main() 481
15.2 Conventions for implementing classifiers 483
References 485
Index 505
About the authors 525
List of Figures
Figure 1.1 Rules for the contact lens data. 13
Figure 1.2 Decision tree for the contact lens data. 14
Figure 1.3 Decision trees for the labor negotiations data. 19
Figure 2.1 A family tree and two ways of expressing the sister-of
relation. 46
Figure 2.2 ARFF file for the weather data. 54
Figure 3.1 Constructing a decision tree interactively: (a) creating a
rectangular test involving petallength and petalwidth and (b)
the resulting (unfinished) decision tree. 64
Figure 3.2 Decision tree for a simple disjunction. 66
Figure 3.3 The exclusive-or problem. 67
Figure 3.4 Decision tree with a replicated subtree. 68
Figure 3.5 Rules for the Iris data. 72
Figure 3.6 The shapes problem. 73
Figure 3.7 Models for the CPU performance data: (a) linear regression,
(b) regression tree, and (c) model tree. 77
Figure 3.8 Different ways of partitioning the instance space. 79
Figure 3.9 Different ways of representing clusters. 81
Figure 4.1 Pseudocode for 1R. 85
Figure 4.2 Tree stumps for the weather data. 98
Figure 4.3 Expanded tree stumps for the weather data. 100
Figure 4.4 Decision tree for the weather data. 101
Figure 4.5 Tree stump for the ID code attribute. 103
Figure 4.6 Covering algorithm: (a) covering the instances and (b) the
decision tree for the same problem. 106
Figure 4.7 The instance space during operation of a covering
algorithm. 108
Figure 4.8 Pseudocode for a basic rule learner. 111
Figure 4.9 Logistic regression: (a) the logit transform and (b) an example
logistic regression function. 122
Figure 4.10 The perceptron: (a) learning rule and (b) representation as
a neural network. 125
Figure 4.11 The Winnow algorithm: (a) the unbalanced version and (b)
the balanced version. 127
Figure 4.12 A kD-tree for four training instances: (a) the tree and (b)
instances and splits. 130
Figure 4.13 Using a kD-tree to find the nearest neighbor of the
star. 131
Figure 4.14 Ball tree for 16 training instances: (a) instances and balls and
(b) the tree. 134
Figure 4.15 Ruling out an entire ball (gray) based on a target point (star)
and its current nearest neighbor. 135
Figure 4.16 A ball tree: (a) two cluster centers and their dividing line and
(b) the corresponding tree. 140
Figure 5.1 A hypothetical lift chart. 168
Figure 5.2 A sample ROC curve. 169
Figure 5.3 ROC curves for two learning methods. 170
Figure 5.4 Effects of varying the probability threshold: (a) the error curve
and (b) the cost curve. 174
Figure 6.1 Example of subtree raising, where node C is “raised” to
subsume node B. 194
Figure 6.2 Pruning the labor negotiations decision tree. 196
Figure 6.3 Algorithm for forming rules by incremental reduced-error
pruning. 205
Figure 6.4 RIPPER: (a) algorithm for rule learning and (b) meaning of
symbols. 206
Figure 6.5 Algorithm for expanding examples into a partial
tree. 208
Figure 6.6 Example of building a partial tree. 209
Figure 6.7 Rules with exceptions for the iris data. 211
Figure 6.8 A maximum margin hyperplane. 216
Figure 6.9 Support vector regression: (a) ε = 1, (b) ε = 2, and (c)
ε = 0.5. 221
Figure 6.10 Example datasets and corresponding perceptrons. 225
Figure 6.11 Step versus sigmoid: (a) step function and (b) sigmoid
function. 228
Figure 6.12 Gradient descent using the error function x² + 1. 229
Figure 6.13 Multilayer perceptron with a hidden layer. 231
Figure 6.14 A boundary between two rectangular classes. 240
Figure 6.15 Pseudocode for model tree induction. 248
Figure 6.16 Model tree for a dataset with nominal attributes. 250
Figure 6.17 Clustering the weather data. 256
Figure 6.18 Hierarchical clusterings of the iris data. 259
Figure 6.19 A two-class mixture model. 264
Figure 6.20 A simple Bayesian network for the weather data. 273
Figure 6.21 Another Bayesian network for the weather data. 274
Figure 6.22 The weather data: (a) reduced version and (b) corresponding
AD tree. 281
Figure 7.1 Attribute space for the weather dataset. 293
Figure 7.2 Discretizing the temperature attribute using the entropy
method. 299
Figure 7.3 The result of discretizing the temperature attribute. 300
Figure 7.4 Class distribution for a two-class, two-attribute
problem. 303
Figure 7.5 Principal components transform of a dataset: (a) variance of
each component and (b) variance plot. 308
Figure 7.6 Number of international phone calls from Belgium,
1950–1973. 314
Figure 7.7 Algorithm for bagging. 319
Figure 7.8 Algorithm for boosting. 322
Figure 7.9 Algorithm for additive logistic regression. 327
Figure 7.10 Simple option tree for the weather data. 329
Figure 7.11 Alternating decision tree for the weather data. 330
Figure 10.1 The Explorer interface. 370
Figure 10.2 Weather data: (a) spreadsheet, (b) CSV format, and
(c) ARFF. 371
Figure 10.3 The Weka Explorer: (a) choosing the Explorer interface and
(b) reading in the weather data. 372
Figure 10.4 Using J4.8: (a) finding it in the classifiers list and (b) the
Classify tab. 374
Figure 10.5 Output from the J4.8 decision tree learner. 375
Figure 10.6 Visualizing the result of J4.8 on the iris dataset: (a) the tree
and (b) the classifier errors. 379
Figure 10.7 Generic object editor: (a) the editor, (b) more information
(click More), and (c) choosing a converter
(click Choose). 381
Figure 10.8 Choosing a filter: (a) the filters menu, (b) an object editor, and
(c) more information (click More). 383
Figure 10.9 The weather data with two attributes removed. 384
Figure 10.10 Processing the CPU performance data with M5′. 385
Figure 10.11 Output from the M5′ program for numeric
prediction. 386
Figure 10.12 Visualizing the errors: (a) from M5′ and (b) from linear
regression. 388
Figure 10.13 Working on the segmentation data with the User Classifier:
(a) the data visualizer and (b) the tree visualizer. 390
Figure 10.14 Configuring a metalearner for boosting decision
stumps. 391
Figure 10.15 Output from the Apriori program for association rules. 392
Figure 10.16 Visualizing the Iris dataset. 394
Figure 10.17 Using Weka’s metalearner for discretization: (a) configuring
FilteredClassifier, and (b) the menu of filters. 402
Figure 10.18 Visualizing a Bayesian network for the weather data (nominal
version): (a) default output, (b) a version with the
maximum number of parents set to 3 in the search
algorithm, and (c) probability distribution table for the
windy node in (b). 406
Figure 10.19 Changing the parameters for J4.8. 407
Figure 10.20 Using Weka’s neural-network graphical user
interface. 411
Figure 10.21 Attribute selection: specifying an evaluator and a search
method. 420
Figure 11.1 The Knowledge Flow interface. 428
Figure 11.2 Configuring a data source: (a) the right-click menu and
(b) the file browser obtained from the Configure menu
item. 429
Figure 11.3 Operations on the Knowledge Flow components. 432
Figure 11.4 A Knowledge Flow that operates incrementally: (a) the
configuration and (b) the strip chart output. 434
Figure 12.1 An experiment: (a) setting it up, (b) the results file, and
(c) a spreadsheet with the results. 438
Figure 12.2 Statistical test results for the experiment in
Figure 12.1. 440
Figure 12.3 Setting up an experiment in advanced mode. 442
Figure 12.4 Rows and columns of Figure 12.2: (a) row field, (b) column
field, (c) result of swapping the row and column selections,
and (d) substituting Run for Dataset as rows. 444
Figure 13.1 Using Javadoc: (a) the front page and (b) the weka.core
package. 452
Figure 13.2 DecisionStump: A class of the weka.classifiers.trees
package. 454
Figure 14.1 Source code for the message classifier. 463
Figure 15.1 Source code for the ID3 decision tree learner. 473
List of Tables
Table 1.1 The contact lens data. 6
Table 1.2 The weather data. 11
Table 1.3 Weather data with some numeric attributes. 12
Table 1.4 The iris data. 15
Table 1.5 The CPU performance data. 16
Table 1.6 The labor negotiations data. 18
Table 1.7 The soybean data. 21
Table 2.1 Iris data as a clustering problem. 44
Table 2.2 Weather data with a numeric class. 44
Table 2.3 Family tree represented as a table. 47
Table 2.4 The sister-of relation represented in a table. 47
Table 2.5 Another relation represented as a table. 49
Table 3.1 A new iris flower. 70
Table 3.2 Training data for the shapes problem. 74
Table 4.1 Evaluating the attributes in the weather data. 85
Table 4.2 The weather data with counts and probabilities. 89
Table 4.3 A new day. 89
Table 4.4 The numeric weather data with summary statistics. 93
Table 4.5 Another new day. 94
Table 4.6 The weather data with identification codes. 103
Table 4.7 Gain ratio calculations for the tree stumps of Figure 4.2. 104
Table 4.8 Part of the contact lens data for which astigmatism = yes. 109
Table 4.9 Part of the contact lens data for which astigmatism = yes and
tear production rate = normal. 110
Table 4.10 Item sets for the weather data with coverage 2 or
greater. 114
Table 4.11 Association rules for the weather data. 116
Table 5.1 Confidence limits for the normal distribution. 148
Table 5.2 Confidence limits for Student’s distribution with 9 degrees
of freedom. 155
Table 5.3 Different outcomes of a two-class prediction. 162
Table 5.4 Different outcomes of a three-class prediction: (a) actual and
(b) expected. 163
Table 5.5 Default cost matrixes: (a) a two-class case and (b) a three-class
case. 164
Table 5.6 Data for a lift chart. 167
Table 5.7 Different measures used to evaluate the false positive versus the
false negative tradeoff. 172
Table 5.8 Performance measures for numeric prediction. 178
Table 5.9 Performance measures for four numeric prediction
models. 179
Table 6.1 Linear models in the model tree. 250
Table 7.1 Transforming a multiclass problem into a two-class one:
(a) standard method and (b) error-correcting code. 335
Table 10.1 Unsupervised attribute filters. 396
Table 10.2 Unsupervised instance filters. 400
Table 10.3 Supervised attribute filters. 402
Table 10.4 Supervised instance filters. 402
Table 10.5 Classifier algorithms in Weka. 404
Table 10.6 Metalearning algorithms in Weka. 415
Table 10.7 Clustering algorithms. 419
Table 10.8 Association-rule learners. 419
Table 10.9 Attribute evaluation methods for attribute selection. 421
Table 10.10 Search methods for attribute selection. 421
Table 11.1 Visualization and evaluation components. 430
Table 13.1 Generic options for learning schemes in Weka. 457
Table 13.2 Scheme-specific options for the J4.8 decision tree
learner. 458
Table 15.1 Simple learning schemes in Weka. 472
Preface
The convergence of computing and communication has produced a society that
feeds on information. Yet most of the information is in its raw form: data. If
data is characterized as recorded facts, then information is the set of patterns,
or expectations, that underlie the data. There is a huge amount of information
locked up in databases—information that is potentially important but has not
yet been discovered or articulated. Our mission is to bring it forth.
Data mining is the extraction of implicit, previously unknown, and poten-
tially useful information from data. The idea is to build computer programs that
sift through databases automatically, seeking regularities or patterns. Strong pat-
terns, if found, will likely generalize to make accurate predictions on future data.
Of course, there will be problems. Many patterns will be banal and uninterest-
ing. Others will be spurious, contingent on accidental coincidences in the par-
ticular dataset used. In addition, real data is imperfect: Some parts will be
garbled, and some will be missing. Anything discovered will be inexact: There
will be exceptions to every rule and cases not covered by any rule. Algorithms
need to be robust enough to cope with imperfect data and to extract regulari-
ties that are inexact but useful.
Machine learning provides the technical basis of data mining. It is used to
extract information from the raw data in databases—information that is
expressed in a comprehensible form and can be used for a variety of purposes.
The process is one of abstraction: taking the data, warts and all, and inferring
whatever structure underlies it. This book is about the tools and techniques of
machine learning used in practical data mining for finding, and describing,
structural patterns in data.
As with any burgeoning new technology that enjoys intense commercial
attention, the use of data mining is surrounded by a great deal of hype in the
technical—and sometimes the popular—press. Exaggerated reports appear of
the secrets that can be uncovered by setting learning algorithms loose on oceans
of data. But there is no magic in machine learning, no hidden power, no
alchemy. Instead, there is an identifiable body of simple and practical techniques
that can often extract useful information from raw data. This book describes
these techniques and shows how they work.
We interpret machine learning as the acquisition of structural descriptions
from examples. The kind of descriptions found can be used for prediction,
explanation, and understanding. Some data mining applications focus on pre-
diction: forecasting what will happen in new situations from data that describe
what happened in the past, often by guessing the classification of new examples.
But we are equally—perhaps more—interested in applications in which the
result of “learning” is an actual description of a structure that can be used to
classify examples. This structural description supports explanation, under-
standing, and prediction. In our experience, insights gained by the applications’
users are of most interest in the majority of practical data mining applications;
indeed, this is one of machine learning’s major advantages over classical statis-
tical modeling.
The book explains a variety of machine learning methods. Some are peda-
gogically motivated: simple schemes designed to explain clearly how the basic
ideas work. Others are practical: real systems used in applications today. Many
are contemporary and have been developed only in the last few years.
A comprehensive software resource, written in the Java language, has been
created to illustrate the ideas in the book. Called the Waikato Environment for
Knowledge Analysis, or Weka[1]
for short, it is available as source code on the
World Wide Web at http://www.cs.waikato.ac.nz/ml/weka. It is a full, industrial-
strength implementation of essentially all the techniques covered in this book.
It includes illustrative code and working implementations of machine learning
methods. It offers clean, spare implementations of the simplest techniques,
designed to aid understanding of the mechanisms involved. It also provides a
workbench that includes full, working, state-of-the-art implementations of
many popular learning schemes that can be used for practical data mining or
for research. Finally, it contains a framework, in the form of a Java class library,
that supports applications that use embedded machine learning and even the
implementation of new learning schemes.
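To give a small taste of embedded use, the following sketch shows the kind of
program this class library supports: it loads a dataset from an ARFF file,
builds the J4.8 decision tree learner on it, and estimates its accuracy by
tenfold cross-validation. (This is only an illustrative sketch against the
Weka 3 Java API; the file name weather.arff and the choice of learner are
examples, not prescriptions.)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;

    public class WekaSketch {
      public static void main(String[] args) throws Exception {
        // Load an ARFF file and nominate the last attribute as the class.
        Instances data = new Instances(
            new BufferedReader(new FileReader("weather.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Build the J4.8 decision tree learner and print the induced tree.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Estimate predictive performance with tenfold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
      }
    }

Running the program prints the decision tree and a summary of the
cross-validation results.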
The objective of this book is to introduce the tools and techniques for
machine learning that are used in data mining. After reading it, you will under-
stand what these techniques are and appreciate their strengths and applicabil-
ity. If you wish to experiment with your own data, you will be able to do this
easily with the Weka software.
[1] Found only on the islands of New Zealand, the weka (pronounced to rhyme
with Mecca) is a flightless bird with an inquisitive nature.
The book spans the gulf between the intensely practical approach taken by
trade books that provide case studies on data mining and the more theoretical,
principle-driven exposition found in current textbooks on machine learning.
(A brief description of these books appears in the Further reading section at the
end of Chapter 1.) This gulf is rather wide. To apply machine learning tech-
niques productively, you need to understand something about how they work;
this is not a technology that you can apply blindly and expect to get good results.
Different problems yield to different techniques, but it is rarely obvious which
techniques are suitable for a given situation: you need to know something about
the range of possible solutions. We cover an extremely wide range of techniques.
We can do this because, unlike many trade books, this volume does not promote
any particular commercial software or approach. We include a large number of
examples, but they use illustrative datasets that are small enough to allow you
to follow what is going on. Real datasets are far too large to show this (and in
any case are usually company confidential). Our datasets are chosen not to
illustrate actual large-scale practical problems but to help you understand what
the different techniques do, how they work, and what their range of application
is.
The book is aimed at the technically aware general reader interested in the
principles and ideas underlying the current practice of data mining. It will
also be of interest to information professionals who need to become acquainted
with this new technology and to all those who wish to gain a detailed technical
understanding of what machine learning involves. It is written for an eclectic
audience of information systems practitioners, programmers, consultants,
developers, information technology managers, specification writers, patent
examiners, and curious laypeople—as well as students and professors—who
need an easy-to-read book with lots of illustrations that describes what the
major machine learning techniques are, what they do, how they are used, and
how they work. It is practically oriented, with a strong “how to” flavor, and
includes algorithms, code, and implementations. All those involved in practical
data mining will benefit directly from the techniques described. The book is
aimed at people who want to cut through to the reality that underlies the hype
about machine learning and who seek a practical, nonacademic, unpretentious
approach. We have avoided requiring any specific theoretical or mathematical
knowledge except in some sections marked by a light gray bar in the margin.
These contain optional material, often for the more technical or theoretically
inclined reader, and may be skipped without loss of continuity.
The book is organized in layers that make the ideas accessible to readers who
are interested in grasping the basics and to those who would like more depth of
treatment, along with full details on the techniques covered. We believe that con-
sumers of machine learning need to have some idea of how the algorithms they
use work. It is often observed that data models are only as good as the person
who interprets them, and that person needs to know something about how the
models are produced to appreciate the strengths, and limitations, of the tech-
nology. However, it is not necessary for all data model users to have a deep
understanding of the finer details of the algorithms.
We address this situation by describing machine learning methods at succes-
sive levels of detail. You will learn the basic ideas, the topmost level, by reading
the first three chapters. Chapter 1 describes, through examples, what machine
learning is and where it can be used; it also provides actual practical applica-
tions. Chapters 2 and 3 cover the kinds of input and output—or knowledge
representation—involved. Different kinds of output dictate different styles
of algorithm, and at the next level Chapter 4 describes the basic methods of
machine learning, simplified to make them easy to comprehend. Here the prin-
ciples involved are conveyed in a variety of algorithms without getting into
intricate details or tricky implementation issues. To make progress in the appli-
cation of machine learning techniques to particular data mining problems, it is
essential to be able to measure how well you are doing. Chapter 5, which can be
read out of sequence, equips you to evaluate the results obtained from machine
learning, addressing the sometimes complex issues involved in performance
evaluation.
At the lowest and most detailed level, Chapter 6 exposes in naked detail the
nitty-gritty issues of implementing a spectrum of machine learning algorithms,
including the complexities necessary for them to work well in practice. Although
many readers may want to ignore this detailed information, it is at this level that
the full, working, tested implementations of machine learning schemes in Weka
are written. Chapter 7 describes practical topics involved with engineering the
input to machine learning—for example, selecting and discretizing attributes—
and covers several more advanced techniques for refining and combining the
output from different learning techniques. The final chapter of Part I looks to
the future.
The book describes most methods used in practical machine learning.
However, it does not cover reinforcement learning, because it is rarely applied
in practical data mining; genetic algorithm approaches, because these are just
an optimization technique; or relational learning and inductive logic program-
ming, because they are rarely used in mainstream data mining applications.
The data mining system that illustrates the ideas in the book is described in
Part II to clearly separate conceptual material from the practical aspects of how
to use it. You can skip to Part II directly from Chapter 4 if you are in a hurry to
analyze your data and don’t want to be bothered with the technical details.
Java has been chosen for the implementations of machine learning tech-
niques that accompany this book because, as an object-oriented programming
language, it allows a uniform interface to learning schemes and methods for pre-
and postprocessing. We have chosen Java instead of C++, Smalltalk, or other
object-oriented languages because programs written in Java can be run on
almost any computer without having to be recompiled, having to undergo com-
plicated installation procedures, or—worst of all—having to change the code.
A Java program is compiled into byte-code that can be executed on any com-
puter equipped with an appropriate interpreter. This interpreter is called the
Java virtual machine. Java virtual machines—and, for that matter, Java compil-
ers—are freely available for all important platforms.
Like all widely used programming languages, Java has received its share of
criticism. Although this is not the place to elaborate on such issues, in several
cases the critics are clearly right. However, of all currently available program-
ming languages that are widely supported, standardized, and extensively docu-
mented, Java seems to be the best choice for the purpose of this book. Its main
disadvantage is speed of execution—or lack of it. Executing a Java program is
several times slower than running a corresponding program written in C lan-
guage because the virtual machine has to translate the byte-code into machine
code before it can be executed. In our experience the difference is a factor of
three to five if the virtual machine uses a just-in-time compiler. Instead of trans-
lating each byte-code individually, a just-in-time compiler translates whole
chunks of byte-code into machine code, thereby achieving significant speedup.
However, if this is still too slow for your application, there are compilers that
translate Java programs directly into machine code, bypassing the byte-code
step. This code cannot be executed on other platforms, thereby sacrificing one
of Java’s most important advantages.
Updated and revised content
We finished writing the first edition of this book in 1999 and now, in April 2005,
are just polishing this second edition. The areas of data mining and machine
learning have matured in the intervening years. Although the core of material
in this edition remains the same, we have made the most of our opportunity to
update it to reflect the changes that have taken place over 5 years. There have
been errors to fix, errors that we had accumulated in our publicly available errata
file. Surprisingly few were found, and we hope there are even fewer in this
second edition. (The errata for the second edition may be found through the
book’s home page at http://www.cs.waikato.ac.nz/ml/weka/book.html.) We have
thoroughly edited the material and brought it up to date, and we practically
doubled the number of references. The most enjoyable part has been adding
new material. Here are the highlights.
Bowing to popular demand, we have added comprehensive information on
neural networks: the perceptron and closely related Winnow algorithm in
Section 4.6 and the multilayer perceptron and backpropagation algorithm
in Section 6.3. We have included more recent material on implementing
nonlinear decision boundaries using both the kernel perceptron and radial basis
function networks. There is a new section on Bayesian networks, again in
response to readers’ requests, with a description of how to learn classifiers based
on these networks and how to implement them efficiently using all-dimensions
trees.
The Weka machine learning workbench that accompanies the book, a widely
used and popular feature of the first edition, has acquired a radical new look in
the form of an interactive interface—or rather, three separate interactive inter-
faces—that make it far easier to use. The primary one is the Explorer, which
gives access to all of Weka’s facilities using menu selection and form filling. The
others are the Knowledge Flow interface, which allows you to design configu-
rations for streamed data processing, and the Experimenter, with which you set
up automated experiments that run selected machine learning algorithms with
different parameter settings on a corpus of datasets, collect performance statis-
tics, and perform significance tests on the results. These interfaces lower the bar
for becoming a practicing data miner, and we include a full description of how
to use them. However, the book continues to stand alone, independent of Weka,
and to underline this we have moved all material on the workbench into a sep-
arate Part II at the end of the book.
In addition to becoming far easier to use, Weka has grown over the last 5
years and matured enormously in its data mining capabilities. It now includes
an unparalleled range of machine learning algorithms and related techniques.
The growth has been partly stimulated by recent developments in the field and
partly led by Weka users and driven by demand. This puts us in a position in
which we know a great deal about what actual users of data mining want, and
we have capitalized on this experience when deciding what to include in this
new edition.
The earlier chapters, containing more general and foundational material,
have suffered relatively little change. We have added more examples of fielded
applications to Chapter 1, a new subsection on sparse data and a little on string
attributes and date attributes to Chapter 2, and a description of interactive deci-
sion tree construction, a useful and revealing technique to help you grapple with
your data using manually built decision trees, to Chapter 3.
In addition to introducing linear decision boundaries for classification (the
infrastructure for neural networks), Chapter 4 includes new material on multi-
nomial Bayes models for document classification and on logistic regression. The
last 5 years have seen great interest in data mining for text, and this is reflected
in our introduction to string attributes in Chapter 2, multinomial Bayes for doc-
ument classification in Chapter 4, and text transformations in Chapter 7.
Chapter 4 includes a great deal of new material on efficient data structures for
searching the instance space: kD-trees and the recently invented ball trees. These
are used to find nearest neighbors efficiently and to accelerate distance-based
clustering.
Chapter 5 describes the principles of statistical evaluation of machine learn-
ing, which have not changed. The main addition, apart from a note on the Kappa
statistic for measuring the success of a predictor, is a more detailed treatment
of cost-sensitive learning. We describe how to use a classifier, built without
taking costs into consideration, to make predictions that are sensitive to cost;
alternatively, we explain how to take costs into account during the training
process to build a cost-sensitive model. We also cover the popular new tech-
nique of cost curves.
There are several additions to Chapter 6, apart from the previously men-
tioned material on neural networks and Bayesian network classifiers. More
details—gory details—are given of the heuristics used in the successful RIPPER
rule learner. We describe how to use model trees to generate rules for numeric
prediction. We show how to apply locally weighted regression to classification
problems. Finally, we describe the X-means clustering algorithm, which is a big
improvement on traditional k-means.
Chapter 7 on engineering the input and output has changed most, because
this is where recent developments in practical machine learning have been con-
centrated. We describe new attribute selection schemes such as race search and
the use of support vector machines and new methods for combining models
such as additive regression, additive logistic regression, logistic model trees, and
option trees. We give a full account of LogitBoost (which was mentioned in the
first edition but not described). There is a new section on useful transforma-
tions, including principal components analysis and transformations for text
mining and time series. We also cover recent developments in using unlabeled
data to improve classification, including the co-training and co-EM methods.
The final chapter of Part I on new directions and different perspectives has
been reworked to keep up with the times and now includes contemporary chal-
lenges such as adversarial learning and ubiquitous data mining.
Acknowledgments
Writing the acknowledgments is always the nicest part! A lot of people have
helped us, and we relish this opportunity to thank them. This book has arisen
out of the machine learning research project in the Computer Science Depart-
ment at the University of Waikato, New Zealand. We have received generous
encouragement and assistance from the academic staff members on that project:
John Cleary, Sally Jo Cunningham, Matt Humphrey, Lyn Hunt, Bob McQueen,
Lloyd Smith, and Tony Smith. Special thanks go to Mark Hall, Bernhard
Pfahringer, and above all Geoff Holmes, the project leader and source of inspi-
ration. All who have worked on the machine learning project here have con-
tributed to our thinking: we would particularly like to mention Steve Garner,
Stuart Inglis, and Craig Nevill-Manning for helping us to get the project off the
ground in the beginning when success was less certain and things were more
difficult.
The Weka system that illustrates the ideas in this book forms a crucial com-
ponent of it. It was conceived by the authors and designed and implemented by
Eibe Frank, along with Len Trigg and Mark Hall. Many people in the machine
learning laboratory at Waikato made significant contributions. Since the first
edition of the book the Weka team has expanded considerably: so many people
have contributed that it is impossible to acknowledge everyone properly. We are
grateful to Remco Bouckaert for his implementation of Bayesian networks, Dale
Fletcher for many database-related aspects, Ashraf Kibriya and Richard Kirkby
for contributions far too numerous to list, Niels Landwehr for logistic model
trees, Abdelaziz Mahoui for the implementation of K*, Stefan Mutter for asso-
ciation rule mining, Gabi Schmidberger and Malcolm Ware for numerous mis-
cellaneous contributions, Tony Voyle for least-median-of-squares regression,
Yong Wang for Pace regression and the implementation of M5′, and Xin Xu for
JRip, logistic regression, and many other contributions. Our sincere thanks go
to all these people for their dedicated work and to the many contributors to
Weka from outside our group at Waikato.
Tucked away as we are in a remote (but very pretty) corner of the Southern
Hemisphere, we greatly appreciate the visitors to our department who play
a crucial role in acting as sounding boards and helping us to develop our
thinking. We would like to mention in particular Rob Holte, Carl Gutwin, and
Russell Beale, each of whom visited us for several months; David Aha, who
although he only came for a few days did so at an early and fragile stage of the
project and performed a great service by his enthusiasm and encouragement;
and Kai Ming Ting, who worked with us for 2 years on many of the topics
described in Chapter 7 and helped to bring us into the mainstream of machine
learning.
Students at Waikato have played a significant role in the development of the
project. Jamie Littin worked on ripple-down rules and relational learning. Brent
Martin explored instance-based learning and nested instance-based representa-
tions. Murray Fife slaved over relational learning, and Nadeeka Madapathage
investigated the use of functional languages for expressing machine learning
algorithms. Other graduate students have influenced us in numerous ways, par-
ticularly Gordon Paynter, YingYing Wen, and Zane Bray, who have worked with
us on text mining. Colleagues Steve Jones and Malika Mahoui have also made
far-reaching contributions to these and other machine learning projects. More
recently we have learned much from our many visiting students from Freiburg,
including Peter Reutemann and Nils Weidmann.
Ian Witten would like to acknowledge the formative role of his former stu-
dents at Calgary, particularly Brent Krawchuk, Dave Maulsby, Thong Phan, and
Tanja Mitrovic, all of whom helped him develop his early ideas in machine
learning, as did faculty members Bruce MacDonald, Brian Gaines, and David
Hill at Calgary and John Andreae at the University of Canterbury.
Eibe Frank is indebted to his former supervisor at the University of
Karlsruhe, Klaus-Peter Huber (now with SAS Institute), who infected him with
the fascination of machines that learn. On his travels Eibe has benefited from
interactions with Peter Turney, Joel Martin, and Berry de Bruijn in Canada and
with Luc de Raedt, Christoph Helma, Kristian Kersting, Stefan Kramer, Ulrich
Rückert, and Ashwin Srinivasan in Germany.
Diane Cerra and Asma Stephan of Morgan Kaufmann have worked hard to
shape this book, and Lisa Royse, our production editor, has made the process
go smoothly. Bronwyn Webster has provided excellent support at the Waikato
end.
We gratefully acknowledge the unsung efforts of the anonymous reviewers,
one of whom in particular made a great number of pertinent and constructive
comments that helped us to improve this book significantly. In addition, we
would like to thank the librarians of the Repository of Machine Learning Data-
bases at the University of California, Irvine, whose carefully collected datasets
have been invaluable in our research.
Our research has been funded by the New Zealand Foundation for Research,
Science and Technology and the Royal Society of New Zealand Marsden Fund.
The Department of Computer Science at the University of Waikato has gener-
ously supported us in all sorts of ways, and we owe a particular debt of
gratitude to Mark Apperley for his enlightened leadership and warm encour-
agement. Part of the first edition was written while both authors were visiting
the University of Calgary, Canada, and the support of the Computer Science
department there is gratefully acknowledged—as well as the positive and helpful
attitude of the long-suffering students in the machine learning course on whom
we experimented.
In producing the second edition Ian was generously supported by Canada’s
Informatics Circle of Research Excellence and by the University of Lethbridge
in southern Alberta, which gave him what all authors yearn for—a quiet space
in pleasant and convivial surroundings in which to work.
Last, and most of all, we are grateful to our families and partners. Pam, Anna,
and Nikki were all too well aware of the implications of having an author in the
house (“not again!”) but let Ian go ahead and write the book anyway. Julie was
always supportive, even when Eibe had to burn the midnight oil in the machine
learning lab, and Immo and Ollig provided exciting diversions. Between us we
hail from Canada, England, Germany, Ireland, and Samoa: New Zealand has
brought us together and provided an ideal, even idyllic, place to do this work.
Part I
Machine Learning Tools and Techniques

Chapter 1
What's It All About?
Human in vitro fertilization involves collecting several eggs from a woman’s
ovaries, which, after fertilization with partner or donor sperm, produce several
embryos. Some of these are selected and transferred to the woman’s uterus. The
problem is to select the “best” embryos to use—the ones that are most likely to
survive. Selection is based on around 60 recorded features of the embryos—
characterizing their morphology, oocyte, follicle, and the sperm sample. The
number of features is sufficiently large that it is difficult for an embryologist to
assess them all simultaneously and correlate historical data with the crucial
outcome of whether that embryo did or did not result in a live child. In a
research project in England, machine learning is being investigated as a tech-
nique for making the selection, using as training data historical records of
embryos and their outcome.
Every year, dairy farmers in New Zealand have to make a tough business deci-
sion: which cows to retain in their herd and which to sell off to an abattoir. Typi-
cally, one-fifth of the cows in a dairy herd are culled each year near the end of
the milking season as feed reserves dwindle. Each cow's breeding and milk production history influences this decision. Other factors include age (a cow is
nearing the end of its productive life at 8 years), health problems, history of dif-
ficult calving, undesirable temperament traits (kicking or jumping fences), and
not being in calf for the following season. About 700 attributes for each of
several million cows have been recorded over the years. Machine learning is
being investigated as a way of ascertaining what factors are taken into account
by successful farmers—not to automate the decision but to propagate their skills
and experience to others.
Life and death. From Europe to the antipodes. Family and business. Machine
learning is a burgeoning new technology for mining knowledge from data, a
technology that a lot of people are starting to take seriously.
1.1 Data mining and machine learning
We are overwhelmed with data. The amount of data in the world, in our lives,
seems to go on and on increasing—and there’s no end in sight. Omnipresent
personal computers make it too easy to save things that previously we would
have trashed. Inexpensive multigigabyte disks make it too easy to postpone deci-
sions about what to do with all this stuff—we simply buy another disk and keep
it all. Ubiquitous electronics record our decisions, our choices in the super-
market, our financial habits, our comings and goings. We swipe our way through
the world, every swipe a record in a database. The World Wide Web overwhelms
us with information; meanwhile, every choice we make is recorded. And all these
are just personal choices: they have countless counterparts in the world of com-
merce and industry. We would all testify to the growing gap between the gener-
ation of data and our understanding of it. As the volume of data increases,
inexorably, the proportion of it that people understand decreases, alarmingly.
Lying hidden in all this data is information, potentially useful information, that
is rarely made explicit or taken advantage of.
This book is about looking for patterns in data. There is nothing new about
this. People have been seeking patterns in data since human life began. Hunters
seek patterns in animal migration behavior, farmers seek patterns in crop
growth, politicians seek patterns in voter opinion, and lovers seek patterns in
their partners’ responses. A scientist’s job (like a baby’s) is to make sense of data,
to discover the patterns that govern how the physical world works and encap-
sulate them in theories that can be used for predicting what will happen in new
situations. The entrepreneur’s job is to identify opportunities, that is, patterns
in behavior that can be turned into a profitable business, and exploit them.
In data mining, the data is stored electronically and the search is automated—
or at least augmented—by computer. Even this is not particularly new. Econo-
mists, statisticians, forecasters, and communication engineers have long worked
with the idea that patterns in data can be sought automatically, identified,
validated, and used for prediction. What is new is the staggering increase in
opportunities for finding patterns in data. The unbridled growth of databases
in recent years, databases on such everyday activities as customer choices, brings
data mining to the forefront of new business technologies. It has been estimated
that the amount of data stored in the world’s databases doubles every 20
months, and although it would surely be difficult to justify this figure in any
quantitative sense, we can all relate to the pace of growth qualitatively. As the
flood of data swells and machines that can undertake the searching become
commonplace, the opportunities for data mining increase. As the world grows
in complexity, overwhelming us with the data it generates, data mining becomes
our only hope for elucidating the patterns that underlie it. Intelligently analyzed
data is a valuable resource. It can lead to new insights and, in commercial set-
tings, to competitive advantages.
Data mining is about solving problems by analyzing data already present in
databases. Suppose, to take a well-worn example, the problem is fickle customer
loyalty in a highly competitive marketplace. A database of customer choices,
along with customer profiles, holds the key to this problem. Patterns of
behavior of former customers can be analyzed to identify distinguishing charac-
teristics of those likely to switch products and those likely to remain loyal. Once
such characteristics are found, they can be put to work to identify present cus-
tomers who are likely to jump ship. This group can be targeted for special treat-
ment, treatment too costly to apply to the customer base as a whole. More
positively, the same techniques can be used to identify customers who might be
attracted to another service the enterprise provides, one they are not presently
enjoying, to target them for special offers that promote this service. In today’s
highly competitive, customer-centered, service-oriented economy, data is the
raw material that fuels business growth—if only it can be mined.
Data mining is defined as the process of discovering patterns in data. The
process must be automatic or (more usually) semiautomatic. The patterns
discovered must be meaningful in that they lead to some advantage, usually
an economic advantage. The data is invariably present in substantial
quantities.
How are the patterns expressed? Useful patterns allow us to make nontrivial
predictions on new data. There are two extremes for the expression of a pattern:
as a black box whose innards are effectively incomprehensible and as a trans-
parent box whose construction reveals the structure of the pattern. Both, we are
assuming, make good predictions. The difference is whether or not the patterns
that are mined are represented in terms of a structure that can be examined,
reasoned about, and used to inform future decisions. Such patterns we call struc-
tural because they capture the decision structure in an explicit way. In other
words, they help to explain something about the data.
Now, finally, we can say what this book is about. It is about techniques for
finding and describing structural patterns in data. Most of the techniques that
we cover have developed within a field known as machine learning. But first let
us look at what structural patterns are.
Describing structural patterns
What is meant by structural patterns? How do you describe them? And what
form does the input take? We will answer these questions by way of illustration
rather than by attempting formal, and ultimately sterile, definitions. There will
be plenty of examples later in this chapter, but let’s examine one right now to
get a feeling for what we’re talking about.
Look at the contact lens data in Table 1.1. This gives the conditions under
which an optician might want to prescribe soft contact lenses, hard contact
lenses, or no contact lenses at all; we will say more about what the individual
Table 1.1 The contact lens data.
Age   Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes normal hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes reduced none
presbyopic hypermetrope yes normal none
features mean later. Each line of the table is one of the examples. Part of a struc-
tural description of this information might be as follows:
If tear production rate = reduced then recommendation = none
Otherwise, if age = young and astigmatic = no
then recommendation = soft
Structural descriptions need not necessarily be couched as rules such as these.
Decision trees, which specify the sequences of decisions that need to be made
and the resulting recommendation, are another popular means of expression.
This example is a very simplistic one. First, all combinations of possible
values are represented in the table. There are 24 rows, representing three possi-
ble values of age and two values each for spectacle prescription, astigmatism,
and tear production rate (3 × 2 × 2 × 2 = 24). The rules do not really general-
ize from the data; they merely summarize it. In most learning situations, the set
of examples given as input is far from complete, and part of the job is to gen-
eralize to other, new examples. You can imagine omitting some of the rows in
the table for which tear production rate is reduced and still coming up with the
rule
If tear production rate = reduced then recommendation = none
which would generalize to the missing rows and fill them in correctly. Second,
values are specified for all the features in all the examples. Real-life datasets
invariably contain examples in which the values of some features, for some
reason or other, are unknown—for example, measurements were not taken or
were lost. Third, the preceding rules classify the examples correctly, whereas
often, because of errors or noise in the data, misclassifications occur even on the
data that is used to train the classifier.
Machine learning
Now that we have some idea about the inputs and outputs, let’s turn to machine
learning. What is learning, anyway? What is machine learning? These are philo-
sophic questions, and we will not be much concerned with philosophy in this
book; our emphasis is firmly on the practical. However, it is worth spending a
few moments at the outset on fundamental issues, just to see how tricky they
are, before rolling up our sleeves and looking at machine learning in practice.
Our dictionary defines “to learn” as follows:
To get knowledge of by study, experience, or being taught;
To become aware by information or from observation;
To commit to memory;
To be informed of, ascertain;
To receive instruction.
These meanings have some shortcomings when it comes to talking about com-
puters. For the first two, it is virtually impossible to test whether learning has
been achieved or not. How do you know whether a machine has got knowledge
of something? You probably can’t just ask it questions; even if you could, you
wouldn’t be testing its ability to learn but would be testing its ability to answer
questions. How do you know whether it has become aware of something? The
whole question of whether computers can be aware, or conscious, is a burning
philosophic issue. As for the last three meanings, although we can see what they
denote in human terms, merely “committing to memory” and “receiving
instruction” seem to fall far short of what we might mean by machine learning.
They are too passive, and we know that computers find these tasks trivial.
Instead, we are interested in improvements in performance, or at least in the
potential for performance, in new situations. You can “commit something to
memory” or “be informed of something” by rote learning without being able to
apply the new knowledge to new situations. You can receive instruction without
benefiting from it at all.
Earlier we defined data mining operationally as the process of discovering
patterns, automatically or semiautomatically, in large quantities of data—and
the patterns must be useful. An operational definition can be formulated in the
same way for learning:
Things learn when they change their behavior in a way that makes them
perform better in the future.
This ties learning to performance rather than knowledge. You can test learning
by observing the behavior and comparing it with past behavior. This is a much
more objective kind of definition and appears to be far more satisfactory.
But there’s still a problem. Learning is a rather slippery concept. Lots of things
change their behavior in ways that make them perform better in the future, yet
we wouldn’t want to say that they have actually learned. A good example is a
comfortable slipper. Has it learned the shape of your foot? It has certainly
changed its behavior to make it perform better as a slipper! Yet we would hardly
want to call this learning. In everyday language, we often use the word “train-
ing” to denote a mindless kind of learning. We train animals and even plants,
although it would be stretching the word a bit to talk of training objects such
as slippers that are not in any sense alive. But learning is different. Learning
implies thinking. Learning implies purpose. Something that learns has to do so
intentionally. That is why we wouldn’t say that a vine has learned to grow round
a trellis in a vineyard—we’d say it has been trained. Learning without purpose
is merely training. Or, more to the point, in learning the purpose is the learner’s,
whereas in training it is the teacher’s.
Thus on closer examination the second definition of learning, in operational,
performance-oriented terms, has its own problems when it comes to talking about
computers. To decide whether something has actually learned, you need to see
whether it intended to or whether there was any purpose involved. That makes
the concept moot when applied to machines because whether artifacts can behave
purposefully is unclear. Philosophic discussions of what is really meant by “learn-
ing,” like discussions of what is really meant by “intention” or “purpose,” are
fraught with difficulty. Even courts of law find intention hard to grapple with.
Data mining
Fortunately, the kind of learning techniques explained in this book do not
present these conceptual problems—they are called machine learning without
really presupposing any particular philosophic stance about what learning actu-
ally is. Data mining is a practical topic and involves learning in a practical, not
a theoretical, sense. We are interested in techniques for finding and describing
structural patterns in data as a tool for helping to explain that data and make
predictions from it. The data will take the form of a set of examples—examples
of customers who have switched loyalties, for instance, or situations in which
certain kinds of contact lenses can be prescribed. The output takes the form of
predictions about new examples—a prediction of whether a particular customer
will switch or a prediction of what kind of lens will be prescribed under given
circumstances. But because this book is about finding and describing patterns
in data, the output may also include an actual description of a structure that
can be used to classify unknown examples to explain the decision. As well as
performance, it is helpful to supply an explicit representation of the knowledge
that is acquired. In essence, this reflects both definitions of learning considered
previously: the acquisition of knowledge and the ability to use it.
Many learning techniques look for structural descriptions of what is learned,
descriptions that can become fairly complex and are typically expressed as sets
of rules such as the ones described previously or the decision trees described
later in this chapter. Because they can be understood by people, these descrip-
tions serve to explain what has been learned and explain the basis for new pre-
dictions. Experience shows that in many applications of machine learning to
data mining, the explicit knowledge structures that are acquired, the structural
descriptions, are at least as important, and often very much more important,
than the ability to perform well on new examples. People frequently use data
mining to gain knowledge, not just predictions. Gaining knowledge from data
certainly sounds like a good idea if you can do it. To find out how, read on!
1.2 Simple examples: The weather problem and others
We use a lot of examples in this book, which seems particularly appropriate con-
sidering that the book is all about learning from examples! There are several
standard datasets that we will come back to repeatedly. Different datasets tend
to expose new issues and challenges, and it is interesting and instructive to have
in mind a variety of problems when considering learning methods. In fact, the
need to work with different datasets is so important that a corpus containing
around 100 example problems has been gathered together so that different algo-
rithms can be tested and compared on the same set of problems.
The illustrations in this section are all unrealistically simple. Serious appli-
cation of data mining involves thousands, hundreds of thousands, or even mil-
lions of individual cases. But when explaining what algorithms do and how they
work, we need simple examples that capture the essence of the problem but are
small enough to be comprehensible in every detail. We will be working with the
illustrations in this section throughout the book, and they are intended to be
“academic” in the sense that they will help us to understand what is going on.
Some actual fielded applications of learning techniques are discussed in Section
1.3, and many more are covered in the books mentioned in the Further reading
section at the end of the chapter.
Another problem with actual real-life datasets is that they are often propri-
etary. No one is going to share their customer and product choice database with
you so that you can understand the details of their data mining application and
how it works. Corporate data is a valuable asset, one whose value has increased
enormously with the development of data mining techniques such as those
described in this book. Yet we are concerned here with understanding how the
methods used for data mining work and understanding the details of these
methods so that we can trace their operation on actual data. That is why our
illustrations are simple ones. But they are not simplistic: they exhibit the fea-
tures of real datasets.
The weather problem
The weather problem is a tiny dataset that we will use repeatedly to illustrate
machine learning methods. Entirely fictitious, it supposedly concerns the con-
ditions that are suitable for playing some unspecified game. In general, instances
in a dataset are characterized by the values of features, or attributes, that measure
different aspects of the instance. In this case there are four attributes: outlook,
temperature, humidity, and windy. The outcome is whether to play or not.
In its simplest form, shown in Table 1.2, all four attributes have values that
are symbolic categories rather than numbers. Outlook can be sunny, overcast, or
rainy; temperature can be hot, mild, or cool; humidity can be high or normal;
and windy can be true or false. This creates 36 possible combinations (3 × 3 ×
2 × 2 = 36), of which 14 are present in the set of input examples.
A set of rules learned from this information—not necessarily a very good
one—might look as follows:
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
These rules are meant to be interpreted in order: the first one, then if it doesn’t
apply the second, and so on. A set of rules that are intended to be interpreted
in sequence is called a decision list. Interpreted as a decision list, the rules
correctly classify all of the examples in the table, whereas taken individually, out
of context, some of the rules are incorrect. For example, the rule if humidity =
normal then play = yes gets one of the examples wrong (check which one).
The meaning of a set of rules depends on how it is interpreted—not
surprisingly!
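To make the order dependence concrete, here is a minimal sketch in Java (our own illustration, not code from the book; the class and method names are invented, and the unused temperature attribute is omitted from the signature) that applies the four rules above as a decision list:

// A decision list for the nominal weather data (Table 1.2): rules are
// tried in order, and the first rule whose conditions hold makes the call.
public class WeatherDecisionList {
    static String classify(String outlook, String humidity, boolean windy) {
        if (outlook.equals("sunny") && humidity.equals("high")) return "no";
        if (outlook.equals("rainy") && windy) return "no";
        if (outlook.equals("overcast")) return "yes";
        if (humidity.equals("normal")) return "yes";
        return "yes"; // if none of the above then play = yes
    }
    public static void main(String[] args) {
        // rainy, cool, normal, true is "no" in Table 1.2; rule 2 catches it
        // before rule 4 ("humidity = normal then play = yes") can misfire.
        System.out.println(classify("rainy", "normal", true)); // prints no
    }
}

The rainy, cool, normal, true example in the table is the one that rule 4 gets wrong when taken out of context; it is the answer to the "check which one" question above.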
In the slightly more complex form shown in Table 1.3, two of the attributes—
temperature and humidity—have numeric values. This means that any learn-
ing method must create inequalities involving these attributes rather than
simple equality tests, as in the former case. This is called a numeric-attribute
problem—in this case, a mixed-attribute problem because not all attributes are
numeric.
Now the first rule given earlier might take the following form:
If outlook = sunny and humidity > 83 then play = no
A slightly more complex process is required to come up with rules that involve
numeric tests.
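As a small illustrative sketch (ours, with invented names), the only change a numeric attribute forces on the rule's form is replacing an equality test with an inequality test against a learned threshold:

// Nominal form (Table 1.2) versus numeric form (Table 1.3) of the same rule.
public class SunnyHumidityRule {
    static boolean firesNominal(String outlook, String humidity) {
        return outlook.equals("sunny") && humidity.equals("high");
    }
    // The threshold 83 is not given in the data; the learning method must
    // choose it, typically a value separating examples of different classes.
    static boolean firesNumeric(String outlook, double humidity) {
        return outlook.equals("sunny") && humidity > 83;
    }
    public static void main(String[] args) {
        // Second row of Table 1.3: sunny, 80, 90, true -> play = no
        System.out.println(firesNumeric("sunny", 90)); // prints true
    }
}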
Table 1.2 The weather data.
Outlook Temperature Humidity Windy Play
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
The rules we have seen so far are classification rules: they predict the classifi-
cation of the example in terms of whether to play or not. It is equally possible
to disregard the classification and just look for any rules that strongly associate
different attribute values. These are called association rules. Many association
rules can be derived from the weather data in Table 1.2. Some good ones are as
follows:
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny
and humidity = high.
All these rules are 100% correct on the given data; they make no false predic-
tions. The first two apply to four examples in the dataset, the third to three
examples, and the fourth to two examples. There are many other rules: in fact,
nearly 60 association rules can be found that apply to two or more examples of
the weather data and are completely correct on this data. If you look for rules
that are less than 100% correct, then you will find many more. There are so
many because unlike classification rules, association rules can “predict” any of
the attributes, not just a specified class, and can even predict more than one
thing. For example, the fourth rule predicts both that outlook will be sunny and
that humidity will be high.
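These coverage and accuracy figures can be verified mechanically. The sketch below (our own code, not the book's) checks the first association rule against the 14 examples of Table 1.2, counting how many examples the antecedent covers and how many of those also satisfy the consequent:

// Checking an association rule on the weather data (Table 1.2):
//   if temperature = cool then humidity = normal
public class AssociationRuleCheck {
    // Columns: outlook, temperature, humidity, windy, play
    static final String[][] DATA = {
        {"sunny","hot","high","false","no"},
        {"sunny","hot","high","true","no"},
        {"overcast","hot","high","false","yes"},
        {"rainy","mild","high","false","yes"},
        {"rainy","cool","normal","false","yes"},
        {"rainy","cool","normal","true","no"},
        {"overcast","cool","normal","true","yes"},
        {"sunny","mild","high","false","no"},
        {"sunny","cool","normal","false","yes"},
        {"rainy","mild","normal","false","yes"},
        {"sunny","mild","normal","true","yes"},
        {"overcast","mild","high","true","yes"},
        {"overcast","hot","normal","false","yes"},
        {"rainy","mild","high","true","no"},
    };
    public static void main(String[] args) {
        int antecedent = 0, both = 0;
        for (String[] row : DATA) {
            if (row[1].equals("cool")) {             // temperature = cool
                antecedent++;
                if (row[2].equals("normal")) both++; // humidity = normal
            }
        }
        // Prints "4 of 4 (100%)": the rule covers four examples, no exceptions.
        System.out.printf("%d of %d (%.0f%%)%n",
                          both, antecedent, 100.0 * both / antecedent);
    }
}

Swapping in the other antecedent-consequent pairs confirms the counts quoted above: four, four, three, and two covered examples, all with no exceptions.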
Table 1.3 Weather data with some numeric attributes.
Outlook Temperature Humidity Windy Play
sunny 85 85 false no
sunny 80 90 true no
overcast 83 86 false yes
rainy 70 96 false yes
rainy 68 80 false yes
rainy 65 70 true no
overcast 64 65 true yes
sunny 72 95 false no
sunny 69 70 false yes
rainy 75 80 false yes
sunny 75 70 true yes
overcast 72 90 true yes
overcast 81 75 false yes
rainy 71 91 true no
Contact lenses: An idealized problem
The contact lens data introduced earlier tells you the kind of contact lens to pre-
scribe, given certain information about a patient. Note that this example is
intended for illustration only: it grossly oversimplifies the problem and should
certainly not be used for diagnostic purposes!
The first column of Table 1.1 gives the age of the patient. In case you’re won-
dering, presbyopia is a form of longsightedness that accompanies the onset of
middle age. The second gives the spectacle prescription: myope means short-
sighted and hypermetrope means longsighted. The third shows whether the
patient is astigmatic, and the fourth relates to the rate of tear production, which
is important in this context because tears lubricate contact lenses. The final
column shows which kind of lenses to prescribe: hard, soft, or none. All possi-
ble combinations of the attribute values are represented in the table.
A sample set of rules learned from this information is shown in Figure 1.1.
This is a rather large set of rules, but they do correctly classify all the examples.
These rules are complete and deterministic: they give a unique prescription for
every conceivable example. Generally, this is not the case. Sometimes there are
situations in which no rule applies; other times more than one rule may apply,
resulting in conflicting recommendations. Sometimes probabilities or weights
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and
tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and
tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and
astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and
tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and
tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and
tear production rate = normal then recommendation = hard
If age = pre-presbyopic and
spectacle prescription = hypermetrope and astigmatic = yes
then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope
and astigmatic = yes then recommendation = none
Figure 1.1 Rules for the contact lens data.
may be associated with the rules themselves to indicate that some are more
important, or more reliable, than others.
You might be wondering whether there is a smaller rule set that performs as
well. If so, would you be better off using the smaller rule set, and why?
These are exactly the kinds of questions that will occupy us in this book. Because
the examples form a complete set for the problem space, the rules do no more
than summarize all the information that is given, expressing it in a different and
more concise way. Even though it involves no generalization, this is often a very
useful thing to do! People frequently use machine learning techniques to gain
insight into the structure of their data rather than to make predictions for new
cases. In fact, a prominent and successful line of research in machine learning
began as an attempt to compress a huge database of possible chess endgames
and their outcomes into a data structure of reasonable size. The data structure
chosen for this enterprise was not a set of rules but a decision tree.
Figure 1.2 shows a structural description for the contact lens data in the form
of a decision tree, which for many purposes is a more concise and perspicuous
representation of the rules and has the advantage that it can be visualized more
easily. (However, this decision tree—in contrast to the rule set given in Figure
1.1—classifies two examples incorrectly.) The tree calls first for a test on tear
production rate, and the first two branches correspond to the two possible out-
comes. If tear production rate is reduced (the left branch), the outcome is none.
If it is normal (the right branch), a second test is made, this time on astigma-
tism. Eventually, whatever the outcome of the tests, a leaf of the tree is reached
tear production rate
    reduced → none
    normal → astigmatism
        no → soft
        yes → spectacle prescription
            myope → hard
            hypermetrope → none
Figure 1.2 Decision tree for the contact lens data.
that dictates the contact lens recommendation for that case. The question of
what is the most natural and easily understood format for the output from a
machine learning scheme is one that we will return to in Chapter 3.
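As a minimal sketch of how such a tree dictates a recommendation (our own code, hard-wiring Figure 1.2 as nested conditionals rather than building an explicit tree data structure):

// Traversing the decision tree of Figure 1.2 for the contact lens data.
public class ContactLensTree {
    static String recommend(String tearRate, boolean astigmatic,
                            String prescription) {
        if (tearRate.equals("reduced")) return "none"; // left branch
        // tear production rate = normal: test astigmatism next
        if (!astigmatic) return "soft";
        // astigmatism = yes: test spectacle prescription
        return prescription.equals("myope") ? "hard" : "none";
    }
    public static void main(String[] args) {
        // Fourth row of Table 1.1: young, myope, astigmatic, normal -> hard
        System.out.println(recommend("normal", true, "myope")); // prints hard
    }
}

Tracing the table's rows through this method reveals the two misclassifications mentioned above: the presbyopic myope with no astigmatism and normal tear production (none in Table 1.1, soft here), and the young hypermetrope with astigmatism and normal tear production (hard in Table 1.1, none here).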
Irises: A classic numeric dataset
The iris dataset, which dates back to seminal work by the eminent statistician
R.A. Fisher in the mid-1930s and is arguably the most famous dataset used in
data mining, contains 50 examples each of three types of plant: Iris setosa, Iris
versicolor, and Iris virginica. It is excerpted in Table 1.4. There are four attrib-
utes: sepal length, sepal width, petal length, and petal width (all measured in cen-
timeters). Unlike previous datasets, all attributes have values that are numeric.
The following set of rules might be learned from this dataset:
If petal length < 2.45 then Iris setosa
If sepal width < 2.10 then Iris versicolor
If sepal width < 2.45 and petal length < 4.55 then Iris versicolor
If sepal width < 2.95 and petal width < 1.35 then Iris versicolor
If petal length ≥ 2.45 and petal length < 4.45 then Iris versicolor
If sepal length ≥ 5.85 and petal length < 4.75 then Iris versicolor
Table 1.4 The iris data.
Sepal length (cm)   Sepal width (cm)   Petal length (cm)   Petal width (cm)   Type
1 5.1 3.5 1.4 0.2 Iris setosa
2 4.9 3.0 1.4 0.2 Iris setosa
3 4.7 3.2 1.3 0.2 Iris setosa
4 4.6 3.1 1.5 0.2 Iris setosa
5 5.0 3.6 1.4 0.2 Iris setosa
. . .
51 7.0 3.2 4.7 1.4 Iris versicolor
52 6.4 3.2 4.5 1.5 Iris versicolor
53 6.9 3.1 4.9 1.5 Iris versicolor
54 5.5 2.3 4.0 1.3 Iris versicolor
55 6.5 2.8 4.6 1.5 Iris versicolor
. . .
101 6.3 3.3 6.0 2.5 Iris virginica
102 5.8 2.7 5.1 1.9 Iris virginica
103 7.1 3.0 5.9 2.1 Iris virginica
104 6.3 2.9 5.6 1.8 Iris virginica
105 6.5 3.0 5.8 2.2 Iris virginica
. . .
If sepal width < 2.55 and petal length < 4.95 and
petal width < 1.55 then Iris versicolor
If petal length ≥ 2.45 and petal length < 4.95 and
petal width < 1.55 then Iris versicolor
If sepal length ≥ 6.55 and petal length < 5.05 then Iris versicolor
If sepal width < 2.75 and petal width < 1.65 and
sepal length < 6.05 then Iris versicolor
If sepal length ≥ 5.85 and sepal length < 5.95 and
petal length < 4.85 then Iris versicolor
If petal length ≥ 5.15 then Iris virginica
If petal width ≥ 1.85 then Iris virginica
If petal width ≥ 1.75 and sepal width < 3.05 then Iris virginica
If petal length ≥ 4.95 and petal width < 1.55 then Iris virginica
These rules are very cumbersome, and we will see in Chapter 3 how more
compact rules can be expressed that convey the same information.
CPU performance: Introducing numeric prediction
Although the iris dataset involves numeric attributes, the outcome—the type of
iris—is a category, not a numeric value. Table 1.5 shows some data for which
the outcome and the attributes are numeric. It concerns the relative perform-
ance of computer processing power on the basis of a number of relevant
attributes; each row represents 1 of 209 different computer configurations.
The classic way of dealing with continuous prediction is to write the outcome
as a linear sum of the attribute values with appropriate weights, for example:
Table 1.5 The CPU performance data.
Cycle time (ns)   Main memory min./max. (KB)   Cache (KB)   Channels min./max.   Performance
MYCT   MMIN   MMAX   CACH   CHMIN   CHMAX   PRP
1 125 256 6000 256 16 128 198
2 29 8000 32000 32 8 32 269
3 29 8000 32000 32 8 32 220
4 29 8000 32000 32 8 32 172
5 29 8000 16000 32 8 16 132
. . .
207 125 2000 8000 0 2 14 52
208 480 512 8000 32 0 0 67
209 480 1000 4000 0 0 0 45
PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
+ 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX.
(The abbreviated variable names are given in the second row of the table.) This
is called a regression equation, and the process of determining the weights is
called regression, a well-known procedure in statistics that we will review in
Chapter 4. However, the basic regression method is incapable of discovering
nonlinear relationships (although variants do exist—indeed, one will be
described in Section 6.3), and in Chapter 3 we will examine different represen-
tations that can be used for predicting numeric quantities.
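To see the equation at work, here is a short sketch (our own code and arithmetic, not from the book) that evaluates it for the first configuration in Table 1.5:

// Evaluating the regression equation for PRP on Table 1.5, row 1.
public class CpuRegression {
    static double predictPRP(double myct, double mmin, double mmax,
                             double cach, double chmin, double chmax) {
        return -55.9 + 0.0489 * myct + 0.0153 * mmin + 0.0056 * mmax
                     + 0.6410 * cach - 0.2700 * chmin + 1.480 * chmax;
    }
    public static void main(String[] args) {
        // Row 1: MYCT=125, MMIN=256, MMAX=6000, CACH=256, CHMIN=16, CHMAX=128
        double prp = predictPRP(125, 256, 6000, 256, 16, 128);
        System.out.printf("predicted PRP = %.1f%n", prp); // about 337
    }
}

The prediction of about 337 differs considerably from the recorded value of 198: a single linear equation fitted to all 209 configurations minimizes overall error rather than matching any individual row.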
In the iris and central processing unit (CPU) performance data, all the
attributes have numeric values. Practical situations frequently present a mixture
of numeric and nonnumeric attributes.
Labor negotiations: A more realistic example
The labor negotiations dataset in Table 1.6 summarizes the outcome of Cana-
dian contract negotiations in 1987 and 1988. It includes all collective agreements
reached in the business and personal services sector for organizations with at
least 500 members (teachers, nurses, university staff, police, etc.). Each case con-
cerns one contract, and the outcome is whether the contract is deemed accept-
able or unacceptable. The acceptable contracts are ones in which agreements
were accepted by both labor and management. The unacceptable ones are either
known offers that fell through because one party would not accept them or
acceptable contracts that had been significantly perturbed to the extent that, in
the view of experts, they would not have been accepted.
There are 40 examples in the dataset (plus another 17 which are normally
reserved for test purposes). Unlike the other tables here, Table 1.6 presents the
examples as columns rather than as rows; otherwise, it would have to be
stretched over several pages. Many of the values are unknown or missing, as
indicated by question marks.
This is a much more realistic dataset than the others we have seen. It con-
tains many missing values, and it seems unlikely that an exact classification can
be obtained.
Figure 1.3 shows two decision trees that represent the dataset. Figure 1.3(a)
is simple and approximate: it doesn’t represent the data exactly. For example, it
will predict bad for some contracts that are actually marked good. But it does
make intuitive sense: a contract is bad (for the employee!) if the wage increase
in the first year is too small (less than 2.5%). If the first-year wage increase is
larger than this, it is good if there are lots of statutory holidays (more than 10
days). Even if there are fewer statutory holidays, it is good if the first-year wage
increase is large enough (more than 4%).
Figure 1.3(b) is a more complex decision tree that represents the same
dataset. In fact, this is a more accurate representation of the actual dataset that
was used to create the tree. But it is not necessarily a more accurate representa-
tion of the underlying concept of good versus bad contracts. Look down the left
branch. It doesn’t seem to make sense intuitively that, if the working hours
exceed 36, a contract is bad if there is no health-plan contribution or a full
health-plan contribution but is good if there is a half health-plan contribution.
It is certainly reasonable that the health-plan contribution plays a role in the
decision but not if half is good and both full and none are bad. It seems likely
that this is an artifact of the particular values used to create the decision tree
rather than a genuine feature of the good versus bad distinction.
The tree in Figure 1.3(b) is more accurate on the data that was used to train
the classifier but will probably perform less well on an independent set of test
data. It is “overfitted” to the training data—it follows it too slavishly. The tree
in Figure 1.3(a) is obtained from the one in Figure 1.3(b) by a process of
pruning, which we will learn more about in Chapter 6.
Soybean classification: A classic machine learning success
An often-quoted early success story in the application of machine learning to
practical problems is the identification of rules for diagnosing soybean diseases.
The data is taken from questionnaires describing plant diseases. There are about
Table 1.6 The labor negotiations data.
Attribute Type 1 2 3 . . . 40
duration years 1 2 3 2
wage increase 1st year percentage 2% 4% 4.3% 4.5%
wage increase 2nd year percentage ? 5% 4.4% 4.0%
wage increase 3rd year percentage ? ? ? ?
cost of living adjustment {none, tcf, tc} none tcf ? none
working hours per week hours 28 35 38 40
pension {none, ret-allw, empl-cntr} none ? ? ?
standby pay percentage ? 13% ? ?
shift-work supplement percentage ? 5% 4% 4%
education allowance {yes, no} yes ? ? ?
statutory holidays days 11 15 12 12
vacation {below-avg, avg, gen} avg gen gen avg
long-term disability assistance {yes, no} no ? ? yes
dental plan contribution {none, half, full} none ? full full
bereavement assistance {yes, no} no ? ? yes
health plan contribution {none, half, full} none ? full half
acceptability of contract {good, bad} bad good good good
(a) wage increase first year
        ≤ 2.5 → bad
        > 2.5 → statutory holidays
            > 10 → good
            ≤ 10 → wage increase first year
                ≤ 4 → bad
                > 4 → good
(b) wage increase first year
        ≤ 2.5 → working hours per week
            ≤ 36 → bad
            > 36 → health plan contribution
                none → bad
                half → good
                full → bad
        > 2.5 → statutory holidays
            > 10 → good
            ≤ 10 → wage increase first year
                ≤ 4 → bad
                > 4 → good
Figure 1.3 Decision trees for the labor negotiations data.
680 examples, each representing a diseased plant. Plants were measured on 35
attributes, each one having a small set of possible values. Examples are labeled
with the diagnosis of an expert in plant biology: there are 19 disease categories
altogether—horrible-sounding diseases such as diaporthe stem canker, rhizoc-
tonia root rot, and bacterial blight, to mention just a few.
Table 1.7 gives the attributes, the number of different values that each can
have, and a sample record for one particular plant. The attributes are placed into
different categories just to make them easier to read.
Here are two example rules, learned from this data:
If [leaf condition is normal and
stem condition is abnormal and
stem cankers is below soil line and
canker lesion color is brown]
then
diagnosis is rhizoctonia root rot
If [leaf malformation is absent and
stem condition is abnormal and
stem cankers is below soil line and
canker lesion color is brown]
then
diagnosis is rhizoctonia root rot
These rules nicely illustrate the potential role of prior knowledge—often called
domain knowledge—in machine learning, because the only difference between
the two descriptions is leaf condition is normal versus leaf malformation is
absent. Now, in this domain, if the leaf condition is normal then leaf malfor-
mation is necessarily absent, so one of these conditions happens to be a special
case of the other. Thus if the first rule is true, the second is necessarily true as
well. The only time the second rule comes into play is when leaf malformation
is absent but leaf condition is not normal, that is, when something other than
malformation is wrong with the leaf. This is certainly not apparent from a casual
reading of the rules.
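The subsumption argument can be checked by brute force. This small sketch (ours, with the stem conditions shared by both rules assumed to hold) enumerates the leaf states permitted by the domain constraint and shows that the second rule fires whenever the first does, plus in one extra case:

// Domain constraint: if leaf condition is normal, leaf malformation is absent.
public class RuleSubsumption {
    public static void main(String[] args) {
        boolean[] options = {true, false};
        for (boolean leafNormal : options) {
            for (boolean malformationAbsent : options) {
                if (leafNormal && !malformationAbsent) continue; // excluded
                boolean rule1 = leafNormal;         // "leaf condition is normal"
                boolean rule2 = malformationAbsent; // "leaf malformation is absent"
                System.out.println("leaf normal=" + leafNormal
                    + ", malformation absent=" + malformationAbsent
                    + " -> rule 1 fires=" + rule1 + ", rule 2 fires=" + rule2);
            }
        }
        // The output shows rule 2 firing in every case where rule 1 does, plus
        // the extra case (leaf not normal, malformation absent) noted in the text.
    }
}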
Research on this problem in the late 1970s found that these diagnostic rules
could be generated by a machine learning algorithm, along with rules for every
other disease category, from about 300 training examples. These training
examples were carefully selected from the corpus of cases as being quite differ-
ent from one another—“far apart” in the example space. At the same time, the
plant pathologist who had produced the diagnoses was interviewed, and his
expertise was translated into diagnostic rules. Surprisingly, the computer-
generated rules outperformed the expert-derived rules on the remaining test
examples. They gave the correct disease top ranking 97.5% of the time com-
pared with only 72% for the expert-derived rules. Furthermore, not only did the rules perform better, but the plant pathologist was reportedly so impressed that he adopted them in place of his own.
Table 1.7 The soybean data.
Attribute   Number of values   Sample value
Environment time of occurrence 7 July
precipitation 3 above normal
temperature 3 normal
cropping history 4 same as last year
hail damage 2 yes
damaged area 4 scattered
severity 3 severe
plant height 2 normal
plant growth 2 abnormal
seed treatment 3 fungicide
germination 3 less than 80%
Seed condition 2 normal
mold growth 2 absent
discoloration 2 absent
size 2 normal
shriveling 2 absent
Fruit condition of fruit pods 3 normal
fruit spots 5 —
Leaf condition 2 abnormal
leaf spot size 3 —
yellow leaf spot halo 3 absent
leaf spot margins 3 —
shredding 2 absent
leaf malformation 2 absent
leaf mildew growth 3 absent
Stem condition 2 abnormal
stem lodging 2 yes
stem cankers 4 above soil line
canker lesion color 3 —
fruiting bodies on stems 2 present
external decay of stem 3 firm and dry
mycelium on stem 2 absent
internal discoloration 3 none
sclerotia 2 absent
Root condition 3 normal
Diagnosis 19 diaporthe stem canker
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf
Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf

Data_Mining: Practical Machine Learning Tools and Techniques 2ndEd.pdf

  • 1.
  • 2. Data Mining Practical Machine Learning Tools and Techniques
  • 3. The Morgan Kaufmann Series in Data Management Systems Series Editor: Jim Gray, Microsoft Research Data Mining: Practical Machine Learning Tools and Techniques, Second Edition Ian H. Witten and Eibe Frank Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration Earl Cox Data Modeling Essentials, Third Edition Graeme C. Simsion and Graham C. Witt Location-Based Services Jochen Schiller and Agnès Voisard Database Modeling with Microsoft® Visio for Enterprise Architects Terry Halpin, Ken Evans, Patrick Hallock, and Bill Maclean Designing Data-Intensive Web Applications Stefano Ceri, Piero Fraternali, Aldo Bongio, Marco Brambilla, Sara Comai, and Maristella Matera Mining the Web: Discovering Knowledge from Hypertext Data Soumen Chakrabarti Advanced SQL: 1999—Understanding Object-Relational and Other Advanced Features Jim Melton Database Tuning: Principles, Experiments, and Troubleshooting Techniques Dennis Shasha and Philippe Bonnet SQL: 1999—Understanding Relational Language Components Jim Melton and Alan R. Simon Information Visualization in Data Mining and Knowledge Discovery Edited by Usama Fayyad, Georges G. Grinstein, and Andreas Wierse Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery Gerhard Weikum and Gottfried Vossen Spatial Databases: With Application to GIS Philippe Rigaux, Michel Scholl, and Agnès Voisard Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design Terry Halpin Component Database Systems Edited by Klaus R. Dittrich and Andreas Geppert Managing Reference Data in Enterprise Databases: Binding Corporate Data to the Wider World Malcolm Chisholm Data Mining: Concepts and Techniques Jiawei Han and Micheline Kamber Understanding SQL and Java Together: A Guide to SQLJ, JDBC, and Related Technologies Jim Melton and Andrew Eisenberg Database: Principles, Programming, and Performance, Second Edition Patrick O’Neil and Elizabeth O’Neil The Object Data Standard: ODMG 3.0 Edited by R. G. G. Cattell, Douglas K. Barry, Mark Berler, Jeff Eastman, David Jordan, Craig Russell, Olaf Schadow, Torsten Stanienda, and Fernando Velez Data on the Web: From Relations to Semistructured Data and XML Serge Abiteboul, Peter Buneman, and Dan Suciu Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations Ian H. Witten and Eibe Frank Joe Celko’s SQL for Smarties: Advanced SQL Programming, Second Edition Joe Celko Joe Celko’s Data and Databases: Concepts in Practice Joe Celko Developing Time-Oriented Database Applications in SQL Richard T. Snodgrass Web Farming for the Data Warehouse Richard D. Hackathorn Database Modeling & Design, Third Edition Toby J. Teorey Management of Heterogeneous and Autonomous Database Systems Edited by Ahmed Elmagarmid, Marek Rusinkiewicz, and Amit Sheth Object-Relational DBMSs: Tracking the Next Great Wave, Second Edition Michael Stonebraker and Paul Brown, with Dorothy Moore A Complete Guide to DB2 Universal Database Don Chamberlin Universal Database Management: A Guide to Object/Relational Technology Cynthia Maro Saracco Readings in Database Systems, Third Edition Edited by Michael Stonebraker and Joseph M. Hellerstein Understanding SQL’s Stored Procedures: A Complete Guide to SQL/PSM Jim Melton Principles of Multimedia Database Systems V. S. Subrahmanian Principles of Database Query Processing for Advanced Applications Clement T. Yu and Weiyi Meng Advanced Database Systems Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. 
Snodgrass, V. S. Subrahmanian, and Roberto Zicari Principles of Transaction Processing for the Systems Professional Philip A. Bernstein and Eric Newcomer Using the New DB2: IBM’s Object-Relational Database System Don Chamberlin Distributed Algorithms Nancy A. Lynch Active Database Systems: Triggers and Rules For Advanced Database Processing Edited by Jennifer Widom and Stefano Ceri Migrating Legacy Systems: Gateways, Interfaces & the Incremental Approach Michael L. Brodie and Michael Stonebraker Atomic Transactions Nancy Lynch, Michael Merritt, William Weihl, and Alan Fekete Query Processing For Advanced Database Systems Edited by Johann Christoph Freytag, David Maier, and Gottfried Vossen Transaction Processing: Concepts and Techniques Jim Gray and Andreas Reuter Building an Object-Oriented Database System: The Story of O2 Edited by François Bancilhon, Claude Delobel, and Paris Kanellakis Database Transaction Models For Advanced Applications Edited by Ahmed K. Elmagarmid A Guide to Developing Client/Server SQL Applications Setrag Khoshafian, Arvola Chan, Anna Wong, and Harry K. T. Wong The Benchmark Handbook For Database and Transaction Processing Systems, Second Edition Edited by Jim Gray Camelot and Avalon: A Distributed Transaction Facility Edited by Jeffrey L. Eppinger, Lily B. Mummert, and Alfred Z. Spector Readings in Object-Oriented Database Systems Edited by Stanley B. Zdonik and David Maier
  • 4. Data Mining Practical Machine Learning Tools and Techniques, Second Edition Ian H. Witten Department of Computer Science University of Waikato Eibe Frank Department of Computer Science University of Waikato AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER
  • 5. Publisher: Diane Cerra Publishing Services Manager: Simon Crump Project Manager: Brandy Lilly Editorial Assistant: Asma Stephan Cover Design: Yvo Riezebos Design Cover Image: Getty Images Composition: SNP Best-set Typesetter Ltd., Hong Kong Technical Illustration: Dartmouth Publishing, Inc. Copyeditor: Graphic World Inc. Proofreader: Graphic World Inc. Indexer: Graphic World Inc. Interior printer: The Maple-Vail Book Manufacturing Group Cover printer: Phoenix Color Corp Morgan Kaufmann Publishers is an imprint of Elsevier. 500 Sansome Street, Suite 400, San Francisco, CA 94111 This book is printed on acid-free paper. © 2005 by Elsevier Inc. All rights reserved. Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise— without prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com.uk. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining Permissions.” Library of Congress Cataloging-in-Publication Data Witten, I. H. (Ian H.) Data mining : practical machine learning tools and techniques / Ian H. Witten, Eibe Frank. – 2nd ed. p. cm. – (Morgan Kaufmann series in data management systems) Includes bibliographical references and index. ISBN: 0-12-088407-0 1. Data mining. I. Frank, Eibe. II. Title. III. Series. QA76.9.D343W58 2005 006.3–dc22 2005043385 For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.books.elsevier.com Printed in the United States of America 05 06 07 08 09 5 4 3 2 1 Working together to grow libraries in developing countries www.elsevier.com | www.bookaid.org | www.sabre.org
  • 6. Foreword Jim Gray, Series Editor Microsoft Research Technology now allows us to capture and store vast quantities of data. Finding patterns, trends, and anomalies in these datasets, and summarizing them with simple quantitative models, is one of the grand challenges of the information age—turning data into information and turning information into knowledge. There has been stunning progress in data mining and machine learning. The synthesis of statistics, machine learning, information theory, and computing has created a solid science, with a firm mathematical base, and with very powerful tools. Witten and Frank present much of this progress in this book and in the companion implementation of the key algorithms. As such, this is a milestone in the synthesis of data mining, data analysis, information theory, and machine learning. If you have not been following this field for the last decade, this is a great way to catch up on this exciting progress. If you have, then Witten and Frank’s presentation and the companion open-source workbench, called Weka, will be a useful addition to your toolkit. They present the basic theory of automatically extracting models from data, and then validating those models. The book does an excellent job of explaining the various models (decision trees, association rules, linear models, clustering, Bayes nets, neural nets) and how to apply them in practice. With this basis, they then walk through the steps and pitfalls of various approaches. They describe how to safely scrub datasets, how to build models, and how to evaluate a model’s predictive quality. Most of the book is tutorial, but Part II broadly describes how commercial systems work and gives a tour of the publicly available data mining workbench that the authors provide through a website. This Weka workbench has a graphical user interface that leads you through data mining tasks and has excellent data visualization tools that help understand the models. It is a great companion to the text and a useful and popular tool in its own right.
  • 7. This book presents this new discipline in a very accessible form: as a text both to train the next generation of practitioners and researchers and to inform lifelong learners like myself. Witten and Frank have a passion for simple and elegant solutions. They approach each topic with this mindset, grounding all concepts in concrete examples, and urging the reader to consider the simple techniques first, and then progress to the more sophisticated ones if the simple ones prove inadequate. If you are interested in databases, and have not been following the machine learning field, this book is a great way to catch up on this exciting progress. If you have data that you want to analyze and understand, this book and the asso- ciated Weka toolkit are an excellent way to start. vi FOREWORD
Contents

Foreword
Preface
  Updated and revised content
  Acknowledgments

Part I  Machine learning tools and techniques

1  What's it all about?
  1.1  Data mining and machine learning
       Describing structural patterns
       Machine learning
       Data mining
  1.2  Simple examples: The weather problem and others
       The weather problem
       Contact lenses: An idealized problem
       Irises: A classic numeric dataset
       CPU performance: Introducing numeric prediction
       Labor negotiations: A more realistic example
       Soybean classification: A classic machine learning success
  1.3  Fielded applications
       Decisions involving judgment
       Screening images
       Load forecasting
       Diagnosis
       Marketing and sales
       Other applications
  1.4  Machine learning and statistics
  1.5  Generalization as search
       Enumerating the concept space
       Bias
  1.6  Data mining and ethics
  1.7  Further reading

2  Input: Concepts, instances, and attributes
  2.1  What's a concept?
  2.2  What's in an example?
  2.3  What's in an attribute?
  2.4  Preparing the input
       Gathering the data together
       ARFF format
       Sparse data
       Attribute types
       Missing values
       Inaccurate values
       Getting to know your data
  2.5  Further reading

3  Output: Knowledge representation
  3.1  Decision tables
  3.2  Decision trees
  3.3  Classification rules
  3.4  Association rules
  3.5  Rules with exceptions
  3.6  Rules involving relations
  3.7  Trees for numeric prediction
  3.8  Instance-based representation
  3.9  Clusters
  3.10 Further reading

4  Algorithms: The basic methods
  4.1  Inferring rudimentary rules
       Missing values and numeric attributes
       Discussion
  4.2  Statistical modeling
       Missing values and numeric attributes
       Bayesian models for document classification
       Discussion
  4.3  Divide-and-conquer: Constructing decision trees
       Calculating information
       Highly branching attributes
       Discussion
  4.4  Covering algorithms: Constructing rules
       Rules versus trees
       A simple covering algorithm
       Rules versus decision lists
  4.5  Mining association rules
       Item sets
       Association rules
       Generating rules efficiently
       Discussion
  4.6  Linear models
       Numeric prediction: Linear regression
       Linear classification: Logistic regression
       Linear classification using the perceptron
       Linear classification using Winnow
  4.7  Instance-based learning
       The distance function
       Finding nearest neighbors efficiently
       Discussion
  4.8  Clustering
       Iterative distance-based clustering
       Faster distance calculations
       Discussion
  4.9  Further reading

5  Credibility: Evaluating what's been learned
  5.1  Training and testing
  5.2  Predicting performance
  5.3  Cross-validation
  5.4  Other estimates
       Leave-one-out
       The bootstrap
  5.5  Comparing data mining methods
  5.6  Predicting probabilities
       Quadratic loss function
       Informational loss function
       Discussion
  5.7  Counting the cost
       Cost-sensitive classification
       Cost-sensitive learning
       Lift charts
       ROC curves
       Recall–precision curves
       Discussion
       Cost curves
  5.8  Evaluating numeric prediction
  5.9  The minimum description length principle
  5.10 Applying the MDL principle to clustering
  5.11 Further reading

6  Implementations: Real machine learning schemes
  6.1  Decision trees
       Numeric attributes
       Missing values
       Pruning
       Estimating error rates
       Complexity of decision tree induction
       From trees to rules
       C4.5: Choices and options
       Discussion
  6.2  Classification rules
       Criteria for choosing tests
       Missing values, numeric attributes
       Generating good rules
       Using global optimization
       Obtaining rules from partial decision trees
       Rules with exceptions
       Discussion
  6.3  Extending linear models
       The maximum margin hyperplane
       Nonlinear class boundaries
       Support vector regression
       The kernel perceptron
       Multilayer perceptrons
       Discussion
  6.4  Instance-based learning
       Reducing the number of exemplars
       Pruning noisy exemplars
       Weighting attributes
       Generalizing exemplars
       Distance functions for generalized exemplars
       Generalized distance functions
       Discussion
  6.5  Numeric prediction
       Model trees
       Building the tree
       Pruning the tree
       Nominal attributes
       Missing values
       Pseudocode for model tree induction
       Rules from model trees
       Locally weighted linear regression
       Discussion
  6.6  Clustering
       Choosing the number of clusters
       Incremental clustering
       Category utility
       Probability-based clustering
       The EM algorithm
       Extending the mixture model
       Bayesian clustering
       Discussion
  6.7  Bayesian networks
       Making predictions
       Learning Bayesian networks
       Specific algorithms
       Data structures for fast learning
       Discussion

7  Transformations: Engineering the input and output
  7.1  Attribute selection
       Scheme-independent selection
       Searching the attribute space
       Scheme-specific selection
  7.2  Discretizing numeric attributes
       Unsupervised discretization
       Entropy-based discretization
       Other discretization methods
       Entropy-based versus error-based discretization
       Converting discrete to numeric attributes
  7.3  Some useful transformations
       Principal components analysis
       Random projections
       Text to attribute vectors
       Time series
  7.4  Automatic data cleansing
       Improving decision trees
       Robust regression
       Detecting anomalies
  7.5  Combining multiple models
       Bagging
       Bagging with costs
       Randomization
       Boosting
       Additive regression
       Additive logistic regression
       Option trees
       Logistic model trees
       Stacking
       Error-correcting output codes
  7.6  Using unlabeled data
       Clustering for classification
       Co-training
       EM and co-training
  7.7  Further reading

8  Moving on: Extensions and applications
  8.1  Learning from massive datasets
  8.2  Incorporating domain knowledge
  8.3  Text and Web mining
  8.4  Adversarial situations
  8.5  Ubiquitous data mining
  8.6  Further reading

Part II  The Weka machine learning workbench

9  Introduction to Weka
  9.1  What's in Weka?
  9.2  How do you use it?
  9.3  What else can you do?
  9.4  How do you get it?

10  The Explorer
  10.1  Getting started
        Preparing the data
        Loading the data into the Explorer
        Building a decision tree
        Examining the output
        Doing it again
        Working with models
        When things go wrong
  10.2  Exploring the Explorer
        Loading and filtering files
        Training and testing learning schemes
        Do it yourself: The User Classifier
        Using a metalearner
        Clustering and association rules
        Attribute selection
        Visualization
  10.3  Filtering algorithms
        Unsupervised attribute filters
        Unsupervised instance filters
        Supervised filters
  10.4  Learning algorithms
        Bayesian classifiers
        Trees
        Rules
        Functions
        Lazy classifiers
        Miscellaneous classifiers
  10.5  Metalearning algorithms
        Bagging and randomization
        Boosting
        Combining classifiers
        Cost-sensitive learning
        Optimizing performance
        Retargeting classifiers for different tasks
  10.6  Clustering algorithms
  10.7  Association-rule learners
  10.8  Attribute selection
        Attribute subset evaluators
        Single-attribute evaluators
        Search methods

11  The Knowledge Flow interface
  11.1  Getting started
  11.2  The Knowledge Flow components
  11.3  Configuring and connecting the components
  11.4  Incremental learning

12  The Experimenter
  12.1  Getting started
        Running an experiment
        Analyzing the results
  12.2  Simple setup
  12.3  Advanced setup
  12.4  The Analyze panel
  12.5  Distributing processing over several machines

13  The command-line interface
  13.1  Getting started
  13.2  The structure of Weka
        Classes, instances, and packages
        The weka.core package
        The weka.classifiers package
        Other packages
        Javadoc indices
  13.3  Command-line options
        Generic options
        Scheme-specific options

14  Embedded machine learning
  14.1  A simple data mining application
  14.2  Going through the code
        main()
        MessageClassifier()
        updateData()
        classifyMessage()

15  Writing new learning schemes
  15.1  An example classifier
        buildClassifier()
        makeTree()
        computeInfoGain()
        classifyInstance()
        main()
  15.2  Conventions for implementing classifiers

References
Index
About the authors
List of Figures

Figure 1.1  Rules for the contact lens data.
Figure 1.2  Decision tree for the contact lens data.
Figure 1.3  Decision trees for the labor negotiations data.
Figure 2.1  A family tree and two ways of expressing the sister-of relation.
Figure 2.2  ARFF file for the weather data.
Figure 3.1  Constructing a decision tree interactively: (a) creating a rectangular test involving petallength and petalwidth and (b) the resulting (unfinished) decision tree.
Figure 3.2  Decision tree for a simple disjunction.
Figure 3.3  The exclusive-or problem.
Figure 3.4  Decision tree with a replicated subtree.
Figure 3.5  Rules for the Iris data.
Figure 3.6  The shapes problem.
Figure 3.7  Models for the CPU performance data: (a) linear regression, (b) regression tree, and (c) model tree.
Figure 3.8  Different ways of partitioning the instance space.
Figure 3.9  Different ways of representing clusters.
Figure 4.1  Pseudocode for 1R.
Figure 4.2  Tree stumps for the weather data.
Figure 4.3  Expanded tree stumps for the weather data.
Figure 4.4  Decision tree for the weather data.
Figure 4.5  Tree stump for the ID code attribute.
Figure 4.6  Covering algorithm: (a) covering the instances and (b) the decision tree for the same problem.
Figure 4.7  The instance space during operation of a covering algorithm.
Figure 4.8  Pseudocode for a basic rule learner.
Figure 4.9  Logistic regression: (a) the logit transform and (b) an example logistic regression function.
Figure 4.10  The perceptron: (a) learning rule and (b) representation as a neural network.
Figure 4.11  The Winnow algorithm: (a) the unbalanced version and (b) the balanced version.
Figure 4.12  A kD-tree for four training instances: (a) the tree and (b) instances and splits.
Figure 4.13  Using a kD-tree to find the nearest neighbor of the star.
Figure 4.14  Ball tree for 16 training instances: (a) instances and balls and (b) the tree.
Figure 4.15  Ruling out an entire ball (gray) based on a target point (star) and its current nearest neighbor.
Figure 4.16  A ball tree: (a) two cluster centers and their dividing line and (b) the corresponding tree.
Figure 5.1  A hypothetical lift chart.
Figure 5.2  A sample ROC curve.
Figure 5.3  ROC curves for two learning methods.
Figure 5.4  Effects of varying the probability threshold: (a) the error curve and (b) the cost curve.
Figure 6.1  Example of subtree raising, where node C is "raised" to subsume node B.
Figure 6.2  Pruning the labor negotiations decision tree.
Figure 6.3  Algorithm for forming rules by incremental reduced-error pruning.
Figure 6.4  RIPPER: (a) algorithm for rule learning and (b) meaning of symbols.
Figure 6.5  Algorithm for expanding examples into a partial tree.
Figure 6.6  Example of building a partial tree.
Figure 6.7  Rules with exceptions for the iris data.
Figure 6.8  A maximum margin hyperplane.
Figure 6.9  Support vector regression: (a) ε = 1, (b) ε = 2, and (c) ε = 0.5.
Figure 6.10  Example datasets and corresponding perceptrons.
Figure 6.11  Step versus sigmoid: (a) step function and (b) sigmoid function.
Figure 6.12  Gradient descent using the error function x^2 + 1.
Figure 6.13  Multilayer perceptron with a hidden layer.
Figure 6.14  A boundary between two rectangular classes.
Figure 6.15  Pseudocode for model tree induction.
Figure 6.16  Model tree for a dataset with nominal attributes.
Figure 6.17  Clustering the weather data.
Figure 6.18  Hierarchical clusterings of the iris data.
Figure 6.19  A two-class mixture model.
Figure 6.20  A simple Bayesian network for the weather data.
Figure 6.21  Another Bayesian network for the weather data.
Figure 6.22  The weather data: (a) reduced version and (b) corresponding AD tree.
Figure 7.1  Attribute space for the weather dataset.
Figure 7.2  Discretizing the temperature attribute using the entropy method.
Figure 7.3  The result of discretizing the temperature attribute.
Figure 7.4  Class distribution for a two-class, two-attribute problem.
Figure 7.5  Principal components transform of a dataset: (a) variance of each component and (b) variance plot.
Figure 7.6  Number of international phone calls from Belgium, 1950–1973.
Figure 7.7  Algorithm for bagging.
Figure 7.8  Algorithm for boosting.
Figure 7.9  Algorithm for additive logistic regression.
Figure 7.10  Simple option tree for the weather data.
Figure 7.11  Alternating decision tree for the weather data.
Figure 10.1  The Explorer interface.
Figure 10.2  Weather data: (a) spreadsheet, (b) CSV format, and (c) ARFF.
Figure 10.3  The Weka Explorer: (a) choosing the Explorer interface and (b) reading in the weather data.
Figure 10.4  Using J4.8: (a) finding it in the classifiers list and (b) the Classify tab.
Figure 10.5  Output from the J4.8 decision tree learner.
Figure 10.6  Visualizing the result of J4.8 on the iris dataset: (a) the tree and (b) the classifier errors.
Figure 10.7  Generic object editor: (a) the editor, (b) more information (click More), and (c) choosing a converter (click Choose).
Figure 10.8  Choosing a filter: (a) the filters menu, (b) an object editor, and (c) more information (click More).
Figure 10.9  The weather data with two attributes removed.
Figure 10.10  Processing the CPU performance data with M5′.
Figure 10.11  Output from the M5′ program for numeric prediction.
Figure 10.12  Visualizing the errors: (a) from M5′ and (b) from linear regression.
Figure 10.13  Working on the segmentation data with the User Classifier: (a) the data visualizer and (b) the tree visualizer.
Figure 10.14  Configuring a metalearner for boosting decision stumps.
Figure 10.15  Output from the Apriori program for association rules.
Figure 10.16  Visualizing the Iris dataset.
Figure 10.17  Using Weka's metalearner for discretization: (a) configuring FilteredClassifier and (b) the menu of filters.
Figure 10.18  Visualizing a Bayesian network for the weather data (nominal version): (a) default output, (b) a version with the maximum number of parents set to 3 in the search algorithm, and (c) probability distribution table for the windy node in (b).
Figure 10.19  Changing the parameters for J4.8.
Figure 10.20  Using Weka's neural-network graphical user interface.
Figure 10.21  Attribute selection: specifying an evaluator and a search method.
Figure 11.1  The Knowledge Flow interface.
Figure 11.2  Configuring a data source: (a) the right-click menu and (b) the file browser obtained from the Configure menu item.
Figure 11.3  Operations on the Knowledge Flow components.
Figure 11.4  A Knowledge Flow that operates incrementally: (a) the configuration and (b) the strip chart output.
Figure 12.1  An experiment: (a) setting it up, (b) the results file, and (c) a spreadsheet with the results.
Figure 12.2  Statistical test results for the experiment in Figure 12.1.
Figure 12.3  Setting up an experiment in advanced mode.
Figure 12.4  Rows and columns of Figure 12.2: (a) row field, (b) column field, (c) result of swapping the row and column selections, and (d) substituting Run for Dataset as rows.
Figure 13.1  Using Javadoc: (a) the front page and (b) the weka.core package.
Figure 13.2  DecisionStump: A class of the weka.classifiers.trees package.
Figure 14.1  Source code for the message classifier.
Figure 15.1  Source code for the ID3 decision tree learner.
List of Tables

Table 1.1  The contact lens data.
Table 1.2  The weather data.
Table 1.3  Weather data with some numeric attributes.
Table 1.4  The iris data.
Table 1.5  The CPU performance data.
Table 1.6  The labor negotiations data.
Table 1.7  The soybean data.
Table 2.1  Iris data as a clustering problem.
Table 2.2  Weather data with a numeric class.
Table 2.3  Family tree represented as a table.
Table 2.4  The sister-of relation represented in a table.
Table 2.5  Another relation represented as a table.
Table 3.1  A new iris flower.
Table 3.2  Training data for the shapes problem.
Table 4.1  Evaluating the attributes in the weather data.
Table 4.2  The weather data with counts and probabilities.
Table 4.3  A new day.
Table 4.4  The numeric weather data with summary statistics.
Table 4.5  Another new day.
Table 4.6  The weather data with identification codes.
Table 4.7  Gain ratio calculations for the tree stumps of Figure 4.2.
Table 4.8  Part of the contact lens data for which astigmatism = yes.
Table 4.9  Part of the contact lens data for which astigmatism = yes and tear production rate = normal.
Table 4.10  Item sets for the weather data with coverage 2 or greater.
Table 4.11  Association rules for the weather data.
Table 5.1  Confidence limits for the normal distribution.
Table 5.2  Confidence limits for Student's distribution with 9 degrees of freedom.
Table 5.3  Different outcomes of a two-class prediction.
Table 5.4  Different outcomes of a three-class prediction: (a) actual and (b) expected.
Table 5.5  Default cost matrixes: (a) a two-class case and (b) a three-class case.
Table 5.6  Data for a lift chart.
Table 5.7  Different measures used to evaluate the false positive versus the false negative tradeoff.
Table 5.8  Performance measures for numeric prediction.
Table 5.9  Performance measures for four numeric prediction models.
Table 6.1  Linear models in the model tree.
Table 7.1  Transforming a multiclass problem into a two-class one: (a) standard method and (b) error-correcting code.
Table 10.1  Unsupervised attribute filters.
Table 10.2  Unsupervised instance filters.
Table 10.3  Supervised attribute filters.
Table 10.4  Supervised instance filters.
Table 10.5  Classifier algorithms in Weka.
Table 10.6  Metalearning algorithms in Weka.
Table 10.7  Clustering algorithms.
Table 10.8  Association-rule learners.
Table 10.9  Attribute evaluation methods for attribute selection.
Table 10.10  Search methods for attribute selection.
Table 11.1  Visualization and evaluation components.
Table 13.1  Generic options for learning schemes in Weka.
Table 13.2  Scheme-specific options for the J4.8 decision tree learner.
Table 15.1  Simple learning schemes in Weka.
Preface

The convergence of computing and communication has produced a society that feeds on information. Yet most of the information is in its raw form: data. If data is characterized as recorded facts, then information is the set of patterns, or expectations, that underlie the data. There is a huge amount of information locked up in databases—information that is potentially important but has not yet been discovered or articulated. Our mission is to bring it forth.

Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. The idea is to build computer programs that sift through databases automatically, seeking regularities or patterns. Strong patterns, if found, will likely generalize to make accurate predictions on future data. Of course, there will be problems. Many patterns will be banal and uninteresting. Others will be spurious, contingent on accidental coincidences in the particular dataset used. In addition, real data is imperfect: Some parts will be garbled, and some will be missing. Anything discovered will be inexact: There will be exceptions to every rule and cases not covered by any rule. Algorithms need to be robust enough to cope with imperfect data and to extract regularities that are inexact but useful.

Machine learning provides the technical basis of data mining. It is used to extract information from the raw data in databases—information that is expressed in a comprehensible form and can be used for a variety of purposes. The process is one of abstraction: taking the data, warts and all, and inferring whatever structure underlies it. This book is about the tools and techniques of machine learning used in practical data mining for finding, and describing, structural patterns in data.

As with any burgeoning new technology that enjoys intense commercial attention, the use of data mining is surrounded by a great deal of hype in the technical—and sometimes the popular—press. Exaggerated reports appear of the secrets that can be uncovered by setting learning algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead, there is an identifiable body of simple and practical techniques that can often extract useful information from raw data. This book describes these techniques and shows how they work.

We interpret machine learning as the acquisition of structural descriptions from examples. The kind of descriptions found can be used for prediction, explanation, and understanding. Some data mining applications focus on prediction: forecasting what will happen in new situations from data that describe what happened in the past, often by guessing the classification of new examples. But we are equally—perhaps more—interested in applications in which the result of "learning" is an actual description of a structure that can be used to classify examples. This structural description supports explanation, understanding, and prediction. In our experience, insights gained by the applications' users are of most interest in the majority of practical data mining applications; indeed, this is one of machine learning's major advantages over classical statistical modeling.

The book explains a variety of machine learning methods. Some are pedagogically motivated: simple schemes designed to explain clearly how the basic ideas work. Others are practical: real systems used in applications today. Many are contemporary and have been developed only in the last few years.

A comprehensive software resource, written in the Java language, has been created to illustrate the ideas in the book. Called the Waikato Environment for Knowledge Analysis, or Weka[1] for short, it is available as source code on the World Wide Web at http://www.cs.waikato.ac.nz/ml/weka. It is a full, industrial-strength implementation of essentially all the techniques covered in this book. It includes illustrative code and working implementations of machine learning methods. It offers clean, spare implementations of the simplest techniques, designed to aid understanding of the mechanisms involved. It also provides a workbench that includes full, working, state-of-the-art implementations of many popular learning schemes that can be used for practical data mining or for research. Finally, it contains a framework, in the form of a Java class library, that supports applications that use embedded machine learning and even the implementation of new learning schemes.

[1] Found only on the islands of New Zealand, the weka (pronounced to rhyme with Mecca) is a flightless bird with an inquisitive nature.

The objective of this book is to introduce the tools and techniques for machine learning that are used in data mining. After reading it, you will understand what these techniques are and appreciate their strengths and applicability. If you wish to experiment with your own data, you will be able to do this easily with the Weka software.

The book spans the gulf between the intensely practical approach taken by trade books that provide case studies on data mining and the more theoretical, principle-driven exposition found in current textbooks on machine learning. (A brief description of these books appears in the Further reading section at the end of Chapter 1.) This gulf is rather wide. To apply machine learning techniques productively, you need to understand something about how they work; this is not a technology that you can apply blindly and expect to get good results. Different problems yield to different techniques, but it is rarely obvious which techniques are suitable for a given situation: you need to know something about the range of possible solutions. We cover an extremely wide range of techniques. We can do this because, unlike many trade books, this volume does not promote any particular commercial software or approach. We include a large number of examples, but they use illustrative datasets that are small enough to allow you to follow what is going on. Real datasets are far too large to show this (and in any case are usually company confidential). Our datasets are chosen not to illustrate actual large-scale practical problems but to help you understand what the different techniques do, how they work, and what their range of application is.

The book is aimed at the technically aware general reader interested in the principles and ideas underlying the current practice of data mining. It will also be of interest to information professionals who need to become acquainted with this new technology and to all those who wish to gain a detailed technical understanding of what machine learning involves. It is written for an eclectic audience of information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, patent examiners, and curious laypeople—as well as students and professors—who need an easy-to-read book with lots of illustrations that describes what the major machine learning techniques are, what they do, how they are used, and how they work. It is practically oriented, with a strong "how to" flavor, and includes algorithms, code, and implementations. All those involved in practical data mining will benefit directly from the techniques described. The book is aimed at people who want to cut through to the reality that underlies the hype about machine learning and who seek a practical, nonacademic, unpretentious approach. We have avoided requiring any specific theoretical or mathematical knowledge except in some sections marked by a light gray bar in the margin. These contain optional material, often for the more technical or theoretically inclined reader, and may be skipped without loss of continuity.

The book is organized in layers that make the ideas accessible to readers who are interested in grasping the basics and to those who would like more depth of treatment, along with full details on the techniques covered. We believe that consumers of machine learning need to have some idea of how the algorithms they use work. It is often observed that data models are only as good as the person who interprets them, and that person needs to know something about how the models are produced to appreciate the strengths, and limitations, of the technology. However, it is not necessary for all data model users to have a deep understanding of the finer details of the algorithms.

We address this situation by describing machine learning methods at successive levels of detail. You will learn the basic ideas, the topmost level, by reading the first three chapters. Chapter 1 describes, through examples, what machine learning is and where it can be used; it also provides actual practical applications. Chapters 2 and 3 cover the kinds of input and output—or knowledge representation—involved. Different kinds of output dictate different styles of algorithm, and at the next level Chapter 4 describes the basic methods of machine learning, simplified to make them easy to comprehend. Here the principles involved are conveyed in a variety of algorithms without getting into intricate details or tricky implementation issues. To make progress in the application of machine learning techniques to particular data mining problems, it is essential to be able to measure how well you are doing. Chapter 5, which can be read out of sequence, equips you to evaluate the results obtained from machine learning, addressing the sometimes complex issues involved in performance evaluation.

At the lowest and most detailed level, Chapter 6 exposes in naked detail the nitty-gritty issues of implementing a spectrum of machine learning algorithms, including the complexities necessary for them to work well in practice. Although many readers may want to ignore this detailed information, it is at this level that the full, working, tested implementations of machine learning schemes in Weka are written. Chapter 7 describes practical topics involved with engineering the input to machine learning—for example, selecting and discretizing attributes—and covers several more advanced techniques for refining and combining the output from different learning techniques. The final chapter of Part I looks to the future.

The book describes most methods used in practical machine learning. However, it does not cover reinforcement learning, because it is rarely applied in practical data mining; genetic algorithm approaches, because these are just an optimization technique; or relational learning and inductive logic programming, because they are rarely used in mainstream data mining applications.

The data mining system that illustrates the ideas in the book is described in Part II to clearly separate conceptual material from the practical aspects of how to use it. You can skip to Part II directly from Chapter 4 if you are in a hurry to analyze your data and don't want to be bothered with the technical details.

Java has been chosen for the implementations of machine learning techniques that accompany this book because, as an object-oriented programming language, it allows a uniform interface to learning schemes and methods for pre- and postprocessing. We have chosen Java instead of C++, Smalltalk, or other object-oriented languages because programs written in Java can be run on almost any computer without having to be recompiled, having to undergo complicated installation procedures, or—worst of all—having to change the code. A Java program is compiled into byte-code that can be executed on any computer equipped with an appropriate interpreter. This interpreter is called the Java virtual machine. Java virtual machines—and, for that matter, Java compilers—are freely available for all important platforms.

Like all widely used programming languages, Java has received its share of criticism. Although this is not the place to elaborate on such issues, in several cases the critics are clearly right. However, of all currently available programming languages that are widely supported, standardized, and extensively documented, Java seems to be the best choice for the purpose of this book. Its main disadvantage is speed of execution—or lack of it. Executing a Java program is several times slower than running a corresponding program written in C language because the virtual machine has to translate the byte-code into machine code before it can be executed. In our experience the difference is a factor of three to five if the virtual machine uses a just-in-time compiler. Instead of translating each byte-code individually, a just-in-time compiler translates whole chunks of byte-code into machine code, thereby achieving significant speedup. However, if this is still too slow for your application, there are compilers that translate Java programs directly into machine code, bypassing the byte-code step. This code cannot be executed on other platforms, thereby sacrificing one of Java's most important advantages.

Updated and revised content

We finished writing the first edition of this book in 1999 and now, in April 2005, are just polishing this second edition. The areas of data mining and machine learning have matured in the intervening years. Although the core of material in this edition remains the same, we have made the most of our opportunity to update it to reflect the changes that have taken place over 5 years. There have been errors to fix, errors that we had accumulated in our publicly available errata file. Surprisingly few were found, and we hope there are even fewer in this second edition. (The errata for the second edition may be found through the book's home page at http://www.cs.waikato.ac.nz/ml/weka/book.html.) We have thoroughly edited the material and brought it up to date, and we practically doubled the number of references. The most enjoyable part has been adding new material. Here are the highlights.

Bowing to popular demand, we have added comprehensive information on neural networks: the perceptron and closely related Winnow algorithm in Section 4.6 and the multilayer perceptron and backpropagation algorithm in Section 6.3. We have included more recent material on implementing nonlinear decision boundaries using both the kernel perceptron and radial basis function networks. There is a new section on Bayesian networks, again in response to readers' requests, with a description of how to learn classifiers based on these networks and how to implement them efficiently using all-dimensions trees.

The Weka machine learning workbench that accompanies the book, a widely used and popular feature of the first edition, has acquired a radical new look in the form of an interactive interface—or rather, three separate interactive interfaces—that make it far easier to use. The primary one is the Explorer, which gives access to all of Weka's facilities using menu selection and form filling. The others are the Knowledge Flow interface, which allows you to design configurations for streamed data processing, and the Experimenter, with which you set up automated experiments that run selected machine learning algorithms with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests on the results. These interfaces lower the bar for becoming a practicing data miner, and we include a full description of how to use them. However, the book continues to stand alone, independent of Weka, and to underline this we have moved all material on the workbench into a separate Part II at the end of the book.

In addition to becoming far easier to use, Weka has grown over the last 5 years and matured enormously in its data mining capabilities. It now includes an unparalleled range of machine learning algorithms and related techniques. The growth has been partly stimulated by recent developments in the field and partly led by Weka users and driven by demand. This puts us in a position in which we know a great deal about what actual users of data mining want, and we have capitalized on this experience when deciding what to include in this new edition.

The earlier chapters, containing more general and foundational material, have suffered relatively little change. We have added more examples of fielded applications to Chapter 1, a new subsection on sparse data and a little on string attributes and date attributes to Chapter 2, and a description of interactive decision tree construction, a useful and revealing technique to help you grapple with your data using manually built decision trees, to Chapter 3.

In addition to introducing linear decision boundaries for classification, the infrastructure for neural networks, Chapter 4 includes new material on multinomial Bayes models for document classification and on logistic regression. The last 5 years have seen great interest in data mining for text, and this is reflected in our introduction to string attributes in Chapter 2, multinomial Bayes for document classification in Chapter 4, and text transformations in Chapter 7. Chapter 4 includes a great deal of new material on efficient data structures for searching the instance space: kD-trees and the recently invented ball trees. These are used to find nearest neighbors efficiently and to accelerate distance-based clustering.

Chapter 5 describes the principles of statistical evaluation of machine learning, which have not changed. The main addition, apart from a note on the Kappa statistic for measuring the success of a predictor, is a more detailed treatment of cost-sensitive learning. We describe how to use a classifier, built without taking costs into consideration, to make predictions that are sensitive to cost; alternatively, we explain how to take costs into account during the training process to build a cost-sensitive model. We also cover the popular new technique of cost curves.

There are several additions to Chapter 6, apart from the previously mentioned material on neural networks and Bayesian network classifiers. More details—gory details—are given of the heuristics used in the successful RIPPER rule learner. We describe how to use model trees to generate rules for numeric prediction. We show how to apply locally weighted regression to classification problems. Finally, we describe the X-means clustering algorithm, which is a big improvement on traditional k-means.

Chapter 7 on engineering the input and output has changed most, because this is where recent developments in practical machine learning have been concentrated. We describe new attribute selection schemes such as race search and the use of support vector machines and new methods for combining models such as additive regression, additive logistic regression, logistic model trees, and option trees. We give a full account of LogitBoost (which was mentioned in the first edition but not described). There is a new section on useful transformations, including principal components analysis and transformations for text mining and time series. We also cover recent developments in using unlabeled data to improve classification, including the co-training and co-EM methods.

The final chapter of Part I on new directions and different perspectives has been reworked to keep up with the times and now includes contemporary challenges such as adversarial learning and ubiquitous data mining.

Acknowledgments

Writing the acknowledgments is always the nicest part! A lot of people have helped us, and we relish this opportunity to thank them. This book has arisen out of the machine learning research project in the Computer Science Department at the University of Waikato, New Zealand. We have received generous encouragement and assistance from the academic staff members on that project: John Cleary, Sally Jo Cunningham, Matt Humphrey, Lyn Hunt, Bob McQueen, Lloyd Smith, and Tony Smith. Special thanks go to Mark Hall, Bernhard Pfahringer, and above all Geoff Holmes, the project leader and source of inspiration. All who have worked on the machine learning project here have contributed to our thinking: we would particularly like to mention Steve Garner, Stuart Inglis, and Craig Nevill-Manning for helping us to get the project off the ground in the beginning when success was less certain and things were more difficult.

The Weka system that illustrates the ideas in this book forms a crucial component of it. It was conceived by the authors and designed and implemented by Eibe Frank, along with Len Trigg and Mark Hall. Many people in the machine learning laboratory at Waikato made significant contributions. Since the first edition of the book the Weka team has expanded considerably: so many people have contributed that it is impossible to acknowledge everyone properly. We are grateful to Remco Bouckaert for his implementation of Bayesian networks, Dale Fletcher for many database-related aspects, Ashraf Kibriya and Richard Kirkby for contributions far too numerous to list, Niels Landwehr for logistic model trees, Abdelaziz Mahoui for the implementation of K*, Stefan Mutter for association rule mining, Gabi Schmidberger and Malcolm Ware for numerous miscellaneous contributions, Tony Voyle for least-median-of-squares regression, Yong Wang for Pace regression and the implementation of M5′, and Xin Xu for JRip, logistic regression, and many other contributions. Our sincere thanks go to all these people for their dedicated work and to the many contributors to Weka from outside our group at Waikato.

Tucked away as we are in a remote (but very pretty) corner of the Southern Hemisphere, we greatly appreciate the visitors to our department who play a crucial role in acting as sounding boards and helping us to develop our thinking. We would like to mention in particular Rob Holte, Carl Gutwin, and Russell Beale, each of whom visited us for several months; David Aha, who although he only came for a few days did so at an early and fragile stage of the project and performed a great service by his enthusiasm and encouragement; and Kai Ming Ting, who worked with us for 2 years on many of the topics described in Chapter 7 and helped to bring us into the mainstream of machine learning.

Students at Waikato have played a significant role in the development of the project. Jamie Littin worked on ripple-down rules and relational learning. Brent Martin explored instance-based learning and nested instance-based representations. Murray Fife slaved over relational learning, and Nadeeka Madapathage investigated the use of functional languages for expressing machine learning algorithms. Other graduate students have influenced us in numerous ways, particularly Gordon Paynter, YingYing Wen, and Zane Bray, who have worked with us on text mining. Colleagues Steve Jones and Malika Mahoui have also made far-reaching contributions to these and other machine learning projects. More recently we have learned much from our many visiting students from Freiburg, including Peter Reutemann and Nils Weidmann.

Ian Witten would like to acknowledge the formative role of his former students at Calgary, particularly Brent Krawchuk, Dave Maulsby, Thong Phan, and Tanja Mitrovic, all of whom helped him develop his early ideas in machine learning, as did faculty members Bruce MacDonald, Brian Gaines, and David Hill at Calgary and John Andreae at the University of Canterbury.

Eibe Frank is indebted to his former supervisor at the University of Karlsruhe, Klaus-Peter Huber (now with SAS Institute), who infected him with the fascination of machines that learn. On his travels Eibe has benefited from interactions with Peter Turney, Joel Martin, and Berry de Bruijn in Canada and with Luc de Raedt, Christoph Helma, Kristian Kersting, Stefan Kramer, Ulrich Rückert, and Ashwin Srinivasan in Germany.

Diane Cerra and Asma Stephan of Morgan Kaufmann have worked hard to shape this book, and Lisa Royse, our production editor, has made the process go smoothly. Bronwyn Webster has provided excellent support at the Waikato end. We gratefully acknowledge the unsung efforts of the anonymous reviewers, one of whom in particular made a great number of pertinent and constructive comments that helped us to improve this book significantly. In addition, we would like to thank the librarians of the Repository of Machine Learning Databases at the University of California, Irvine, whose carefully collected datasets have been invaluable in our research.

Our research has been funded by the New Zealand Foundation for Research, Science and Technology and the Royal Society of New Zealand Marsden Fund. The Department of Computer Science at the University of Waikato has generously supported us in all sorts of ways, and we owe a particular debt of gratitude to Mark Apperley for his enlightened leadership and warm encouragement. Part of the first edition was written while both authors were visiting the University of Calgary, Canada, and the support of the Computer Science department there is gratefully acknowledged—as well as the positive and helpful attitude of the long-suffering students in the machine learning course on whom we experimented. In producing the second edition Ian was generously supported by Canada's Informatics Circle of Research Excellence and by the University of Lethbridge in southern Alberta, which gave him what all authors yearn for—a quiet space in pleasant and convivial surroundings in which to work.

Last, and most of all, we are grateful to our families and partners. Pam, Anna, and Nikki were all too well aware of the implications of having an author in the house ("not again!") but let Ian go ahead and write the book anyway. Julie was always supportive, even when Eibe had to burn the midnight oil in the machine learning lab, and Immo and Ollig provided exciting diversions. Between us we hail from Canada, England, Germany, Ireland, and Samoa: New Zealand has brought us together and provided an ideal, even idyllic, place to do this work.
Part I
Machine Learning Tools and Techniques
Chapter 1
What's It All About?

Human in vitro fertilization involves collecting several eggs from a woman's ovaries, which, after fertilization with partner or donor sperm, produce several embryos. Some of these are selected and transferred to the woman's uterus. The problem is to select the "best" embryos to use—the ones that are most likely to survive. Selection is based on around 60 recorded features of the embryos—characterizing their morphology, oocyte, follicle, and the sperm sample. The number of features is sufficiently large that it is difficult for an embryologist to assess them all simultaneously and correlate historical data with the crucial outcome of whether that embryo did or did not result in a live child. In a research project in England, machine learning is being investigated as a technique for making the selection, using as training data historical records of embryos and their outcome.

Every year, dairy farmers in New Zealand have to make a tough business decision: which cows to retain in their herd and which to sell off to an abattoir. Typically, one-fifth of the cows in a dairy herd are culled each year near the end of the milking season as feed reserves dwindle. Each cow's breeding and milk production history influences this decision.
Other factors include age (a cow is nearing the end of its productive life at 8 years), health problems, history of difficult calving, undesirable temperament traits (kicking or jumping fences), and not being in calf for the following season. About 700 attributes for each of several million cows have been recorded over the years. Machine learning is being investigated as a way of ascertaining what factors are taken into account by successful farmers—not to automate the decision but to propagate their skills and experience to others.

Life and death. From Europe to the antipodes. Family and business. Machine learning is a burgeoning new technology for mining knowledge from data, a technology that a lot of people are starting to take seriously.

1.1 Data mining and machine learning

We are overwhelmed with data. The amount of data in the world, in our lives, seems to go on and on increasing—and there's no end in sight. Omnipresent personal computers make it too easy to save things that previously we would have trashed. Inexpensive multigigabyte disks make it too easy to postpone decisions about what to do with all this stuff—we simply buy another disk and keep it all. Ubiquitous electronics record our decisions, our choices in the supermarket, our financial habits, our comings and goings. We swipe our way through the world, every swipe a record in a database. The World Wide Web overwhelms us with information; meanwhile, every choice we make is recorded. And all these are just personal choices: they have countless counterparts in the world of commerce and industry. We would all testify to the growing gap between the generation of data and our understanding of it. As the volume of data increases, inexorably, the proportion of it that people understand decreases, alarmingly. Lying hidden in all this data is information, potentially useful information, that is rarely made explicit or taken advantage of.

This book is about looking for patterns in data. There is nothing new about this. People have been seeking patterns in data since human life began. Hunters seek patterns in animal migration behavior, farmers seek patterns in crop growth, politicians seek patterns in voter opinion, and lovers seek patterns in their partners' responses. A scientist's job (like a baby's) is to make sense of data, to discover the patterns that govern how the physical world works and encapsulate them in theories that can be used for predicting what will happen in new situations. The entrepreneur's job is to identify opportunities, that is, patterns in behavior that can be turned into a profitable business, and exploit them.

In data mining, the data is stored electronically and the search is automated—or at least augmented—by computer. Even this is not particularly new. Economists, statisticians, forecasters, and communication engineers have long worked
with the idea that patterns in data can be sought automatically, identified, validated, and used for prediction. What is new is the staggering increase in opportunities for finding patterns in data. The unbridled growth of databases in recent years, databases on such everyday activities as customer choices, brings data mining to the forefront of new business technologies. It has been estimated that the amount of data stored in the world's databases doubles every 20 months, and although it would surely be difficult to justify this figure in any quantitative sense, we can all relate to the pace of growth qualitatively. As the flood of data swells and machines that can undertake the searching become commonplace, the opportunities for data mining increase. As the world grows in complexity, overwhelming us with the data it generates, data mining becomes our only hope for elucidating the patterns that underlie it. Intelligently analyzed data is a valuable resource. It can lead to new insights and, in commercial settings, to competitive advantages.

Data mining is about solving problems by analyzing data already present in databases. Suppose, to take a well-worn example, the problem is fickle customer loyalty in a highly competitive marketplace. A database of customer choices, along with customer profiles, holds the key to this problem. Patterns of behavior of former customers can be analyzed to identify distinguishing characteristics of those likely to switch products and those likely to remain loyal. Once such characteristics are found, they can be put to work to identify present customers who are likely to jump ship. This group can be targeted for special treatment, treatment too costly to apply to the customer base as a whole. More positively, the same techniques can be used to identify customers who might be attracted to another service the enterprise provides, one they are not presently enjoying, to target them for special offers that promote this service. In today's highly competitive, customer-centered, service-oriented economy, data is the raw material that fuels business growth—if only it can be mined.

Data mining is defined as the process of discovering patterns in data. The process must be automatic or (more usually) semiautomatic. The patterns discovered must be meaningful in that they lead to some advantage, usually an economic advantage. The data is invariably present in substantial quantities.

How are the patterns expressed? Useful patterns allow us to make nontrivial predictions on new data. There are two extremes for the expression of a pattern: as a black box whose innards are effectively incomprehensible and as a transparent box whose construction reveals the structure of the pattern. Both, we are assuming, make good predictions. The difference is whether or not the patterns that are mined are represented in terms of a structure that can be examined, reasoned about, and used to inform future decisions. Such patterns we call structural because they capture the decision structure in an explicit way. In other words, they help to explain something about the data.
Now, finally, we can say what this book is about. It is about techniques for finding and describing structural patterns in data. Most of the techniques that we cover have developed within a field known as machine learning. But first let us look at what structural patterns are.

Describing structural patterns

What is meant by structural patterns? How do you describe them? And what form does the input take? We will answer these questions by way of illustration rather than by attempting formal, and ultimately sterile, definitions. There will be plenty of examples later in this chapter, but let's examine one right now to get a feeling for what we're talking about.

Look at the contact lens data in Table 1.1. This gives the conditions under which an optician might want to prescribe soft contact lenses, hard contact lenses, or no contact lenses at all; we will say more about what the individual features mean later. Each line of the table is one of the examples.

Table 1.1  The contact lens data.

    Age             Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
    young           myope                    no            reduced                none
    young           myope                    no            normal                 soft
    young           myope                    yes           reduced                none
    young           myope                    yes           normal                 hard
    young           hypermetrope             no            reduced                none
    young           hypermetrope             no            normal                 soft
    young           hypermetrope             yes           reduced                none
    young           hypermetrope             yes           normal                 hard
    pre-presbyopic  myope                    no            reduced                none
    pre-presbyopic  myope                    no            normal                 soft
    pre-presbyopic  myope                    yes           reduced                none
    pre-presbyopic  myope                    yes           normal                 hard
    pre-presbyopic  hypermetrope             no            reduced                none
    pre-presbyopic  hypermetrope             no            normal                 soft
    pre-presbyopic  hypermetrope             yes           reduced                none
    pre-presbyopic  hypermetrope             yes           normal                 none
    presbyopic      myope                    no            reduced                none
    presbyopic      myope                    no            normal                 none
    presbyopic      myope                    yes           reduced                none
    presbyopic      myope                    yes           normal                 hard
    presbyopic      hypermetrope             no            reduced                none
    presbyopic      hypermetrope             no            normal                 soft
    presbyopic      hypermetrope             yes           reduced                none
    presbyopic      hypermetrope             yes           normal                 none
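As a preview of what the input actually looks like in practice, here is a sketch of how the start of Table 1.1 might be written in ARFF, the dataset format of the companion Weka software (ARFF is properly introduced in Chapter 2; the attribute names below are our own choice, not necessarily the ones Weka's distributed version of this dataset uses):

    % Table 1.1 as an ARFF file (a sketch; attribute names are illustrative).
    @relation contact-lenses

    @attribute age {young, pre-presbyopic, presbyopic}
    @attribute spectacle-prescription {myope, hypermetrope}
    @attribute astigmatism {no, yes}
    @attribute tear-production-rate {reduced, normal}
    @attribute recommended-lenses {none, soft, hard}

    @data
    young,myope,no,reduced,none
    young,myope,no,normal,soft
    young,myope,yes,reduced,none
    young,myope,yes,normal,hard
    % ...the remaining 20 rows of Table 1.1 follow in the same way.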
Part of a structural description of this information might be as follows:

    If tear production rate = reduced then recommendation = none
    Otherwise, if age = young and astigmatic = no then recommendation = soft

Structural descriptions need not necessarily be couched as rules such as these. Decision trees, which specify the sequences of decisions that need to be made and the resulting recommendation, are another popular means of expression.

This example is a very simplistic one. First, all combinations of possible values are represented in the table. There are 24 rows, representing three possible values of age and two values each for spectacle prescription, astigmatism, and tear production rate (3 × 2 × 2 × 2 = 24). The rules do not really generalize from the data; they merely summarize it. In most learning situations, the set of examples given as input is far from complete, and part of the job is to generalize to other, new examples. You can imagine omitting some of the rows in the table for which tear production rate is reduced and still coming up with the rule

    If tear production rate = reduced then recommendation = none

which would generalize to the missing rows and fill them in correctly. Second, values are specified for all the features in all the examples. Real-life datasets invariably contain examples in which the values of some features, for some reason or other, are unknown—for example, measurements were not taken or were lost. Third, the preceding rules classify the examples correctly, whereas often, because of errors or noise in the data, misclassifications occur even on the data that is used to train the classifier.
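A structural description like the one above is not just documentation; it can be executed directly. The following sketch, ours rather than the book's, casts the two rules as plain Java (the language of the companion Weka software). The class and method names are invented for illustration, and examples not covered by either rule simply fall through to a default:

    // A minimal sketch (not from the book) of the two contact lens rules
    // as executable Java. The spectacle prescription is not tested by
    // either rule, so that parameter goes unused here.
    public class ContactLensRules {

        static String recommend(String age, String prescription,
                                String astigmatism, String tearRate) {
            if (tearRate.equals("reduced")) {
                return "none";                // first rule
            }
            if (age.equals("young") && astigmatism.equals("no")) {
                return "soft";                // second rule
            }
            return "none";                    // default: the remaining rules
                                              // of Figure 1.1 are omitted here
        }

        public static void main(String[] args) {
            // Second row of Table 1.1; prints "soft"
            System.out.println(recommend("young", "myope", "no", "normal"));
        }
    }

Even this toy version makes the point of the transparent-box view: the learned pattern can be read, reasoned about, and applied to new examples.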
These meanings have some shortcomings when it comes to talking about computers. For the first two, it is virtually impossible to test whether learning has been achieved or not. How do you know whether a machine has got knowledge of something? You probably can't just ask it questions; even if you could, you wouldn't be testing its ability to learn but would be testing its ability to answer questions. How do you know whether it has become aware of something? The whole question of whether computers can be aware, or conscious, is a burning philosophic issue. As for the last three meanings, although we can see what they denote in human terms, merely "committing to memory" and "receiving instruction" seem to fall far short of what we might mean by machine learning. They are too passive, and we know that computers find these tasks trivial. Instead, we are interested in improvements in performance, or at least in the potential for performance, in new situations. You can "commit something to memory" or "be informed of something" by rote learning without being able to apply the new knowledge to new situations. You can receive instruction without benefiting from it at all.

Earlier we defined data mining operationally as the process of discovering patterns, automatically or semiautomatically, in large quantities of data—and the patterns must be useful. An operational definition can be formulated in the same way for learning:

Things learn when they change their behavior in a way that makes them perform better in the future.

This ties learning to performance rather than knowledge. You can test learning by observing the behavior and comparing it with past behavior. This is a much more objective kind of definition and appears to be far more satisfactory.

But there's still a problem. Learning is a rather slippery concept. Lots of things change their behavior in ways that make them perform better in the future, yet we wouldn't want to say that they have actually learned. A good example is a comfortable slipper. Has it learned the shape of your foot? It has certainly changed its behavior to make it perform better as a slipper! Yet we would hardly want to call this learning. In everyday language, we often use the word "training" to denote a mindless kind of learning. We train animals and even plants, although it would be stretching the word a bit to talk of training objects such as slippers that are not in any sense alive. But learning is different. Learning implies thinking. Learning implies purpose. Something that learns has to do so intentionally. That is why we wouldn't say that a vine has learned to grow round a trellis in a vineyard—we'd say it has been trained. Learning without purpose is merely training. Or, more to the point, in learning the purpose is the learner's, whereas in training it is the teacher's.

Thus on closer examination the second definition of learning, in operational, performance-oriented terms, has its own problems when it comes to talking about computers.
To decide whether something has actually learned, you need to see whether it intended to, or whether there was any purpose involved. That makes the concept moot when applied to machines because whether artifacts can behave purposefully is unclear. Philosophic discussions of what is really meant by "learning," like discussions of what is really meant by "intention" or "purpose," are fraught with difficulty. Even courts of law find intention hard to grapple with.

Data mining

Fortunately, the learning techniques explained in this book do not present these conceptual problems—they are called machine learning without really presupposing any particular philosophic stance about what learning actually is. Data mining is a practical topic and involves learning in a practical, not a theoretical, sense. We are interested in techniques for finding and describing structural patterns in data as a tool for helping to explain that data and make predictions from it. The data will take the form of a set of examples—examples of customers who have switched loyalties, for instance, or situations in which certain kinds of contact lenses can be prescribed. The output takes the form of predictions about new examples—a prediction of whether a particular customer will switch or a prediction of what kind of lens will be prescribed under given circumstances. But because this book is about finding and describing patterns in data, the output may also include an actual description of a structure that can be used to classify unknown examples and to explain the decision. As well as performance, it is helpful to supply an explicit representation of the knowledge that is acquired. In essence, this reflects both definitions of learning considered previously: the acquisition of knowledge and the ability to use it.

Many learning techniques look for structural descriptions of what is learned, descriptions that can become fairly complex and are typically expressed as sets of rules such as the ones described previously or the decision trees described later in this chapter. Because they can be understood by people, these descriptions serve to explain what has been learned and to explain the basis for new predictions. Experience shows that in many applications of machine learning to data mining, the explicit knowledge structures that are acquired—the structural descriptions—are at least as important as, and often very much more important than, the ability to perform well on new examples. People frequently use data mining to gain knowledge, not just predictions. Gaining knowledge from data certainly sounds like a good idea if you can do it. To find out how, read on!

1.2 Simple examples: The weather problem and others

We use a lot of examples in this book, which seems particularly appropriate considering that the book is all about learning from examples!
There are several standard datasets that we will come back to repeatedly. Different datasets tend to expose new issues and challenges, and it is interesting and instructive to have in mind a variety of problems when considering learning methods. In fact, the need to work with different datasets is so important that a corpus containing around 100 example problems has been gathered together so that different algorithms can be tested and compared on the same set of problems.

The illustrations in this section are all unrealistically simple. Serious application of data mining involves thousands, hundreds of thousands, or even millions of individual cases. But when explaining what algorithms do and how they work, we need simple examples that capture the essence of the problem but are small enough to be comprehensible in every detail. We will be working with the illustrations in this section throughout the book, and they are intended to be "academic" in the sense that they will help us to understand what is going on. Some actual fielded applications of learning techniques are discussed in Section 1.3, and many more are covered in the books mentioned in the Further reading section at the end of the chapter.

Another problem with actual real-life datasets is that they are often proprietary. No one is going to share their customer and product choice database with you so that you can understand the details of their data mining application and how it works. Corporate data is a valuable asset, one whose value has increased enormously with the development of data mining techniques such as those described in this book. Yet we are concerned here with understanding how the methods used for data mining work and understanding the details of these methods so that we can trace their operation on actual data. That is why our illustrations are simple ones. But they are not simplistic: they exhibit the features of real datasets.

The weather problem

The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some unspecified game. In general, instances in a dataset are characterized by the values of features, or attributes, that measure different aspects of the instance. In this case there are four attributes: outlook, temperature, humidity, and windy. The outcome is whether to play or not.

In its simplest form, shown in Table 1.2, all four attributes have values that are symbolic categories rather than numbers. Outlook can be sunny, overcast, or rainy; temperature can be hot, mild, or cool; humidity can be high or normal; and windy can be true or false. This creates 36 possible combinations (3 × 3 × 2 × 2 = 36), of which 14 are present in the set of input examples.
A set of rules learned from this information—not necessarily a very good one—might look as follows:

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

These rules are meant to be interpreted in order: the first one, then if it doesn't apply the second, and so on. A set of rules that are intended to be interpreted in sequence is called a decision list. Interpreted as a decision list, the rules correctly classify all of the examples in the table, whereas taken individually, out of context, some of the rules are incorrect. For example, the rule if humidity = normal then play = yes gets one of the examples wrong (check which one). The meaning of a set of rules depends on how it is interpreted—not surprisingly! (A short sketch of this in-order interpretation appears after Table 1.2.)

In the slightly more complex form shown in Table 1.3, two of the attributes—temperature and humidity—have numeric values. This means that any learning method must create inequalities involving these attributes rather than simple equality tests, as in the former case. This is called a numeric-attribute problem—in this case, a mixed-attribute problem because not all attributes are numeric. Now the first rule given earlier might take the following form:

If outlook = sunny and humidity > 83 then play = no

A slightly more complex process is required to come up with rules that involve numeric tests.

Table 1.2 The weather data.

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no
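To make the in-order reading concrete, here is a minimal sketch in Java of the five rules above applied as a decision list. The class and method names are our own invention, purely for illustration; this is not code from any particular data mining package.

// A minimal sketch (illustrative names only) of the weather rules read
// as a decision list: each rule is tried in order, and the first one
// whose conditions hold supplies the prediction.
public class WeatherDecisionList {
    // Temperature is not tested by any of the five rules, so it is omitted.
    static String classify(String outlook, String humidity, boolean windy) {
        if (outlook.equals("sunny") && humidity.equals("high")) return "no";
        if (outlook.equals("rainy") && windy) return "no";
        if (outlook.equals("overcast")) return "yes";
        if (humidity.equals("normal")) return "yes";
        return "yes"; // "if none of the above then play = yes"
    }

    public static void main(String[] args) {
        // Row 6 of Table 1.2 (rainy, cool, normal, true): the second rule
        // fires before the humidity rule is ever consulted, so play = no.
        System.out.println(classify("rainy", "normal", true));  // prints "no"
        // sunny, normal, not windy falls through to the humidity rule.
        System.out.println(classify("sunny", "normal", false)); // prints "yes"
    }
}

Reordering the rules, or applying one of them in isolation, changes the predictions; that is the sense in which the meaning of a rule set depends on how it is interpreted.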
The rules we have seen so far are classification rules: they predict the classification of the example in terms of whether to play or not. It is equally possible to disregard the classification and just look for any rules that strongly associate different attribute values. These are called association rules. Many association rules can be derived from the weather data in Table 1.2. Some good ones are as follows:

If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no
  then outlook = sunny and humidity = high

All these rules are 100% correct on the given data; they make no false predictions. The first two apply to four examples in the dataset, the third to three examples, and the fourth to two examples. There are many other rules: in fact, nearly 60 association rules can be found that apply to two or more examples of the weather data and are completely correct on this data. If you look for rules that are less than 100% correct, then you will find many more. There are so many because unlike classification rules, association rules can "predict" any of the attributes, not just a specified class, and can even predict more than one thing. For example, the fourth rule predicts both that outlook will be sunny and that humidity will be high. (A short sketch that checks the first of these rules against the data appears after Table 1.3.)

Table 1.3 Weather data with some numeric attributes.

Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  no
sunny     80           90        true   no
overcast  83           86        false  yes
rainy     70           96        false  yes
rainy     68           80        false  yes
rainy     65           70        true   no
overcast  64           65        true   yes
sunny     72           95        false  no
sunny     69           70        false  yes
rainy     75           80        false  yes
sunny     75           70        true   yes
overcast  72           90        true   yes
overcast  81           75        false  yes
rainy     71           91        true   no
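To see what "100% correct" means operationally, here is a minimal Java sketch that checks the first association rule against the 14 examples of Table 1.2: it counts how many examples match the antecedent and how many of those also match the consequent. All names are ours, for illustration only.

// A minimal sketch of measuring one association rule on the weather data:
// "if temperature = cool then humidity = normal".
public class AssociationCheck {
    // The 14 examples of Table 1.2: outlook, temperature, humidity, windy, play.
    static final String[][] DATA = {
        {"sunny","hot","high","false","no"},      {"sunny","hot","high","true","no"},
        {"overcast","hot","high","false","yes"},  {"rainy","mild","high","false","yes"},
        {"rainy","cool","normal","false","yes"},  {"rainy","cool","normal","true","no"},
        {"overcast","cool","normal","true","yes"},{"sunny","mild","high","false","no"},
        {"sunny","cool","normal","false","yes"},  {"rainy","mild","normal","false","yes"},
        {"sunny","mild","normal","true","yes"},   {"overcast","mild","high","true","yes"},
        {"overcast","hot","normal","false","yes"},{"rainy","mild","high","true","no"}
    };

    public static void main(String[] args) {
        int matches = 0, correct = 0;
        for (String[] row : DATA) {
            if (row[1].equals("cool")) {                    // antecedent: temperature = cool
                matches++;
                if (row[2].equals("normal")) correct++;     // consequent: humidity = normal
            }
        }
        // Prints "applies to 4 examples, confidence 1.0": the rule covers
        // four examples and makes no false predictions on this data.
        System.out.println("applies to " + matches + " examples, confidence "
                           + ((double) correct / matches));
    }
}

Replacing the antecedent and consequent tests lets you verify the coverage counts quoted above for the other three rules in the same way.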
Contact lenses: An idealized problem

The contact lens data introduced earlier tells you the kind of contact lens to prescribe, given certain information about a patient. Note that this example is intended for illustration only: it grossly oversimplifies the problem and should certainly not be used for diagnostic purposes!

The first column of Table 1.1 gives the age of the patient. In case you're wondering, presbyopia is a form of longsightedness that accompanies the onset of middle age. The second gives the spectacle prescription: myope means shortsighted and hypermetrope means longsighted. The third shows whether the patient is astigmatic, and the fourth relates to the rate of tear production, which is important in this context because tears lubricate contact lenses. The final column shows which kind of lenses to prescribe: hard, soft, or none. All possible combinations of the attribute values are represented in the table.

A sample set of rules learned from this information is shown in Figure 1.1. This is a rather large set of rules, but they do correctly classify all the examples. These rules are complete and deterministic: they give a unique prescription for every conceivable example. Generally, this is not the case. Sometimes there are situations in which no rule applies; other times more than one rule may apply, resulting in conflicting recommendations. Sometimes probabilities or weights may be associated with the rules themselves to indicate that some are more important, or more reliable, than others.

If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no
  and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no
  and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope
  and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no
  and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes
  and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes
  and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope
  and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope
  and astigmatic = yes then recommendation = none

Figure 1.1 Rules for the contact lens data.
You might be wondering whether there is a smaller rule set that performs as well. If so, would you be better off using the smaller rule set? And why? These are exactly the kinds of questions that will occupy us in this book. Because the examples form a complete set for the problem space, the rules do no more than summarize all the information that is given, expressing it in a different and more concise way. Even though it involves no generalization, this is often a very useful thing to do! People frequently use machine learning techniques to gain insight into the structure of their data rather than to make predictions for new cases. In fact, a prominent and successful line of research in machine learning began as an attempt to compress a huge database of possible chess endgames and their outcomes into a data structure of reasonable size. The data structure chosen for this enterprise was not a set of rules but a decision tree.

Figure 1.2 shows a structural description for the contact lens data in the form of a decision tree, which for many purposes is a more concise and perspicuous representation of the rules and has the advantage that it can be visualized more easily. (However, this decision tree—in contrast to the rule set given in Figure 1.1—classifies two examples incorrectly.)

[Figure 1.2 Decision tree for the contact lens data: successive tests on tear production rate, astigmatism, and spectacle prescription lead to the leaves none, soft, hard, and none.]

The tree calls first for a test on tear production rate, and the first two branches correspond to the two possible outcomes. If tear production rate is reduced (the left branch), the outcome is none. If it is normal (the right branch), a second test is made, this time on astigmatism. Eventually, whatever the outcome of the tests, a leaf of the tree is reached that dictates the contact lens recommendation for that case.
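The traversal just described is easy to render as nested tests. Here is a minimal sketch in Java of the Figure 1.2 tree; the class, method, and parameter names are our own, not taken from the book's software.

// A minimal sketch of the Figure 1.2 decision tree: each test sends an
// example down one branch until a leaf names the recommendation.
public class LensTree {
    static String recommend(String tearRate, String astigmatism, String prescription) {
        if (tearRate.equals("reduced")) return "none";        // left branch of the root
        if (astigmatism.equals("no")) return "soft";          // normal tears, no astigmatism
        // Normal tears with astigmatism: one final test on the prescription.
        return prescription.equals("myope") ? "hard" : "none";
    }

    public static void main(String[] args) {
        // Row 4 of Table 1.1 (young myope, astigmatic, normal tear
        // production) reaches the "hard" leaf, matching the table.
        System.out.println(recommend("normal", "yes", "myope")); // prints "hard"
    }
}

Tracing all 24 rows of Table 1.1 through this method will also locate the two examples that the tree, unlike the rules of Figure 1.1, classifies incorrectly.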
The question of what is the most natural and easily understood format for the output from a machine learning scheme is one that we will return to in Chapter 3.

Irises: A classic numeric dataset

The iris dataset, which dates back to seminal work by the eminent statistician R. A. Fisher in the mid-1930s and is arguably the most famous dataset used in data mining, contains 50 examples each of three types of plant: Iris setosa, Iris versicolor, and Iris virginica. It is excerpted in Table 1.4. There are four attributes: sepal length, sepal width, petal length, and petal width (all measured in centimeters). Unlike previous datasets, all attributes have values that are numeric.

Table 1.4 The iris data.

     Sepal length  Sepal width  Petal length  Petal width
     (cm)          (cm)         (cm)          (cm)         Type
1    5.1           3.5          1.4           0.2          Iris setosa
2    4.9           3.0          1.4           0.2          Iris setosa
3    4.7           3.2          1.3           0.2          Iris setosa
4    4.6           3.1          1.5           0.2          Iris setosa
5    5.0           3.6          1.4           0.2          Iris setosa
...
51   7.0           3.2          4.7           1.4          Iris versicolor
52   6.4           3.2          4.5           1.5          Iris versicolor
53   6.9           3.1          4.9           1.5          Iris versicolor
54   5.5           2.3          4.0           1.3          Iris versicolor
55   6.5           2.8          4.6           1.5          Iris versicolor
...
101  6.3           3.3          6.0           2.5          Iris virginica
102  5.8           2.7          5.1           1.9          Iris virginica
103  7.1           3.0          5.9           2.1          Iris virginica
104  6.3           2.9          5.6           1.8          Iris virginica
105  6.5           3.0          5.8           2.2          Iris virginica
...

The following set of rules might be learned from this dataset:

If petal length < 2.45 then Iris setosa
If sepal width < 2.10 then Iris versicolor
If sepal width < 2.45 and petal length < 4.55 then Iris versicolor
If sepal width < 2.95 and petal width < 1.35 then Iris versicolor
If petal length ≥ 2.45 and petal length < 4.45 then Iris versicolor
If sepal length ≥ 5.85 and petal length < 4.75 then Iris versicolor
If sepal width < 2.55 and petal length < 4.95
  and petal width < 1.55 then Iris versicolor
If petal length ≥ 2.45 and petal length < 4.95
  and petal width < 1.55 then Iris versicolor
If sepal length ≥ 6.55 and petal length < 5.05 then Iris versicolor
If sepal width < 2.75 and petal width < 1.65
  and sepal length < 6.05 then Iris versicolor
If sepal length ≥ 5.85 and sepal length < 5.95
  and petal length < 4.85 then Iris versicolor
If petal length ≥ 5.15 then Iris virginica
If petal width ≥ 1.85 then Iris virginica
If petal width ≥ 1.75 and sepal width < 3.05 then Iris virginica
If petal length ≥ 4.95 and petal width < 1.55 then Iris virginica

These rules are very cumbersome, and we will see in Chapter 3 how more compact rules can be expressed that convey the same information.

CPU performance: Introducing numeric prediction

Although the iris dataset involves numeric attributes, the outcome—the type of iris—is a category, not a numeric value. Table 1.5 shows some data for which both the outcome and the attributes are numeric. It concerns the relative performance of computer processing power on the basis of a number of relevant attributes; each row represents 1 of 209 different computer configurations.

Table 1.5 The CPU performance data.

     Cycle      Main memory (KB)  Cache  Channels
     time (ns)  Min.     Max.     (KB)   Min.   Max.   Performance
     MYCT       MMIN     MMAX     CACH   CHMIN  CHMAX  PRP
1    125        256      6000     256    16     128    198
2    29         8000     32000    32     8      32     269
3    29         8000     32000    32     8      32     220
4    29         8000     32000    32     8      32     172
5    29         8000     16000    32     8      16     132
...
207  125        2000     8000     0      2      14     52
208  480        512      8000     32     0      0      67
209  480        1000     4000     0      0      0      45

The classic way of dealing with continuous prediction is to write the outcome as a linear sum of the attribute values with appropriate weights, for example:

PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
      + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX

(The abbreviated variable names are given in the last header row of Table 1.5.) This is called a regression equation, and the process of determining the weights is called regression, a well-known procedure in statistics that we will review in Chapter 4.
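Applying the equation is just arithmetic. Here is a minimal sketch in Java that evaluates it for the first configuration in Table 1.5; the class and method names are ours, chosen only for this illustration.

// A minimal sketch of applying the regression equation above to one
// configuration. Parameter names follow the abbreviations in Table 1.5.
public class CpuRegression {
    static double predictPRP(double myct, double mmin, double mmax,
                             double cach, double chmin, double chmax) {
        return -55.9 + 0.0489 * myct + 0.0153 * mmin + 0.0056 * mmax
                     + 0.6410 * cach - 0.2700 * chmin + 1.480 * chmax;
    }

    public static void main(String[] args) {
        // Row 1 of Table 1.5: MYCT=125, MMIN=256, MMAX=6000,
        // CACH=256, CHMIN=16, CHMAX=128. Prints roughly 337, against an
        // observed PRP of 198: a single linear equation fit to all 209
        // configurations can be a long way off on individual rows.
        System.out.println(predictPRP(125, 256, 6000, 256, 16, 128));
    }
}

The weights themselves come out of the regression procedure reviewed in Chapter 4; here we only evaluate the fitted equation.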
However, the basic regression method is incapable of discovering nonlinear relationships (although variants do exist—indeed, one will be described in Section 6.3), and in Chapter 3 we will examine different representations that can be used for predicting numeric quantities. In the iris and central processing unit (CPU) performance data, all the attributes have numeric values. Practical situations frequently present a mixture of numeric and nonnumeric attributes.

Labor negotiations: A more realistic example

The labor negotiations dataset in Table 1.6 summarizes the outcome of Canadian contract negotiations in 1987 and 1988. It includes all collective agreements reached in the business and personal services sector for organizations with at least 500 members (teachers, nurses, university staff, police, etc.). Each case concerns one contract, and the outcome is whether the contract is deemed acceptable or unacceptable. The acceptable contracts are ones in which agreements were accepted by both labor and management. The unacceptable ones are either known offers that fell through because one party would not accept them or acceptable contracts that had been significantly perturbed to the extent that, in the view of experts, they would not have been accepted. There are 40 examples in the dataset (plus another 17 that are normally reserved for test purposes). Unlike the other tables here, Table 1.6 presents the examples as columns rather than as rows; otherwise, it would have to be stretched over several pages. Many of the values are unknown or missing, as indicated by question marks.

This is a much more realistic dataset than the others we have seen. It contains many missing values, and it seems unlikely that an exact classification can be obtained.

Figure 1.2 showed a single tree; Figure 1.3 shows two decision trees that represent this dataset.

[Figure 1.3 Decision trees for the labor negotiations data: (a) a simple, pruned tree that tests wage increase first year and statutory holidays; (b) a larger tree that also tests working hours per week and health plan contribution.]

Figure 1.3(a) is simple and approximate: it doesn't represent the data exactly. For example, it will predict bad for some contracts that are actually marked good. But it does make intuitive sense: a contract is bad (for the employee!) if the wage increase in the first year is too small (less than 2.5%). If the first-year wage increase is larger than this, it is good if there are lots of statutory holidays (more than 10 days). Even if there are fewer statutory holidays, it is good if the first-year wage increase is large enough (more than 4%).
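Read as code, the pruned tree of Figure 1.3(a) is just three tests. The sketch below, again in Java with names of our own choosing, follows the description in the preceding paragraph.

// A minimal sketch of the simple tree in Figure 1.3(a).
public class ContractTree {
    static String acceptability(double wageIncreaseYear1, int statutoryHolidays) {
        if (wageIncreaseYear1 <= 2.5) return "bad";       // root: first-year raise too small
        if (statutoryHolidays > 10) return "good";        // generous statutory holidays
        return wageIncreaseYear1 > 4 ? "good" : "bad";    // otherwise the raise must be large
    }

    public static void main(String[] args) {
        // Contract 1 in Table 1.6 (2% first-year raise, 11 holidays):
        // the root test already settles it as bad, matching the table.
        System.out.println(acceptability(2.0, 11));  // prints "bad"
        // Contract 2 (4% raise, 15 holidays) passes both tests: good.
        System.out.println(acceptability(4.0, 15));  // prints "good"
    }
}

Because this tree is deliberately approximate, some contracts in the full dataset will land on the wrong leaf; that is the trade-off against the more complex tree discussed next.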
Figure 1.3(b) is a more complex decision tree that represents the same dataset. In fact, this is a more accurate representation of the actual dataset that was used to create the tree. But it is not necessarily a more accurate representation of the underlying concept of good versus bad contracts. Look down the left branch. It doesn't seem to make sense intuitively that, if the working hours exceed 36, a contract is bad if there is no health-plan contribution or a full health-plan contribution but is good if there is a half health-plan contribution. It is certainly reasonable that the health-plan contribution plays a role in the decision, but not if half is good and both full and none are bad. It seems likely that this is an artifact of the particular values used to create the decision tree rather than a genuine feature of the good versus bad distinction. The tree in Figure 1.3(b) is more accurate on the data that was used to train the classifier but will probably perform less well on an independent set of test data. It is "overfitted" to the training data—it follows it too slavishly. The tree in Figure 1.3(a) is obtained from the one in Figure 1.3(b) by a process of pruning, which we will learn more about in Chapter 6.

Table 1.6 The labor negotiations data.

Attribute                        Type                         1     2     3     ...  40
duration                         years                        1     2     3          2
wage increase 1st year           percentage                   2%    4%    4.3%       4.5
wage increase 2nd year           percentage                   ?     5%    4.4%       4.0
wage increase 3rd year           percentage                   ?     ?     ?          ?
cost of living adjustment        {none, tcf, tc}              none  tcf   ?          none
working hours per week           hours                        28    35    38         40
pension                          {none, ret-allw, empl-cntr}  none  ?     ?          ?
standby pay                      percentage                   ?     13%   ?          ?
shift-work supplement            percentage                   ?     5%    4%         4
education allowance              {yes, no}                    yes   ?     ?          ?
statutory holidays               days                         11    15    12         12
vacation                         {below-avg, avg, gen}        avg   gen   gen        avg
long-term disability assistance  {yes, no}                    no    ?     ?          yes
dental plan contribution         {none, half, full}           none  ?     full       full
bereavement assistance           {yes, no}                    no    ?     ?          yes
health plan contribution         {none, half, full}           none  ?     full       half
acceptability of contract        {good, bad}                  bad   good  good       good

Soybean classification: A classic machine learning success

An often-quoted early success story in the application of machine learning to practical problems is the identification of rules for diagnosing soybean diseases. The data is taken from questionnaires describing plant diseases.
There are about 680 examples, each representing a diseased plant. Plants were measured on 35 attributes, each one having a small set of possible values. Examples are labeled with the diagnosis of an expert in plant biology: there are 19 disease categories altogether—horrible-sounding diseases such as diaporthe stem canker, rhizoctonia root rot, and bacterial blight, to mention just a few. Table 1.7 gives the attributes, the number of different values that each can have, and a sample record for one particular plant. The attributes are placed into different categories just to make them easier to read.

Here are two example rules, learned from this data:

If [leaf condition is normal and
    stem condition is abnormal and
    stem cankers is below soil line and
    canker lesion color is brown]
then diagnosis is rhizoctonia root rot

If [leaf malformation is absent and
    stem condition is abnormal and
    stem cankers is below soil line and
    canker lesion color is brown]
then diagnosis is rhizoctonia root rot

These rules nicely illustrate the potential role of prior knowledge—often called domain knowledge—in machine learning, because the only difference between the two descriptions is leaf condition is normal versus leaf malformation is absent. Now, in this domain, if the leaf condition is normal then leaf malformation is necessarily absent, so one of these conditions happens to be a special case of the other. Thus if the first rule is true, the second is necessarily true as well. The only time the second rule comes into play is when leaf malformation is absent but leaf condition is not normal, that is, when something other than malformation is wrong with the leaf. This is certainly not apparent from a casual reading of the rules.

Research on this problem in the late 1970s found that these diagnostic rules could be generated by a machine learning algorithm, along with rules for every other disease category, from about 300 training examples. These training examples were carefully selected from the corpus of cases as being quite different from one another—"far apart" in the example space. At the same time, the plant pathologist who had produced the diagnoses was interviewed, and his expertise was translated into diagnostic rules. Surprisingly, the computer-generated rules outperformed the expert-derived rules on the remaining test examples. They gave the correct disease top ranking 97.5% of the time compared with only 72% for the expert-derived rules. Furthermore, not only did
Table 1.7 The soybean data.

                                        Number
             Attribute                  of values  Sample value
Environment  time of occurrence         7          July
             precipitation              3          above normal
             temperature                3          normal
             cropping history           4          same as last year
             hail damage                2          yes
             damaged area               4          scattered
             severity                   3          severe
             plant height               2          normal
             plant growth               2          abnormal
             seed treatment             3          fungicide
             germination                3          less than 80%
Seed         condition                  2          normal
             mold growth                2          absent
             discoloration              2          absent
             size                       2          normal
             shriveling                 2          absent
Fruit        condition of fruit pods    3          normal
             fruit spots                5          —
Leaf         condition                  2          abnormal
             leaf spot size             3          —
             yellow leaf spot halo      3          absent
             leaf spot margins          3          —
             shredding                  2          absent
             leaf malformation          2          absent
             leaf mildew growth         3          absent
Stem         condition                  2          abnormal
             stem lodging               2          yes
             stem cankers               4          above soil line
             canker lesion color        3          —
             fruiting bodies on stems   2          present
             external decay of stem     3          firm and dry
             mycelium on stem           2          absent
             internal discoloration     3          none
             sclerotia                  2          absent
Root         condition                  3          normal
Diagnosis                               19         diaporthe stem canker