This book provides a highly accessible introduction to machine learning, while also catering to readers who want to delve into the more mathematical techniques behind modern probabilistic modeling and deep learning. It is complemented by an online appendix on the Weka software, an extended version of Appendix B in the book. A table of contents for the fourth edition, indicating material that is new since the third edition, can be found further down this page.


"If you have data that you want to analyze and understand, this book and the associated Weka toolkit are an excellent way to start."

-Jim Gray, Microsoft Research

"The authors provide enough theory to enable practical application, and it is this practical focus that separates this book from most, if not all, other books on this subject."

-Dorian Pyle, Director of Modeling at Numetrics

"This book would be a strong contender for a technical data mining course. It is one of the best of its kind."

-Herb Edelstein, Principal, Data Mining Consultant, Two Crows Consulting

"It is certainly one of my favourite data mining books in my library."

-Tom Breur, Principal, XLNT Consulting, Tilburg, Netherlands

  • Explains how machine learning algorithms for data mining work.
  • Helps you compare and evaluate the results of different techniques.
  • Covers performance improvement techniques, including input preprocessing and combining output from different methods.
  • Features in-depth information on probabilistic models and deep learning.
  • Provides an introduction to the Weka machine learning workbench and links to algorithm implementations in the software.

The book has been translated into German (first edition), Chinese (second, third, and fourth editions), and Korean (third and fourth editions).

Online appendix

Click here to download the online appendix on Weka, an extended version of Appendix B in the book.


Click here for a list of errata.



Reviews of the first edition

Review by J. Geller (SIGMOD Record, Vol. 31:1, March 2002).
Review by E. Davis (AI Journal, Vol. 131:1-2, September 2001).
Review by P.A. Flach (AI Journal, Vol. 131:1-2, September 2001).

Table of Contents of the Fourth Edition:
Sections and chapters with new material are marked in red.


1. What’s it all about?
1.1 Data Mining and Machine Learning
1.2 Simple Examples: The Weather Problem and Others
1.3 Fielded Applications
1.4 The Data Mining Process
1.5 Machine Learning and Statistics
1.6 Generalization as Search
1.7 Data Mining and Ethics
1.8 Further Reading and Bibliographic Notes

2. Input: concepts, instances, attributes
2.1 What’s a Concept?
2.2 What’s in an Example?
2.3 What’s in an Attribute?
2.4 Preparing the Input
2.5 Further Reading and Bibliographic Notes

3. Output: Knowledge representation
3.1 Tables
3.2 Linear Models
3.3 Trees
3.4 Rules
3.5 Instance-Based Representation
3.6 Clusters
3.7 Further Reading and Bibliographic Notes

4. Algorithms: the basic methods
4.1 Inferring Rudimentary Rules
4.2 Simple Probabilistic Modeling
4.3 Divide-and-Conquer: Constructing Decision Trees
4.4 Covering Algorithms: Constructing Rules
4.5 Mining Association Rules
4.6 Linear Models
4.7 Instance-Based Learning
4.8 Clustering
4.9 Multi-Instance Learning
4.10 Further Reading and Bibliographic Notes
4.11 WEKA Implementations

5. Credibility: Evaluating what’s been learned
5.1 Training and Testing
5.2 Predicting Performance
5.3 Cross-Validation
5.4 Other Estimates
5.5 Hyperparameter Selection
5.6 Comparing Data Mining Schemes
5.7 Predicting Probabilities
5.8 Counting the Cost
5.9 Evaluating Numeric Prediction
5.10 The Minimum Description Length Principle
5.11 Applying MDL to Clustering
5.12 Using a Validation Set for Model Selection
5.13 Further Reading and Bibliographic Notes

6. Trees and rules
6.1 Decision Trees
6.2 Classification Rules
6.3 Association Rules
6.4 WEKA Implementations

7. Extending instance-based and linear models
7.1 Instance-Based Learning
7.2 Extending Linear Models
7.3 Numeric Prediction with Local Linear Models
7.4 WEKA Implementations

8. Data transformations
8.1 Attribute Selection
8.2 Discretizing Numeric Attributes
8.3 Projections
8.4 Sampling
8.5 Cleansing
8.6 Transforming Multiple Classes to Binary Ones
8.7 Calibrating Class Probabilities
8.8 Further Reading and Bibliographic Notes
8.9 WEKA Implementations

9. Probabilistic methods
9.1 Foundations
9.2 Bayesian Networks
9.3 Clustering and Probability Density Estimation
9.4 Hidden Variable Models
9.5 Bayesian Estimation and Prediction
9.6 Graphical Models and Factor Graphs
9.7 Conditional Probability Models
9.8 Sequential and Temporal Models
9.9 Further Reading and Bibliographic Notes
9.10 WEKA Implementations

10. Deep learning
10.1 Deep Feedforward Networks
10.2 Training and Evaluating Deep Networks
10.3 Convolutional Neural Networks
10.4 Autoencoders
10.5 Stochastic Deep Networks
10.6 Recurrent Neural Networks
10.7 Further Reading and Bibliographic Notes
10.8 Deep Learning Software and Network Implementations
10.9 WEKA Implementations

11. Beyond supervised and unsupervised learning
11.1 Semi-Supervised Learning
11.2 Multi-Instance Learning
11.3 Further Reading and Bibliographic Notes
11.4 WEKA Implementations

12. Ensemble Learning
12.1 Combining Multiple Models
12.2 Bagging
12.3 Randomization
12.4 Boosting
12.5 Additive Regression
12.6 Interpretable Ensembles
12.7 Stacking
12.8 Further Reading and Bibliographic Notes
12.9 WEKA Implementations

13. Moving on: Applications and Beyond
13.1 Applying Data Mining
13.2 Learning from Massive Datasets
13.3 Data Stream Learning
13.4 Incorporating Domain Knowledge
13.5 Text Mining
13.6 Web Mining
13.7 Images and Speech
13.8 Adversarial Situations
13.9 Ubiquitous Data Mining
13.10 Further Reading and Bibliographic Notes
13.11 WEKA Implementations

Appendix A: Theoretical foundations
Appendix B: The WEKA workbench