The 5th edition of the data mining book.

About this book

An accessible introduction to machine learning that covers classic techniques for data mining as well as the latest deep learning approaches used in applications of artificial intelligence. The 5th edition features a new chapter on advanced deep learning, including approaches such as diffusion models and transformers, discusses modern data visualisation techniques, and, perhaps most importantly, has a new chapter on privacy, fairness, bias, and AI safety. Last but not least, due to popular demand, this edition features exercises for each chapter! A table of contents for the 5th edition that shows where new material has been added can be found further down this page. The book is complemented by an online appendix on the Weka software, an extended version of a brief appendix on Weka in the book.

Click here to order from Amazon.com or here to order from the publisher.

Highlights
  • Explains how machine learning algorithms for data mining work.
  • Helps you compare and evaluate the results of different techniques.
  • Covers performance improvement techniques, including input preprocessing and combining output from different methods.
  • Features in-depth information on probabilistic models and deep learning.
  • Provides an introduction to the Weka machine learning workbench and links to algorithm implementations in the software.

Translations

The book has been translated into German (first edition), Chinese (second, third, and fourth editions), and Korean (third and fourth editions).

Online appendix

Click here to download the online appendix on Weka, an extended version of Appendix B in the book.

Errata

Click here to get to a list of errata.

Reviews of the first edition

Review by J. Geller (SIGMOD Record, Vol. 31:1, March 2002).
Review by E. Davis (AI Journal, Vol. 131:1-2, September 2001).
Review by P.A. Flach (AI Journal, Vol. 131:1-2, September 2001).


Table of Contents of the 5th Edition:
Sections and chapters with new material are marked in red.

Preface

1. What’s it all about?
1.1 Data Mining and Machine Learning
1.2 Simple Examples: The Weather Problem and Others
1.3 Fielded Applications
1.4 The Data Mining Process
1.5 Machine Learning and Statistics
1.6 Generalization as Search
1.7 Data Mining and Ethics
1.8 Further Reading and Bibliographic Notes
1.9 Exercises

2. Input: concepts, instances, attributes
2.1 What’s a Concept?
2.2 What’s in an Example?
2.3 What’s in an Attribute?
2.4 Preparing the Input
2.5 Further Reading and Bibliographic Notes
2.6 Exercises

3. Output: Knowledge representation
3.1 Tables
3.2 Linear Models
3.3 Trees
3.4 Rules
3.5 Instance-Based Representation
3.6 Clusters
3.7 Further Reading and Bibliographic Notes
3.8 Exercises

4. Algorithms: the basic methods
4.1 Inferring Rudimentary Rules
4.2 Simple Probabilistic Modeling
4.3 Divide-and-Conquer: Constructing Decision Trees
4.4 Covering Algorithms: Constructing Rules
4.5 Mining Association Rules
4.6 Linear Models
4.7 Instance-Based Learning
4.8 Clustering
4.9 Multiinstance Learning
4.10 Further Reading and Bibliographic Notes
4.11 WEKA Implementations
4.12 Exercises

5. Credibility: Evaluating what’s been learned
5.1 Training and Testing
5.2 Predicting Performance
5.3 Cross-Validation
5.4 Other Estimates
5.5 Hyperparameter Selection
5.6 Comparing Data Mining Schemes
5.7 Predicting Probabilities
5.8 Counting the Cost
5.9 Evaluating Numeric Prediction
5.10 The MDL Principle
5.11 Applying the MDL Principle to Clustering
5.12 Using a Validation Set for Model Selection
5.13 Further Reading and Bibliographic Notes
5.14 Exercises

6. Preparation and exploratory data analysis
6.1 Attribute Selection
6.2 Discretizing Numeric Attributes
6.3 Projections
6.4 Sampling
6.5 Cleansing
6.6 Transforming Multiple Classes to Binary Ones
6.7 Calibrating Class Probabilities
6.8 Exploratory Data Analysis
6.9 Further Reading and Bibliographic Notes
6.10 WEKA Implementations
6.11 Exercises

7. Ethics: what are the impacts of what’s been learned?
7.1 Privacy
7.2 Fairness and Bias in Machine Learning
7.3 AI Safety
7.4 Further Reading and Bibliographic Notes
7.5 Exercises

8. Ensemble Learning
8.1 Combining Multiple Models
8.2 Bagging
8.3 Randomization
8.4 Boosting
8.5 Additive Regression
8.6 Interpretable Ensembles
8.7 Stacking
8.8 Further Reading and Bibliographic Notes
8.9 WEKA Implementations
8.10 Exercises

9. Extending instance-based and linear models
9.1 Instance-Based Learning
9.2 Extending Linear Models
9.3 Numeric Prediction with Local Linear Models
9.4 WEKA Implementations
9.5 Exercises

10. Deep learning
10.1 Deep Feedforward Networks
10.2 Training and Evaluating Deep Networks
10.3 Convolutional Neural Networks
10.4 Autoencoders
10.5 Recurrent Neural Networks
10.6 Further Reading and Bibliographic Notes
10.7 Deep Learning Software and Network Implementations
10.9 WEKA Implementations
10.10 Exercises

11. Advanced deep learning methods
11.1 Generative AI via Deep Learning
11.2 Introduction to Natural Language Processing and Large Language Models
11.3 Transformer Architecture
11.4 Transformer-Based Language Models
11.5 Adversarial Examples
11.6 Knowledge Distillation
11.7 Deep Reinforcement Learning
11.8 Further Reading and Bibliographic Notes
11.9 Exercises

12. Beyond supervised and unsupervised learning
12.1 Semi-supervised Learning
12.2 Multi-instance Learning
12.3 Further Reading and Bibliographic Notes
12.4 WEKA Implementations
12.5 Exercises

13. Probabilistic methods: fundamentals
13.1 Foundations
13.2 Bayesian Networks
13.3 Clustering and Probability Density Estimation
13.4 Further Reading and Bibliographic Notes
13.5 Exercises

14. Advanced probabilistic methods
14.1 Hidden Variable Models
14.2 Bayesian Estimation and Prediction
14.3 Graphical Models and Factor Graphs
14.4 Conditional Probability Models
14.5 Sequential and Temporal Models
14.6 Further Reading and Bibliographic Notes
14.7 WEKA Implementations
14.8 Exercises

15. Moving on: Applications and Beyond
15.1 Applying Machine Learning
15.2 Learning from Massive Datasets
15.3 Data Stream Learning
15.4 Incorporating Domain Knowledge
15.5 Text Mining
15.6 Web Mining
15.7 Images and Speech
15.8 Adversarial Situations
15.9 Ubiquitous Data Mining
15.10 Machine Learning Technologies and Applications of Concern
15.11 AI and Society
15.12 Further Reading and Bibliographic Notes
15.13 WEKA Implementations
15.14 Exercises

Appendix A: Theoretical foundations
Appendix B: The WEKA workbench
Appendix C: Implementation details of trees and rules
Appendix D: Technical details of deep learning
References
Index