This document discusses best practices for evaluating supervised machine learning models, including: 1) Why data should be split into training and testing sets, so that a model that has merely "memorized" its training data is not mistaken for one that generalizes, and the measured performance is accurate. 2) Common dataset splitting methods, such as linear and random splits. 3) Evaluation metrics such as accuracy, and how they can be misleading, especially on imbalanced datasets. 4) How different domains prioritize avoiding different types of mistakes, for example preferring fewer false negatives in medical diagnosis.
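As a rough illustration of points 2 to 4, the sketch below uses only the Python standard library. It assumes that a "linear" split means cutting the data in its existing order and that a "random" split shuffles first. The function names, the 20% test fraction, and the labels (10 positives followed by 990 negatives) are made up for this example and do not come from the document.

```python
import random

def linear_split(rows, test_fraction=0.2):
    """Keep the existing order: first part for training, last part for testing."""
    cut = len(rows) - int(len(rows) * test_fraction)
    return list(rows[:cut]), list(rows[cut:])

def random_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy with a fixed seed, then cut it the same way."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return linear_split(shuffled, test_fraction)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that were found (low recall = many false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Hypothetical imbalanced labels: 1% positive, stored with positives first.
labels = [1] * 10 + [0] * 990

# A "model" that never predicts the positive class.
always_negative = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_negative)
baseline_recall = recall(labels, always_negative)

# Splitting the ordered labels both ways.
linear_train, linear_test = linear_split(labels)
random_train, random_test = random_split(labels)
```

On this made-up data the always-negative baseline reaches 0.99 accuracy while its recall is 0.0, meaning every positive case is a false negative. The linear split also puts all 10 positives in the training portion because the data is ordered, so its test set contains no positive examples at all. Shuffling before splitting is one common way to reduce that risk.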