1) The document discusses several recommendation problems at Stitch Fix, including match score, fix generation, style prediction, inventory health, and search. For each problem it outlines concerns such as mismatched loss functions, organizational barriers, and the absence of joint training or validation across models (the first sketch after this summary illustrates that kind of composition).
2) It describes mistakes made along the way: type I errors caused by peeking at experiment results before they finished (simulated in the second sketch below), the "balkanization" of teams working independently, and humans being left out of the model evaluation process. Weak composition of models without joint training was a further challenge.
3) The document advocates practices such as global holdouts, published validation, random re-testing, and strengthening the weak composition between models. It suggests institutionalizing internal task leaderboards and shared validation to improve experimental rigor (the third sketch below shows one way a global holdout could be assigned).
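To make the concern in point 1 concrete, here is a minimal sketch, using synthetic data and generic scikit-learn models, of two models trained with different loss functions and then composed with no joint objective or joint validation. The names `match_model`, `style_model`, and `composed_score` are illustrative assumptions, not Stitch Fix's actual pipeline.

```python
# Hypothetical illustration of "weak composition": two models trained
# independently on different losses, multiplied together at ranking time.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                              # synthetic client/item features
kept = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)     # did the client keep the item?
style = X[:, 1] * 2 + rng.normal(size=1000)                  # style survey rating

match_model = LogisticRegression().fit(X, kept)   # trained on log loss
style_model = Ridge().fit(X, style)               # trained on squared loss

def composed_score(x):
    """Rank items by an ad hoc product of two independently trained scores."""
    p_keep = match_model.predict_proba(x)[:, 1]
    style_pred = style_model.predict(x)
    # The composition itself is never trained or validated end to end.
    return p_keep * style_pred

print(composed_score(X[:5]))
```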
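The "type I errors from peeking" in point 2 can be demonstrated with a small simulation (my own illustration, not taken from the document): both arms are drawn from the same distribution, yet running a t-test after every batch and stopping at the first significant result rejects the null far more often than the nominal 5% level.

```python
# Simulate how repeatedly checking an A/B test ("peeking") inflates
# the false positive rate relative to a fixed-horizon analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_experiment(n_batches=20, batch_size=100, alpha=0.05, peek=True):
    a, b = [], []
    for _ in range(n_batches):
        a.extend(rng.normal(0, 1, batch_size))
        b.extend(rng.normal(0, 1, batch_size))   # no true effect in either arm
        if peek:
            _, p = stats.ttest_ind(a, b)
            if p < alpha:
                return True                       # stopped early, declared "significant"
    _, p = stats.ttest_ind(a, b)                  # single test at the planned horizon
    return p < alpha

n_trials = 1000
peeking_rate = np.mean([run_experiment(peek=True) for _ in range(n_trials)])
fixed_rate = np.mean([run_experiment(peek=False) for _ in range(n_trials)])
print(f"false positive rate with peeking:   {peeking_rate:.3f}")
print(f"false positive rate, fixed horizon: {fixed_rate:.3f}")
```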
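One common way to implement the global holdout advocated in point 3 is deterministic hashing of client IDs, so every team and experiment excludes the same never-treated slice of clients. The salt, the 2% fraction, and the function names below are assumptions for illustration, not the document's actual mechanism.

```python
# Hypothetical sketch of a global holdout: assignment is a pure function of
# the client ID, so it is stable across teams, models, and experiments.
import hashlib

GLOBAL_HOLDOUT_FRACTION = 0.02   # assumed 2% of clients never receive new models
SALT = "global-holdout-v1"       # fixed salt keeps assignment consistent everywhere

def in_global_holdout(client_id: str) -> bool:
    """Deterministically map a client to the global holdout bucket."""
    digest = hashlib.sha256(f"{SALT}:{client_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # map the hash to [0, 1]
    return bucket < GLOBAL_HOLDOUT_FRACTION

def eligible_for_experiment(client_id: str) -> bool:
    """Experiments and new models should skip holdout clients."""
    return not in_global_holdout(client_id)

if __name__ == "__main__":
    clients = [f"client-{i}" for i in range(100_000)]
    holdout = sum(in_global_holdout(c) for c in clients)
    print(f"holdout share: {holdout / len(clients):.3%}")   # close to 2%
```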