The document compares the results of offline and online evaluations of recommender systems for a small Czech travel agency. Offline evaluations of 800 recommender variants across 18 metrics were compared with an online evaluation of 12 selected variants. A key finding was that recommender performance depended strongly on the number of items a user had visited: ranking-based metrics best estimated performance for users with few visits, while click-through and visit rates became less consistent predictors for users with longer histories. Regression models trained to predict online results from offline metrics favored cosine- and word-embedding-based recommenders. Future work could evaluate metrics that incorporate time or business outcomes such as conversions.
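The approach of predicting online results from offline metrics can be sketched as a simple regression over recommender variants. The following is a minimal illustration, not the paper's actual pipeline: the metric names, the synthetic data, and the linear model are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical offline metric scores for 12 recommender variants
# (columns might be e.g. nDCG, recall, coverage) -- synthetic data,
# not the study's measurements.
rng = np.random.default_rng(42)
offline_metrics = rng.random((12, 3))

# Hypothetical online outcome (e.g. click-through rate) per variant,
# constructed here so that it partly depends on the offline metrics.
online_ctr = (0.05 * offline_metrics[:, 0]
              + 0.02 * offline_metrics[:, 1]
              + rng.normal(0.0, 0.001, 12))

# Fit a regression that predicts the online outcome from offline metrics;
# its coefficients indicate which offline metrics track online performance.
model = LinearRegression().fit(offline_metrics, online_ctr)
print(model.coef_)
print(model.score(offline_metrics, online_ctr))  # in-sample R^2
```

In a real study the fitted coefficients would be inspected to see which offline metrics are the most consistent predictors of online behavior, separately for users with short and long visit histories.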