This document describes the author's experience competing in a Kaggle competition to predict salaries from job postings. The author initially copied the example code but failed to improve the scores. Several machine learning approaches, including clustering and random forests, were then attempted with limited success because of problems with data cleaning and model selection. Ultimately, the author achieved higher scores by indexing the data with Lucene and using its "MoreLikeThis" query to perform the clustering.
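
The idea behind the final approach can be illustrated with a small sketch. This is not the author's code and it does not use Lucene. It is a pure-Python stand-in that assumes the prediction works roughly like a nearest-neighbour lookup: find the indexed postings most similar to a new posting, then average their salaries. TF-IDF with cosine similarity is swapped in here as an approximation of what a "MoreLikeThis" query does. The postings, salaries, function names, and the choice of `k` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (job posting text, salary). Not from the competition.
POSTINGS = [
    ("senior java developer london", 60000),
    ("junior java developer leeds", 30000),
    ("registered nurse night shift", 28000),
    ("staff nurse care home", 26000),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(tokens, idf):
    """Weight each known term by its count times its inverse document frequency."""
    counts = Counter(tokens)
    return {term: n * idf[term] for term, n in counts.items() if term in idf}


def build_index(texts):
    """Return one TF-IDF vector per text, plus the IDF table used to build them."""
    tokenized = [tokenize(t) for t in texts]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    total = len(texts)
    # Smoothed IDF so that every indexed term gets a positive weight.
    idf = {t: math.log((1 + total) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [tfidf(tokens, idf) for tokens in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if not a or not b:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)


def predict_salary(query, vectors, idf, salaries, k=2):
    """Average the salaries of the k most similar postings.

    Falls back to the overall mean when nothing in the index matches.
    """
    q = tfidf(tokenize(query), idf)
    scored = sorted(
        ((cosine(q, v), s) for v, s in zip(vectors, salaries)), reverse=True
    )
    top = [s for sim, s in scored[:k] if sim > 0]
    if not top:
        return sum(salaries) / len(salaries)
    return sum(top) / len(top)


TEXTS = [text for text, _ in POSTINGS]
SALARIES = [salary for _, salary in POSTINGS]
VECTORS, IDF = build_index(TEXTS)

if __name__ == "__main__":
    print(predict_salary("java developer", VECTORS, IDF, SALARIES))
```

With this toy data, the query "java developer" matches only the two Java postings, so the prediction is the mean of their salaries. A real index such as Lucene would add proper text analysis, term selection, and scaling to many postings, none of which this sketch attempts.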