This document discusses the findilike system, which ranks hotels and summarizes their reviews according to user preferences. Two of its components can be evaluated: opinion-based entity ranking and review summarization. For ranking, the proposed evaluation interleaves the results of different ranking algorithms into a single list; for summarization, it randomly mixes phrases produced by different summarization algorithms. In both cases user clicks are observed, and the algorithm whose output receives more clicks is considered better. The document also describes a mini test bed for local evaluation and the submission of new algorithms for online performance testing.
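
The click-based comparison of ranking algorithms can be illustrated with a minimal sketch. It assumes team-draft interleaving, one common interleaving scheme; the summary does not say which scheme findilike actually uses, and the function names, hotel identifiers, and click data below are hypothetical.

```python
import random
from collections import Counter


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings into one list.

    Returns (merged, team_of), where team_of maps each item to the
    algorithm ("A" or "B") that contributed it. Assumed scheme:
    team-draft interleaving, not necessarily what findilike uses.
    """
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    merged, team_of = [], {}
    picks = Counter()
    total = len(set(ranking_a) | set(ranking_b))
    while len(merged) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            order = ["A", "B"] if picks["A"] < picks["B"] else ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for team in order:
            # Take the team's highest-ranked item that is not yet in the list.
            candidate = next((x for x in sources[team] if x not in team_of), None)
            if candidate is not None:
                merged.append(candidate)
                team_of[candidate] = team
                picks[team] += 1
                break
    return merged, team_of


def credit_clicks(team_of, clicked_items):
    """Count clicks per algorithm; more clicks suggests the better ranker."""
    clicks = Counter(team_of[item] for item in clicked_items if item in team_of)
    return {"A": clicks["A"], "B": clicks["B"]}


# Hypothetical rankings from two algorithms and one simulated click.
ranking_a = ["hotel_1", "hotel_2", "hotel_3"]
ranking_b = ["hotel_3", "hotel_1", "hotel_4"]
merged, team_of = team_draft_interleave(ranking_a, ranking_b, random.Random(0))
clicks = credit_clicks(team_of, ["hotel_3"])
```

In practice the click counts would be accumulated over many user sessions before declaring one algorithm better. The same crediting idea could apply to summarization by tagging each displayed phrase with the algorithm that produced it.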