Wait! Exclusive 60 day trial to the world's largest digital library.
The SlideShare family just got bigger. You now have unlimited* access to books, audiobooks, magazines, and more from Scribd.Cancel anytime.
The More Like This search functionality is a key feature in Apache Lucene that allows to find similar documents to an input one (text or document). Being widely used but rarely explored, this presentation will start introducing how the MLT works internally. The focus of the talk is to improve the general understanding of MLT and the way you could benefit from it. Building on the introduction the focus will be on the BM25 text similarity function and how this has been (tentatively) included in the MLT through a conspicious refactor and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.