J.C. Freytag & Fabian Fier, Humboldt University Berlin, presented at the 2016 HPCC Systems Engineering Summit Community Day.

Finding similar objects in large datasets is an important database operation, used in applications such as plagiarism detection, document clustering, and duplicate removal. As dataset sizes increase, this problem can no longer be solved with canonical approaches: with straightforward approaches, the runtime becomes very large even for moderately large datasets, even when distributed systems are used. We give an overview of similarity search on text data and discuss scalable solution approaches. In our research, we experimentally compare algorithmic approaches in order to optimize the runtime. We show that this is a complex problem because of the many parameters involved, such as data properties like skew, runtime parameters, and implementation details. We share insights from our practical findings when comparing implementations on Hadoop with implementations on HPCC Systems. This talk is aimed at practitioners as well as theoreticians who are interested in similarity search, text processing, and scalable algorithmic approaches that are inspired by MapReduce and are adaptable to HPCC Systems.

J.C. Freytag

Johann-Christoph Freytag is currently a full professor for Databases and Information Systems (DBIS) at the Computer Science Department of the Humboldt-Universität zu Berlin, Germany. Before joining the department in 1994, he was a research staff member at the IBM Almaden Research Center (1985-1987), a researcher at the European Computer-Industry-Research Centre (ECRC, in Munich, Germany, 1987-1989), and the head of Digital's Database Technology Center (also in Munich, 1990-1993). He holds a Ph.D. in Applied Mathematics/Computer Science from Harvard University, MA. Prof. Freytag's research interests include all aspects of query processing and query optimization in object-relational database systems, new developments in the database area (such as semi-structured data and data quality), privacy in database systems, and applying database technology to applications such as GIS, genomics, and bioinformatics/life science. In recent years he has received the IBM Faculty Award four times for collaborative work in the areas of databases, middleware, and bioinformatics/life science.

Fabian Fier

Fabian Fier is a PhD student in the research group of Prof. Johann-Christoph Freytag. He holds a diploma in computer science from Humboldt-Universität zu Berlin. His research interest is similarity search on web-scale data. He uses techniques from textual similarity joins on Big Data and adapts them to similarity search.