This document describes BakeSearch, a recipe search tool that clusters recipes by their ingredients using natural language processing and machine learning techniques. It discusses the challenges of clustering and analyzing a large dataset of 40,000 recipes and 4,000 ingredients, and how tools such as MapReduce, Amazon EMR, NumPy, SciPy, NLTK, and NetworkX are used to address these challenges at scale.
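The summary does not say exactly how recipes are grouped by ingredient. The following is a hypothetical, standard-library-only sketch of one way to do it, not BakeSearch's actual method. A MapReduce-style pass (map, shuffle, reduce, all simulated in-process) builds an ingredient-to-recipe index and counts the ingredients each pair of recipes shares. Pairs whose Jaccard similarity meets an assumed threshold are then linked, and the connected components of that graph become the clusters. This is the kind of graph step NetworkX's `connected_components` could handle; here a small union-find stands in for it. The function names, the 0.5 threshold, and the sample recipes are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(recipes):
    """Mapper (simulated): emit one (ingredient, recipe_id) pair per ingredient."""
    for recipe_id, ingredients in recipes.items():
        for ingredient in ingredients:
            yield ingredient, recipe_id


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer (simulated): count shared ingredients for each co-occurring recipe pair."""
    shared = defaultdict(int)
    for recipe_ids in groups.values():
        # Sorting gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(recipe_ids), 2):
            shared[(a, b)] += 1
    return shared


def cluster(recipes, threshold=0.5):
    """Link recipes whose ingredient-set Jaccard similarity >= threshold;
    return the connected components as sorted lists of recipe ids."""
    shared = reduce_phase(shuffle(map_phase(recipes)))
    parent = {r: r for r in recipes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), overlap in shared.items():
        union_size = len(recipes[a]) + len(recipes[b]) - overlap
        if overlap / union_size >= threshold:
            parent[find(a)] = find(b)

    components = defaultdict(set)
    for r in recipes:
        components[find(r)].add(r)
    return sorted(sorted(c) for c in components.values())


# Hypothetical sample data: recipe id -> set of normalized ingredient names.
recipes = {
    "brownies": {"flour", "sugar", "butter", "cocoa", "eggs"},
    "chocolate cake": {"flour", "sugar", "butter", "cocoa", "eggs", "milk"},
    "baguette": {"flour", "water", "yeast", "salt"},
    "focaccia": {"flour", "water", "yeast", "salt", "olive oil"},
}

print(cluster(recipes))
# [['baguette', 'focaccia'], ['brownies', 'chocolate cake']]
```

Only recipes that share at least one ingredient are ever compared, because candidate pairs come out of the per-ingredient groups. On a real cluster such as Amazon EMR, the map and reduce functions would run as distributed jobs rather than as in-process generators. In a real pipeline, ingredient strings would also need normalizing (for example with NLTK) before this step, so that variants like "egg" and "eggs" match.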