Video available at: https://www.youtube.com/watch?v=z4JTjUp3NC0
To scale the building of decision trees on large amounts of Indeed job search data, we created a system called Imhotep. In addition to being a crucial tool for building these machine learning models, Imhotep has proven to be applicable to many different analytics problems. The core of Imhotep is a distributed system that manages the parallel execution of queries across a set of time-sharded inverted indices.
This talk covers Imhotep’s primitive operations, which allow us to build decision trees, drill into data, build graphs, and even execute SQL-like queries in IQL (Imhotep Query Language). We will also discuss what makes Imhotep fast, highly available, and fault tolerant.
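To make the phrase "parallel execution of queries across a set of time-sharded inverted indices" concrete, here is a minimal toy sketch in Python. It is not Imhotep's implementation or API: the names `Shard`, `build_shards`, and `count_matching`, the integer timestamps, and the thread pool standing in for a cluster of machines are all assumptions made for illustration. Each shard holds an inverted index for one time window, a query selects the shards that overlap its time range, and the per-shard counts are computed in parallel and summed.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical shard: an inverted index over documents in [start, end)."""

    def __init__(self, start, end):
        self.start = start  # inclusive start of the time window
        self.end = end      # exclusive end of the time window
        self.num_docs = 0
        self.postings = {}  # (field, term) -> list of local document ids

    def add(self, doc):
        """Index one document, given as a dict of field -> term."""
        doc_id = self.num_docs
        self.num_docs += 1
        for name, term in doc.items():
            self.postings.setdefault((name, term), []).append(doc_id)

    def count(self, name, term):
        """Number of documents in this shard where field `name` equals `term`."""
        return len(self.postings.get((name, term), []))


def build_shards(timed_docs, shard_width):
    """Group (timestamp, doc) pairs into fixed-width time shards."""
    shards = {}
    for timestamp, doc in timed_docs:
        start = timestamp - timestamp % shard_width
        if start not in shards:
            shards[start] = Shard(start, start + shard_width)
        shards[start].add(doc)
    return [shards[start] for start in sorted(shards)]


def count_matching(shards, name, term, t_from, t_to):
    """Count matches in every shard overlapping [t_from, t_to).

    Assumption: the query range is aligned to shard boundaries, so whole
    shards are counted. Threads stand in for remote workers here.
    """
    selected = [s for s in shards if s.start < t_to and s.end > t_from]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda s: s.count(name, term), selected))
```

Because each shard is independent, a query only touches the time windows it needs, and a slow or failed shard can in principle be retried elsewhere without recomputing the rest. How Imhotep itself handles this is covered in the talk, not in this sketch.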