Lighthouse is an open-source toolkit for building data lakes. It provides utilities for defining data sources, constructing data pipelines, and reading data from and writing data to the data lake in a consistent way. It runs on Apache Spark and supports the ORC file format by default. The toolkit aims to make data sources and transformations easy to identify, reuse, and test.
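
To make the idea of reusable, testable data sources and transformations concrete, here is a minimal sketch in plain Python. It is not Lighthouse's API and does not use Spark or ORC: `DataSource`, `InMemorySource`, `run_pipeline`, and `only_adults` are hypothetical names invented for illustration. It only shows the general pattern of putting reads and writes behind one interface so that a pipeline can be tested against an in-memory stand-in.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Transformation = Callable[[List[Row]], List[Row]]


class DataSource(ABC):
    """Hypothetical interface: a named place that rows can be read from and written to."""

    @abstractmethod
    def read(self) -> List[Row]:
        ...

    @abstractmethod
    def write(self, rows: List[Row]) -> None:
        ...


class InMemorySource(DataSource):
    """Stand-in for tests: keeps rows in a list instead of in files on the data lake."""

    def __init__(self, rows: Optional[List[Row]] = None) -> None:
        self._rows: List[Row] = list(rows or [])

    def read(self) -> List[Row]:
        # Return a copy so callers cannot mutate the stored rows.
        return list(self._rows)

    def write(self, rows: List[Row]) -> None:
        # Overwrite the stored rows with the new ones.
        self._rows = list(rows)


def run_pipeline(source: DataSource, sink: DataSource, *steps: Transformation) -> None:
    """Read from the source, apply each transformation in order, write to the sink."""
    rows = source.read()
    for step in steps:
        rows = step(rows)
    sink.write(rows)


def only_adults(rows: List[Row]) -> List[Row]:
    """Example transformation: keep rows whose 'age' is at least 18."""
    return [row for row in rows if row["age"] >= 18]


people = InMemorySource([{"name": "Ada", "age": 36}, {"name": "Tim", "age": 9}])
adults = InMemorySource()
run_pipeline(people, adults, only_adults)
print(adults.read())  # prints [{'name': 'Ada', 'age': 36}]
```

Because `run_pipeline` depends only on the `DataSource` interface, the same transformation could in principle run against a file-backed source in production and against `InMemorySource` in a unit test.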