Bixo is an open-source web mining toolkit built on Cascading and Apache Hadoop. It lets users extract and analyze data from web pages through a customizable ETL (extract-transform-load) workflow. Input taps ingest data from various sources, output taps export the results, and built-in Bixo Pipes handle common web mining tasks such as link analysis, page scoring, and entity extraction. Workflows can be run locally or deployed to Hadoop clusters on Amazon EC2.
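
The tap-and-pipe structure described above can be illustrated with a minimal sketch in plain Python. This is not Bixo's or Cascading's API: the function names (`source_tap`, `extract_links`, `count_inbound`, `sink_tap`, `run_workflow`) and the in-memory `SAMPLE_PAGES` data are hypothetical, and counting inbound links is only a crude stand-in for page scoring. It is meant to show how a source feeds tuples through a chain of transformations into a sink.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (illustrative helper, not a Bixo class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def source_tap(pages):
    """Input tap: yield (url, html) tuples from an in-memory mapping."""
    for url, html in pages.items():
        yield url, html


def extract_links(records):
    """Pipe: turn each (url, html) record into (source, absolute target) edges."""
    for url, html in records:
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            yield url, urljoin(url, href)


def count_inbound(edges):
    """Pipe: count inbound links per target URL as a very rough page score."""
    return Counter(target for _source, target in edges)


def sink_tap(scores):
    """Output tap: return (url, score) pairs, highest score first, then by URL."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def run_workflow(pages):
    """Wire the source, the two pipes, and the sink together."""
    return sink_tap(count_inbound(extract_links(source_tap(pages))))


# Hypothetical sample data standing in for fetched pages.
SAMPLE_PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="http://example.com/">home</a>',
}

if __name__ == "__main__":
    for url, score in run_workflow(SAMPLE_PAGES):
        print(score, url)
```

In this sketch each stage is a generator or a small function, so stages can be swapped or reordered without touching the others. A distributed framework offers the same kind of composability, with the stages running as jobs on a cluster instead of in one process.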