Redpoll is an open source machine learning library built on Hadoop to handle large-scale datasets. It decomposes algorithms into pieces of work that can be distributed across Hadoop mappers and reducers. The library currently includes implementations of Naive Bayes, K-means clustering, and canopy clustering. Future work may add support vector machines (SVM) and other algorithms such as expectation-maximization (EM), latent semantic indexing (LSI), singular value decomposition (SVD), and PageRank. The developers welcome community involvement through contributions of code or documentation.
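
To make the mapper/reducer decomposition concrete, here is a minimal sketch of one K-means iteration written as a map step and a reduce step. It is plain Python that simulates the map, shuffle, and reduce phases in memory. It does not use Hadoop and is not Redpoll's actual code or API: the function names (kmeans_map, kmeans_reduce, kmeans_iteration) and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map step (hypothetical name): emit (nearest centroid index, (point, 1))."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (hypothetical name): average the points assigned to one centroid."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            sums[d] += point[d]
        count += n
    return tuple(s / count for s in sums)


def kmeans_iteration(points, centroids):
    """Simulate map -> shuffle -> reduce for a single K-means iteration."""
    # Shuffle phase: group mapper output by centroid index.
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


# Made-up sample data: two obvious clusters and two rough starting centroids.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)  # [(1.0, 2.0), (10.0, 9.0)]
```

Each map call depends only on one point and the current centroids, so the points can be split across many mappers. Each reduce call depends only on the values for one centroid, so the averaging can be split across reducers. A full K-means run would repeat this iteration until the centroids stop moving.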