Hydra is an open source technology that brings structure to unstructured data like news articles and text documents. It enriches documents with metadata through language detection, sentiment analysis, and other techniques. This allows the data to be better searched and filtered. Hydra is designed to scale to large amounts of data through a fault tolerant and robust pipeline architecture. It can integrate with Hadoop for processing entire document sets at once for applications like pagerank and analytics.