The challenge with operating modern data centers is figuring out how to collect, store, and search all of the log and event data generated by complex infrastructure. To address these challenges at Rocana, we're applying big data technologies designed to handle this massive scale. In particular, we need to simultaneously offer full-text and faceted search queries against large volumes of historical data while performing near real-time search against events collected in the last minute. In this talk we describe the challenges we've experienced performing search against petabytes of historical data while scaling to terabytes of new data ingest per day. We'll also share the key lessons we've learned and detail our architecture for handling search at these massive volumes. We explain how we use Apache Kafka to stream event data in near real-time to Apache Hadoop HDFS for deep storage, and how we leverage Apache Solr and Apache Lucene to scale our search infrastructure to terabytes of ingest per day. We present our solution in the context of an overall event data management solution and show real-world scalability results.
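To make the ingest side of this pipeline concrete, here is a minimal sketch of how an event might be serialized before being published to a Kafka topic for downstream delivery to HDFS and Solr. The field names and the `events` topic are illustrative assumptions for this example, not Rocana's actual schema:

```python
import json
import time

def encode_event(host, service, message, ts=None):
    """Serialize a log event to JSON bytes suitable for a Kafka message.

    The schema here (timestamp, host, service, message) is a hypothetical
    example; a production system would use a richer, versioned schema.
    """
    return json.dumps({
        "timestamp": ts if ts is not None else int(time.time() * 1000),
        "host": host,
        "service": service,
        "message": message,
    }).encode("utf-8")

# A Kafka producer would then publish the encoded bytes, e.g. with the
# kafka-python client (assumed here for illustration):
#   producer = KafkaProducer(bootstrap_servers="broker:9092")
#   producer.send("events", encode_event("web-01", "nginx", "GET / 200"))
```

Keeping serialization separate from transport, as sketched above, lets the same encoded event feed both the deep-storage path (HDFS) and the near real-time indexing path (Solr) from a single Kafka topic.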