Security monitoring and threat response place diverse processing demands on large volumes of log and telemetry data. Requirements range from low-latency stream processing to interactive queries over months of data. To make things more challenging, we must keep the data accessible for a retention window measured in years. Having tackled this problem before in a massive-scale environment using Apache Spark, I came to the second attempt knowing a few things that worked and a few wrongs I wanted to right.
We approached Databricks with a set of challenges to collaborate on: provide a stable and optimized platform for Unified Analytics that lets our team focus on delivering value using streaming, SQL, graph, and ML; leverage decoupled storage and compute while delivering high performance across a broad set of workloads; use S3 notifications instead of list operations; remove Hive Metastore from the write path; and approach indexed response times for our more common search cases over our entire retention window, without hard-to-scale index maintenance. What follows is the fruit of that collaboration.