Wait! Exclusive 60 day trial to the world's largest digital library.
The SlideShare family just got bigger. You now have unlimited* access to books, audiobooks, magazines, and more from Scribd.Cancel anytime.
Security monitoring and threat response has diverse processing demands on large volumes of log and telemetry data. Processing requirements span from low-latency stream processing to interactive queries over months of data. To make things more challenging, we must keep the data accessible for a retention window measured in years. Having tackled this problem before in a massive-scale environment using Apache Spark, when it came time to do it again, there were a few things I knew worked and a few wrongs I wanted to right.
We approached Databricks with a set of challenges to collaborate on: provide a stable and optimized platform for Unified Analytics that allows our team to focus on value delivery using streaming, SQL, graph, and ML; leverage decoupled storage and compute while delivering high performance over a broad set of workloads; use S3 notifications instead of list operations; remove Hive Metastore from the write path; and approach indexed response times for our more common search cases, without hard-to-scale index maintenance, over our entire retention window. This is about the fruit of that collaboration.