Salesforce recently invented and deployed a real-time, scalable, personalized anomaly detection system that operates at terabyte data scale with a low false-positive rate. Detecting anomalies in users' in-app behavior at this scale is extremely challenging because traditional techniques such as clustering methods suffer from serious performance issues in production.
Salesforce’s method tackles these challenges in three phases: 1) Principal Component Analysis (PCA) is used to extract high-variance and low-variance feature subsets. The low-variance subset is valuable in cybersecurity because the goal is to determine whether a user deviates from their stable behavior; the high-variance subset is used for dimension reduction. 2) On each feature subset, a profile is built for each user to characterize that user’s baseline behavior and legitimate abnormal behavior. 3) During detection, each incoming event is compared with the user’s profile to produce an anomaly score. The computational complexity of the detection module is constant for each incoming event.
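The three phases above can be illustrated with a minimal sketch. This is not Salesforce's implementation: it uses plain NumPy rather than Apache Spark, every function name (`fit_pca`, `split_components`, `build_profile`, `anomaly_score`) is hypothetical, and the profile is assumed to be just a per-component mean and standard deviation. It does not model the "legitimate abnormal behavior" part of the profile, and the max z-score used as the anomaly score is likewise an assumption.

```python
import numpy as np

def fit_pca(X):
    """Return the feature mean, eigenvalues, and principal axes,
    sorted by descending variance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]

def split_components(eigvecs, k_high, k_low):
    """Phase 1: take the k_high highest-variance and k_low lowest-variance axes."""
    return eigvecs[:, :k_high], eigvecs[:, -k_low:]

def build_profile(user_events, mean, axes):
    """Phase 2 (simplified): summarize one user's history on the chosen axes
    as a per-component mean and standard deviation."""
    z = (user_events - mean) @ axes
    return z.mean(axis=0), z.std(axis=0) + 1e-9   # epsilon avoids division by zero

def anomaly_score(event, mean, axes, profile):
    """Phase 3: largest absolute z-score of the event against the profile.
    The cost depends only on the number of features and components,
    not on how much history the user has."""
    mu, sigma = profile
    z = (event - mean) @ axes
    return float(np.max(np.abs(z - mu) / sigma))
```

In this sketch the per-event cost is one small matrix-vector product plus a comparison against stored statistics, which is one way the detection step can stay constant per event regardless of the volume of historical data.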
st cloud computing platforms; the novelty of our anomaly detection technique based on user behavior profiling; and the challenges of implementing and deploying it with Apache Spark in production. We will also demonstrate how our system outperforms traditional machine learning algorithms.