Needle in the Haystack—User Behavior Anomaly Detection for Information Security with Ping Yan and Wei Deng


Salesforce recently invented and deployed a real-time, scalable, personalized anomaly detection system that operates on terabyte-scale data with a low false-positive rate. Anomaly detection on users' in-app behavior at terabyte scale is extremely challenging because traditional techniques such as clustering suffer serious performance problems in production.

Salesforce’s method tackles these challenges in three phases: 1) It uses Principal Component Analysis (PCA) to extract a high-variance and a low-variance feature subset. The low-variance subset is valuable in cybersecurity because the goal is to determine whether a user deviates from his or her stable behavior; the high-variance subset is used for dimensionality reduction. 2) On each feature subset, it builds a profile for each user that characterizes the user’s baseline behavior and legitimate abnormal behavior. 3) During detection, it compares each incoming event with the user’s profile and produces an anomaly score. The computational complexity of the detection module is constant per incoming event.
st cloud computing platforms; the novelty of our anomaly detection technique based on user behavior profiling; and the challenges of implementing and deploying it with Apache Spark in production. We will also demonstrate how our system outperforms traditional machine learning algorithms.
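
The profile-and-score idea in phases 2 and 3 can be illustrated with a minimal sketch. The talk does not give its formulas, so everything concrete here is an assumption made for illustration: the baseline is taken as the mean, the variability as the population standard deviation, the "legitimate abnormal behavior" as the largest deviation seen in the user's own history, and the correction factor as `1 + max_legit_dev`. The names `FeatureProfile`, `build_profile`, `anomaly_score`, and `score_event` are hypothetical, and PCA feature selection is assumed to have already happened.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class FeatureProfile:
    """Per-user, per-feature summary (hypothetical structure, not from the talk)."""
    baseline: float       # assumed: mean of the user's historical values
    spread: float         # assumed: population standard deviation of the history
    max_legit_dev: float  # assumed: largest historical deviation, in spread units


def build_profile(history, min_spread=1e-6):
    """Summarize one feature of one user's history with three statistics."""
    baseline = mean(history)
    # Floor the spread so a perfectly constant history does not divide by zero.
    spread = max(pstdev(history), min_spread)
    max_legit_dev = max(abs(x - baseline) / spread for x in history)
    return FeatureProfile(baseline, spread, max_legit_dev)


def anomaly_score(profile, value):
    """Deviation from the baseline, re-scaled by an assumed correction factor."""
    deviation = abs(value - profile.baseline) / profile.spread
    # Users whose legitimate behavior already varies a lot get a larger
    # correction, so the same raw deviation yields a lower score for them.
    correction = 1.0 + profile.max_legit_dev
    return deviation / correction


def score_event(profiles, user_id, event):
    """Score an event as the worst per-feature score for that user.

    The cost depends only on the number of features in the event,
    not on how much history went into the profile.
    """
    user_profile = profiles[user_id]
    return max(anomaly_score(user_profile[name], value)
               for name, value in event.items())


# Hypothetical data: login hours observed for one user.
profiles = {"alice": {"login_hour": build_profile([9, 10, 11, 10, 10])}}
typical = score_event(profiles, "alice", {"login_hour": 10})
unusual = score_event(profiles, "alice", {"login_hour": 3})
print(typical, unusual)
```

With this particular choice of correction factor, any value that already appears in the user's history scores below 1, so a threshold at or above 1 would flag only behavior more extreme than anything seen before. That property belongs to this sketch, not necessarily to the system described in the talk.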

Published in: Data & Analytics

  1. Needle in the Haystack: User Behavior Anomaly Detection for Platform Security. Wei Deng, Ping Yan
  2. Information Security is Hard • Signatures & Rules: known knowns • Correlation Analysis: known unknowns • Anomaly Detection: unknown unknowns
  3. User In-app Behavior (who, when, where, how, what) • Actions: logins, UI page views, API calls, password reset, ... • Entities: client IP, timestamps, user agents, ... • Derived: hour of day, day of week, geo country, ...
  4. Challenges • Size of data, speed of response • Variability of threats • Little to no ground truth • Feature selection • User behavior novelty
  5. A Behavior Anomaly Detection Approach
  6. [Architecture diagram] Stage I – Profile Building; Stage II – Detection. Components shown: Event Store, PCA, PCA Matrix, Selected Features, Profile Building, User Profiles, Profile Store, Event, Parsed Event, Anomaly Detection Engine, Anomaly Detector, Anomaly
  7. PCA • High-variance variables: representative of the original data set • Low-variance variables: representative of users’ stable behavior
  8. Profile Building: 3 statistics to summarize each user’s historical distribution • User’s behavior baseline • Variance of user’s behavior • Legitimate non-typical behavior
  9. Profile Building
  10. Detection • Deviation from user’s baseline • Re-scale deviation score with a correction factor
  11. Detection
  12. Evaluation with Synthetic Data
  13. Deployment
  14. Thank You. Wei Deng, Ping Yan @pingpingya