
Cybersecurity with AI - Ashrith Barthur



We present solutions for making cyberspace more secure through feature-rich, robust, yet lean machine learning-based algorithms. These help organizations identify malicious actors, intruders and illegal system access by studying features that arise purely from system login behavior.

- Powered by the open source machine learning software Contributors welcome at:
- To view videos on H2O open source machine learning software, go to:

Published in: Data & Analytics

  1. CONFIDENTIAL. Ashrith Barthur, Security Scientist. July 19, 2016. CyberSecurity and AI - Looking for anomalies
  2. Few Problems in Cybersecurity
     1. Malicious external/internal threats (phishing, malicious domains, etc.)
     2. Large-scale attacks (DDoS, spam campaigns, etc.)
     3. Data loss (data exfiltration)
     4. User behavioural analytics (insider threat, account takeover)
     These are the primary problems enterprises are interested in solving, as they directly affect business.
  3. How are these cybersecurity problems handled?
     1. Rule-based systems
     2. Large-scale use of experts who understand the systems well
     3. Expert identification of conditions, and combinations of conditions, that are true markers of malicious behaviour
     4. Multiple security professionals who understand specific conditions and combinations, and can identify malicious behaviour
  4. Is this justified? YES. Why?
     1. Cybersecurity's focus is to identify every instance of malicious behaviour and not leave things to probability.
     2. The risk associated with each security event is large, which makes identifying each event very important.
  5. What is the problem with this approach?
     1. It takes time, as a large volume of logs needs to be analysed and each threat must be classified as real, potential, or a false positive.
     2. It requires experts and a large number of professionals.
     3. It is a manual process that requires investigating associated events across multiple logs, which is considerably slow.
     4. Even with a thorough investigation, a malicious event can still be missed because it is anomalous.
  6. Outlier? Anomalous?
     1. Outliers are, simply put, events that have a low probability of occurrence when statistically modelled.
     2. Anomalies are events that have never been seen.
     3. Identifying anomalous events is difficult.
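The distinction between an outlier and an anomaly can be illustrated with a minimal sketch. It assumes an empirical frequency model over past observations; the function name `classify_event`, the 5% threshold, and the sample login hours are all hypothetical and not taken from the talk.

```python
from collections import Counter

def classify_event(history, event, outlier_threshold=0.05):
    """Label an event relative to past observations.

    'anomaly'  - the event has never been seen in the history.
    'outlier'  - the event has been seen, but with low empirical probability.
    'expected' - everything else.
    The threshold is an illustrative assumption.
    """
    counts = Counter(history)
    if counts[event] == 0:
        return "anomaly"
    probability = counts[event] / len(history)
    return "outlier" if probability < outlier_threshold else "expected"

# Hypothetical login hours (0-23) observed for a single account.
login_hours = [9] * 60 + [10] * 37 + [22] * 3

labels = {hour: classify_event(login_hours, hour) for hour in (9, 22, 3)}
print(labels)  # {9: 'expected', 22: 'outlier', 3: 'anomaly'}
```

The never-seen case cannot be scored by the model at all, which is why the later slides transform it through context rather than relying on probability alone.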
  7. How do you solve this problem?
     1. Create a malicious-behaviour context based on your domain knowledge.
     2. Use the context to statistically transform the anomalous behaviour into an outlier, or at least into a unique occurrence.
     3. Check whether the model fits your contextual assumptions.
  8. Example
     1. Studying successful Windows user login times for the entire enterprise does not yield interesting behaviour.
     2. Studying these user logins in context is important.
     3. Login patterns of general users, administrators and system accounts are different.
     4. Different kinds of logins (physical system logins, network-based, remote, unlocks, cached logins) also behave differently.
     5. Interactions between types of users and types of logins also yield unique behaviour.
     Each analytical context is associated with a certain expected behaviour. Any violation of this expected behaviour is flagged and studied.
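The idea of per-context baselines can be sketched as follows. This is an assumption-laden illustration, not the method used in the talk: the context is taken to be the pair (user type, login type), the expected behaviour is summarised as the mean and standard deviation of the login hour, and a z-score above 3 counts as a violation. Treating the hour as a plain number ignores that 23:00 and 00:00 are adjacent, and all names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_baselines(logins):
    """Summarise login hours per (user_type, login_type) context."""
    hours_by_context = defaultdict(list)
    for user_type, login_type, hour in logins:
        hours_by_context[(user_type, login_type)].append(hour)
    return {ctx: (mean(h), pstdev(h)) for ctx, h in hours_by_context.items()}

def violates_context(baselines, user_type, login_type, hour, max_z=3.0):
    """Return True if a login falls outside its context's expected behaviour."""
    context = (user_type, login_type)
    if context not in baselines:
        return True  # a never-seen user/login combination is itself unexpected
    mu, sigma = baselines[context]
    if sigma == 0:
        return hour != mu
    return abs(hour - mu) / sigma > max_z

# Hypothetical history: general users log in interactively in the morning,
# administrators log in remotely late in the evening.
history = (
    [("general", "interactive", h) for h in (8, 9, 9, 10, 9, 8, 10, 9)]
    + [("admin", "remote", h) for h in (22, 23, 22, 23, 22, 23)]
)
baselines = build_baselines(history)

# Pooling everything into one enterprise-wide context hides the difference.
pooled = build_baselines([("all", "all", h) for _, _, h in history])

print(violates_context(baselines, "general", "interactive", 23))  # True
print(violates_context(baselines, "admin", "remote", 23))         # False
print(violates_context(pooled, "all", "all", 23))                 # False
```

The same 23:00 login is unremarkable enterprise-wide and for an administrator, but stands out once it is evaluated in the general-user context.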
  9. The Problem? Even Now?
     1. The biggest problem even now is that there is no ground truth to tell us whether a behaviour flagged as unexpected, outside its context, is truly anomalous.
     2. Therefore we end up with an unsupervised process.
     3. Anomalous behaviour detection in cybersecurity is unsupervised.
     Only data tells us the truth. We validate our analysis using feedback.
  10. How do we solve this?
     1. We still have experts who can determine whether the flagged behaviours are indeed malicious.
     2. The information we provide speeds up the analytics and investigation.
     3. Building context and statistically identifying unexpected behaviour reduces the need to go through unnecessary data.
     4. We use this feedback at multiple levels:
        a. improve the features that go into the context
        b. modify the context itself
        c. look at changes in thresholds
        d. use the feedback as a mechanism to turn the problem into a supervised problem.
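One way expert feedback could drive a threshold change is sketched below. It assumes each reviewed alert carries an anomaly score and an analyst verdict, and it picks the candidate threshold with the best F1 score on those labelled alerts. The use of F1, the function name `tune_threshold`, and the sample data are illustrative assumptions, not something stated in the slides.

```python
def tune_threshold(labelled_alerts, candidates):
    """Pick the score threshold that best matches analyst verdicts.

    labelled_alerts: (score, is_malicious) pairs collected from expert review.
    candidates: thresholds to evaluate; alerts fire when score >= threshold.
    """
    def f1(threshold):
        tp = sum(1 for s, bad in labelled_alerts if s >= threshold and bad)
        fp = sum(1 for s, bad in labelled_alerts if s >= threshold and not bad)
        fn = sum(1 for s, bad in labelled_alerts if s < threshold and bad)
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    return max(candidates, key=f1)

# Hypothetical feedback: anomaly score and whether the expert confirmed it.
feedback = [
    (2.1, False), (3.4, False), (3.9, True),
    (4.5, False), (5.2, True), (6.8, True),
]

best = tune_threshold(feedback, candidates=[2.0, 3.0, 4.0, 5.0, 6.0])
print(best)  # 5.0
```

Once enough verdicts accumulate, the same labelled pairs can serve as training data for a supervised model, which is the transition described in point d.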
  11. Event Correlation and Behavioural Identification - A perfect segue to log correlation.
  12. Context and log correlation
     1. The idea of context is used wherever identifying malicious behaviour is important.
     2. Individual logs (system logs, network logs) are not comprehensive enough to identify anomalous events on their own.
     3. Therefore it is important to use log correlation to identify events and to build a context around each event.
     4. Individual events can never be considered in a vacuum.
     5. The logs are correlated primarily by time, and then by possibly connected events.
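Correlating logs primarily by time can be sketched as below. The sketch assumes each log source is already sorted by timestamp and that each record is a (timestamp, host, message) tuple. Grouping by host and the five-minute gap are illustrative assumptions, as are the host names and messages.

```python
import heapq
from datetime import datetime, timedelta

def correlate_by_time(sources, max_gap=timedelta(minutes=5)):
    """Merge time-sorted logs and group each host's events into bursts.

    Events on the same host that are at most `max_gap` apart end up in the
    same group, regardless of which log they came from.
    """
    merged = heapq.merge(*sources, key=lambda event: event[0])
    groups = {}  # host -> list of groups, each a list of events
    for ts, host, message in merged:
        host_groups = groups.setdefault(host, [])
        if host_groups and ts - host_groups[-1][-1][0] <= max_gap:
            host_groups[-1].append((ts, host, message))
        else:
            host_groups.append([(ts, host, message)])
    return groups

t0 = datetime(2016, 7, 19, 9, 0, 0)
system_log = [
    (t0, "ws-17", "login failed"),
    (t0 + timedelta(minutes=1), "ws-17", "login succeeded"),
    (t0 + timedelta(hours=2), "ws-17", "screen unlock"),
]
network_log = [
    (t0 + timedelta(seconds=30), "ws-42", "dns lookup"),
    (t0 + timedelta(minutes=2), "ws-17", "connect db-01"),
]

groups = correlate_by_time([system_log, network_log])
first_burst = [message for _, _, message in groups["ws-17"][0]]
print(first_burst)  # ['login failed', 'login succeeded', 'connect db-01']
```

The first burst on `ws-17` combines records from both logs, which is the starting point for building context around an event.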
  13. Example of Event/Log Correlation - An example of an event
     A user account has multiple failed logins, followed by a successful login. The machine that was successfully logged in to then connects to a database server and requests a database dump, and this data is downloaded back to the machine. Identifying these events, and identifying that they happen in a series, is event correlation.
  14. Let's break these events down. You have:
     1. Multiple login attempts and one final successful login (could be interpreted as a user mistyping their password; we all do that)
     2. A connection to a database server (totally harmless)
     3. A dump of the data on the machine (someone might be creating a new database and took a dump)
     4. The data dump moved to the local machine (totally fine if someone wants to work on the data locally)
  15. The Analysis of correlated events
     1. Here we have 4 different events which tell us a story only when they are correlated.
     2. Correlation is important because the behavioural anomalies described earlier are not statistical outliers. They are unseen data points.
     3. These anomalies surface only after observing the interactions between different events.
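The example chain (failed logins, a successful login, a database connection, a dump, and a download) can be detected with a simple ordered match. This is a minimal sketch under stated assumptions: events for one account are already time-sorted and typed, the event type names are hypothetical, and the match is greedy, so it may miss a chain in which only later occurrences fit inside the window.

```python
from datetime import datetime, timedelta

def find_chain(events, pattern, window):
    """Return the matched events if `pattern` occurs in order within `window`.

    events: time-sorted (timestamp, event_type) pairs for one account.
    Unrelated events may appear between the steps. Returns None if no match.
    """
    matched = []
    for ts, kind in events:
        if kind == pattern[len(matched)]:
            matched.append((ts, kind))
            if len(matched) == len(pattern):
                break
    if len(matched) < len(pattern):
        return None
    if matched[-1][0] - matched[0][0] > window:
        return None
    return matched

# Two failures stand in for "multiple failed logins" (an assumption).
pattern = ["login_failed", "login_failed", "login_success",
           "db_connect", "db_dump", "download"]

t0 = datetime(2016, 7, 19, 2, 0, 0)
suspicious = [
    (t0, "login_failed"),
    (t0 + timedelta(seconds=10), "login_failed"),
    (t0 + timedelta(seconds=25), "login_success"),
    (t0 + timedelta(minutes=3), "db_connect"),
    (t0 + timedelta(minutes=5), "db_dump"),
    (t0 + timedelta(minutes=12), "download"),
]
benign = [event for event in suspicious if event[1] != "db_dump"]

print(find_chain(suspicious, pattern, timedelta(minutes=30)) is not None)  # True
print(find_chain(benign, pattern, timedelta(minutes=30)))                  # None
```

Each event on its own is harmless. Only the ordered combination within a short window is flagged for an expert to review.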
  16. What have we gathered?
     1. Defining the right context to identify anomalous malicious events
     2. Identifying correlated events across logs
     3. Transforming anomalous behaviour
     4. Verifying with experts
  17. Thanks to the attendees, support staff, open source members of H2O, colleagues, and our clients for helping us help them by analysing new datasets and growing H2O.
  18. The Team
     Mark Chan - Scientist, Engineer, Hacker, Ninja.
     Ivy Wang - UI, Problem, Details, Details, and Details Expert.
     Fonda Ingram - Comms, and Reqs Expert, The Wall (GoT).