1. AI in Security
Subrat Kumar Panda
AI First Thought Leader,
Director of Engineering, AI and Data Sciences,
Capillary Technologies
Bangalore
2. Agenda
● AI and Industry 4.0
● Brief intro to AI, ML, IoT
● Security Evolution (AI related)
● Era of Data
● AI use cases in security
● Building and deploying an Intelligent Security Product
3. Brief Introduction about me
● BTech (2002), PhD (2009) – CSE, IIT Kharagpur
● Synopsys (EDA), IBM (CPU), NVIDIA (GPU), Taro (Full Stack Engineer), Capillary (Principal Architect - AI)
● Applying AI to Retail
● Co-founded IDLI (for social good) with Prof. Amit Sethi (IIT Bombay), Jacob Minz (Synopsys) and Biswa Gourav Singh (Capillary)
● https://www.facebook.com/groups/idliai/
● LinkedIn - https://www.linkedin.com/in/subratpanda/
● Facebook - https://www.facebook.com/subratpanda
● Twitter - @subratpanda
5. Knowledge is Power - Sir Francis Bacon
- Industry 4.0 enabled by IoT, BigData and AI
- IoT is the intelligent sensor
- BigData will enable processing huge volumes of data
- AI will make sense of the data in decision making
- AI helps transform raw data into power, and in doing so will transform businesses
- Primarily through Machine Learning, and for deeper aspects through Deep Learning
AI is the bedrock on which Industry 4.0 relies.
8. What AI Can and Cannot Do Today?
https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now
9. Supervised Learning
1. Being able to input A and output B will transform many industries.
2. The technical term for building this A→B software is supervised learning.
3. The best solutions today are built with a technology called deep learning or deep neural networks, which were loosely inspired by the brain.
4. Labelled data is the most important requirement for supervised learning.
"If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future." - Andrew Ng
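As a minimal sketch of the A→B idea, assuming a toy labelled dataset invented purely for illustration (the feature values, labels, and the `predict` helper are hypothetical and not from the talk), a 1-nearest-neighbour classifier can learn the mapping from labelled examples alone. Nearest neighbour is used here only because it fits in a few lines; it is not the deep learning approach the slide mentions.

```python
import math

# Hypothetical labelled examples: input A = (feature_1, feature_2), output B = label.
LABELLED = [
    ((0.9, 0.8), "malicious"),
    ((0.8, 0.9), "malicious"),
    ((0.1, 0.2), "benign"),
    ((0.2, 0.1), "benign"),
]

def predict(a, examples=LABELLED):
    """Return the label B of the labelled example closest to input A."""
    nearest = min(examples, key=lambda ex: math.dist(a, ex[0]))
    return nearest[1]
```

Without the labels in `LABELLED` there is nothing to learn from, which is the point of item 4 above.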
14. Machine Learning Tasks
● Regression (or prediction): predicting a numeric value, such as the next value in a series from the previous values.
● Classification: separating things into known categories.
● Clustering: similar to classification, but the classes are not known in advance; things are grouped by their similarity.
● Association rule learning (or recommendation): recommending something based on previous experience.
● Dimensionality reduction (or generalization): finding the common and most important features across multiple examples.
● Generative models: creating something new based on previous knowledge of the distribution.
15. AI Funding in Cybersecurity
https://www.ciab.com/resources/artificial-intelligence-cybersecurity/
23. Malware Detection Methodology
- Problem Formulation: Binary Classification Problem
- Dataset
- Feature Extraction
- Dimensionality Reduction
- Model Building and Analysis
24. Datasets
- Malicia Project data
- Class imbalance: 11,308 malware vs. 2,819 benign executables
- Oversampling, undersampling, and cluster-based sampling help address the imbalance
- Generalizability checked with K-fold cross-validation
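The points above can be sketched in plain Python. This is an illustrative sketch only, not the pipeline used in the study: the function names, the choice of random oversampling, and the fold-assignment scheme are all assumptions; only the two class counts come from the slide.

```python
import random

N_MALWARE, N_BENIGN = 11308, 2819  # class counts quoted on the slide

def majority_baseline(n_major, n_minor):
    """Accuracy obtained by always predicting the majority class."""
    return n_major / (n_major + n_minor)

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios in every fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

With these counts, a model that always answers "malware" is already right about 80% of the time, so accuracy has to be read against that baseline. When oversampling is combined with cross-validation, it is usually applied to the training indices of each fold only, so that duplicated samples never appear in both the training and the test split.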
25. Feature Extraction
- Decoding the executables
- Literature shows that various static attributes, such as Windows API calls, strings, opcodes, and control flow graphs, make good feature vectors
- The study used opcode frequency as the discriminating feature
- Dimensionality Reduction: Variance Threshold, Autoencoders
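A minimal sketch of the two steps named above, opcode frequency features followed by a variance threshold, assuming the executables have already been disassembled into opcode lists. The function names, the tiny opcode vocabulary, and the sample sequences are hypothetical; the study's actual feature set and threshold are not given in the slides.

```python
from collections import Counter

def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one disassembled executable."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero for an empty listing
    return [counts[op] / total for op in vocabulary]

def variance_threshold(rows, threshold=0.0):
    """Return indices of feature columns whose variance exceeds the threshold."""
    n = len(rows)
    keep = []
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        if variance > threshold:
            keep.append(col)
    return keep

# Hypothetical vocabulary and two made-up opcode sequences.
VOCAB = ["mov", "push", "call", "nop"]
SAMPLES = [
    ["mov", "push", "call", "mov"],
    ["mov", "call", "call", "mov"],
]
features = [opcode_frequencies(s, VOCAB) for s in SAMPLES]
kept = variance_threshold(features)  # columns that actually vary across samples
```

Here `mov` and `nop` have the same frequency in both samples, so they carry no discriminating signal and are dropped; only the `push` and `call` columns survive.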
26. Building the Learning Model
- Exploration/Ensemble of multiple models: Random Forest, DNN-2L, DNN-4L, DNN-7L
27. Results
- Achieved the highest accuracy of 99.78% with random forest and variance threshold, an improvement of 1.26% over the best previously reported accuracy.
- In feature reduction, variance threshold outperformed autoencoders in improving model performance.
- The best result did not come from any of the deep learning models.
- Deep learning was overkill for the Malicia dataset.