AI in Security
Subrat Kumar Panda
AI First Thought Leader,
Director of Engineering, AI and Data Sciences,
Capillary Technologies
Bangalore
Agenda
● AI and Industry 4.0
● Brief intro AI, ML, IoT
● Security Evolution (AI related)
● Era of Data
● AI use cases in security
● Building and deploying an Intelligent Security Product
Brief Introduction about me
● BTech (2002), PhD (2009) – CSE, IIT Kharagpur
● Synopsys (EDA), IBM (CPU), NVIDIA (GPU), Taro (Full Stack Engineer), Capillary (Principal Architect - AI)
● Applying AI to Retail
● Co-founded IDLI (for social good) with Prof. Amit Sethi (IIT Bombay), Jacob Minz (Synopsys) and Biswa Gourav Singh (Capillary)
● https://www.facebook.com/groups/idliai/
● LinkedIn - https://www.linkedin.com/in/subratpanda/
● Facebook - https://www.facebook.com/subratpanda
● Twitter - @subratpanda
Industry 4.0
https://en.wikipedia.org/wiki/Industry_4.0
1. Interoperability
2. Information transparency
3. Technical assistance
4. Decentralized decisions
Knowledge is Power - Sir Francis Bacon
- Industry 4.0 enabled by IoT, BigData and AI
- IoT is the intelligent sensor
- BigData will enable processing huge volumes of data
- AI will make sense of the data in decision making
- AI helps transform raw data into power - AI will transform businesses for sure
- Primarily Machine Learning and then the deeper aspects with Deep Learning
AI is the bedrock on which Industry 4.0 relies.
The AI landscape - Nvidia
Machine Learning – http://techleer.com
What AI Can and Cannot Do Today
https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now
Supervised Learning
1. Being able to input A and output B will transform many industries.
2. The technical term for building this A→B software is supervised learning.
3. The best solutions today are built with a technology called deep learning or deep neural
networks, which were loosely inspired by the brain.
4. Labelled data is the most important requirement for supervised learning.
If a typical person can do a mental task with less than one second of thought, we can probably automate it
using AI either now or in the near future. - Andrew Ng
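A minimal sketch of the A→B idea above, assuming a toy labelled dataset and a 1-nearest-neighbour rule. Both are illustrative choices, not something the slides specify; the function name `predict_1nn`, the feature values, and the labels are hypothetical.

```python
import math

# Toy labelled data: input A is a feature vector, output B is a label.
# The features and labels are invented for illustration only.
LABELLED = [
    ((0.1, 0.2), "benign"),
    ((0.2, 0.1), "benign"),
    ((0.9, 0.8), "malicious"),
    ((0.8, 0.9), "malicious"),
]

def predict_1nn(x, labelled=LABELLED):
    """Return the label of the labelled example closest to x."""
    nearest = min(labelled, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]

print(predict_1nn((0.15, 0.15)))  # close to the "benign" examples
print(predict_1nn((0.85, 0.95)))  # close to the "malicious" examples
```

Without the labels in `LABELLED` there is nothing to learn the mapping from, which is the point of item 4.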
Transfer Learning - http://ruder.io/transfer-learning/
Transfer Learning - http://ruder.io/transfer-learning/
Drivers of ML Success
Machine Learning Tasks
● Regression (or prediction) — a task of predicting the next value based on the previous values.
● Classification — a task of separating things into different categories.
● Clustering — similar to classification but the classes are unknown, grouping things by their
similarity.
● Association rule learning (or recommendation) — a task of recommending something based on
the previous experience.
● Dimensionality reduction — or generalization, a task of searching common and most important
features in multiple examples.
● Generative models — a task of creating something based on the previous knowledge of the
distribution.
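To make the difference between classification and clustering concrete, here is a small sketch of clustering with no labels at all. It assumes plain k-means on one-dimensional data; the data (failed logins per hour) and the starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[idx].append(v)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid).
        centroids = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups

# Hypothetical feature: failed logins per hour for eight hosts.
values = [1, 2, 2, 3, 40, 42, 45, 41]
centroids, groups = kmeans_1d(values, centroids=[0.0, 10.0])
print(centroids)
print(groups)
```

The algorithm is never told which hosts are suspicious; it only groups similar values, and an analyst still has to decide what each group means.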
AI Funding in Cybersecurity
https://www.ciab.com/resources/artificial-intelligence-cybersecurity/
Trends to Watch
https://www.ciab.com/resources/artificial-intelligence-cybersecurity/
Future of AI
https://threatpost.com/artificial-intelligence-a-cybersecurity-tool-for-good-and-sometimes-bad/137831/
AI powered Information Security
https://blog.capterra.com/artificial-intelligence-in-cybersecurity/
Awesome ML Papers and Code for Cyber Security
- https://github.com/jivoi/awesome-ml-for-cybersecurity
- Datasets
- Papers
- Books
- Talks
- Tutorials
- Courses
ML Applications
https://ccdcoe.org/uploads/2018/10/Art-19-On-the-Effectiveness-of-Machine-and-Deep-Learning-for-Cyber-Security.pdf
Malware Detection
https://arxiv.org/pdf/1904.02441.pdf
Malware Detection Methodology
- Problem Formulation - Binary Classification Problem
- Dataset
- Feature Extraction
- Dimensionality Reduction
- Model Building and Analysis
Datasets
- Malicia Project data
- Class imbalance: 11,308 malware samples versus 2,819 benign executables
- Oversampling, undersampling, and cluster-based sampling help correct the imbalance
- Generalizability checked with k-fold cross-validation
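The sketch below illustrates why the imbalance matters and how oversampling and k-fold splitting can be combined. It is a standard-library illustration under stated assumptions, not the procedure used in the cited paper: the helper names `random_oversample` and `stratified_kfold`, the toy 80/20 dataset, and the choice of stratified folds are all hypothetical.

```python
import random

N_MALWARE, N_BENIGN = 11308, 2819  # class counts quoted on the slide

# Accuracy of a trivial model that always predicts "malware".
majority_baseline = N_MALWARE / (N_MALWARE + N_BENIGN)
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(items) for items in by_class.values())
    out_s, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_s.extend(items + extra)
        out_y.extend([y] * target)
    return out_s, out_y

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal indices round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

labels = ["malware"] * 80 + ["benign"] * 20   # toy 4:1 imbalance
samples = list(range(len(labels)))            # stand-in feature rows
for train_idx, test_idx in stratified_kfold(labels, k=5):
    # Oversample inside the training fold only, so duplicated rows
    # never leak into the held-out fold.
    tr_s, tr_y = random_oversample([samples[i] for i in train_idx],
                                   [labels[i] for i in train_idx])
    print(len(test_idx), tr_y.count("malware"), tr_y.count("benign"))
```

With roughly four malware samples for every benign one, a model that always answers "malware" already scores about 80% accuracy, so accuracy figures on this dataset are best read against that baseline.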
Feature Extraction
- Decoding the executables
- Literature shows that static attributes such as Windows API calls, strings, opcodes, and control flow graphs make good features
- They used opcode frequency as a discriminatory feature
- Dimensionality Reduction
- Variance Threshold
- Autoencoders
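A minimal sketch of opcode-frequency features followed by a variance-threshold filter. The opcode sequences are invented, the helper names `opcode_frequencies` and `variance_threshold` are hypothetical, and a real pipeline would get its opcodes from a disassembler rather than hard-coded lists.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical disassembly output: one opcode sequence per executable.
programs = [
    ["mov", "push", "call", "mov", "ret"],
    ["mov", "xor", "xor", "jmp", "ret"],
    ["push", "call", "mov", "mov", "ret"],
]

def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one program."""
    counts = Counter(opcodes)
    total = len(opcodes)
    return [counts[op] / total for op in vocabulary]

def variance_threshold(rows, threshold):
    """Return indices of feature columns whose variance exceeds threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]

vocabulary = sorted({op for prog in programs for op in prog})
features = [opcode_frequencies(p, vocabulary) for p in programs]
kept = variance_threshold(features, threshold=0.0)

# "ret" occurs with the same frequency in every program, so it carries
# no discriminating information and is dropped.
print([vocabulary[j] for j in kept])
```

A variance threshold only looks at the features, never at the labels, which makes it cheap compared with training an autoencoder for the same purpose.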
Building the Learning Model
- Exploration/Ensemble of multiple models
- Random Forest
- DNN-2L
- DNN-4L
- DNN-7L
Results
- Achieved the highest accuracy of 99.78% with random forest and variance threshold, an improvement of 1.26% over the previously reported best accuracy.
- In feature reduction, variance threshold outperformed autoencoders in improving model performance.
- The best result did not come from any of the deep learning models.
- Deep learning was overkill for the Malicia dataset.
Hardware Based Malware detector
https://cse.iitk.ac.in/users/spramod/papers/date17.pdf
Feature Sets
https://cse.iitk.ac.in/users/spramod/papers/date17.pdf
Reinforcement Learning
DQN architecture
Questions?
