AI & SECURITY
Anurag Sahay
Deep Learning, COE, Nagarro
ALL THINGS START WITH
PHILOSOPHY
Philosophically Fundamental Questions
Can a machine act
intelligently? Can it
solve any problem that
a person would solve
by thinking?
01
Are human intelligence
and machine
intelligence the same?
Is the human
brain essentially a
computer?
02
Can a machine have
a mind, mental states,
and consciousness in
the same way that a
human being can? Can
it feel how things are?
03
AI
impact on
Humanity
SOME BUSINESS CONTEXT
AI is the next digital frontier
In 2016, companies invested
in Artificial Intelligence
Tech Giants Startups
AI Adopters - 20%
in multiple technology areas
AI Partial Adopters - 40%
skeptical about Business Cases and ROI
Laggards - 40%
contemplators
Areas where AI creates significant Value
Smarter R&D
and
Forecasting
1
Optimized
Production and
Maintenance
2
Targeted Sales
and Marketing
3
Enhanced User
Experience
4
AI – THE BIG PICTURE
AI – THE SCIENCE
STATISTICAL SCIENCES: Maths
MACHINE LEARNING: Maths, Algorithms
DEEP LEARNING: Maths, Algorithms, Neural Net
AI – THE ENGINEERING
MODEL
WORKFLOW
INTEGRATION
MODEL
BUILDING
DATA
PROCESSING
Data Outcome Process
AI - USE CASE
PRESCRIPTIVE
INTELLIGENCE
PREDICTIVE
INTELLIGENCE
DESCRIPTIVE
INTELLIGENCE
Structure Prediction Action
AI – ORGANIZATIONAL MATURITY
AI BASED
ACTION ACTIONS
INSIGHTS
ORIENTED
DATA
BELIEF
Infrastructure Competency Product
AI – VARIOUS APPROACHES
FULLY CLOUD
BASED
Cloud Data APIs,
Model APIs,
Workflow APIs
SEMI CLOUD
BASED
Cloud Data APIs
Custom Model
Custom
Workflow
NON CLOUD
Custom Data APIs
Custom Model
Custom Workflow
SUPERVISED
MACHINE LEARNING
Learning with Examples!
THE LEARNING SPECTRUM
Human beings learn from experience
We abstract our learning into a rule-based
model, which we then encode
algorithmically to program a machine
What if we could build a system that could
learn from DATA?
Learning from experience Instructions
A system that learns from DATA
LINEAR REGRESSION
A simple way to learn from Data
Finding the right line is the important
problem
We solve it using Gradient Descent
$100K
$500K
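
A minimal sketch of finding the line with gradient descent, in Python. The house sizes and prices are made-up illustrative numbers, and the names `fit_line`, `lr` and `epochs` are chosen here for illustration only.

```python
def fit_line(xs, ys, lr=0.01, epochs=10000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step downhill along both gradients
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: size (thousand sq ft) vs. price ($K)
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [180.0, 260.0, 340.0, 420.0, 500.0]

w, b = fit_line(sizes, prices)
print(f"price = {w:.2f} * size + {b:.2f}")
```

The learning rate has to be small enough for the updates to settle; too large a value makes the line oscillate instead of converging.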
POLYNOMIAL REGRESSION
Linear Regression is more powerful than it
first seems. By adding powers of the input
as extra features, we can fit many
different types of curves.
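
A rough sketch of the idea, assuming the curve is fitted by running the same gradient descent on the features 1, x, x². The data points and the name `fit_polynomial` are illustrative assumptions.

```python
def fit_polynomial(xs, ys, degree, lr=0.1, epochs=5000):
    """Fit y = c0 + c1*x + ... + cd*x^d: linear regression on powers of x."""
    coeffs = [0.0] * (degree + 1)
    n = len(xs)
    # Each row holds the polynomial features [1, x, x^2, ...] of one point
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    for _ in range(epochs):
        errors = [sum(c * f for c, f in zip(coeffs, row)) - y
                  for row, y in zip(rows, ys)]
        for k in range(degree + 1):
            grad = 2 / n * sum(e * row[k] for e, row in zip(errors, rows))
            coeffs[k] -= lr * grad
    return coeffs


# Hypothetical points that lie on the curve y = 1 + 2x + 3x^2
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 3) for c in coeffs])
```

The model stays linear in its coefficients, which is why the same fitting procedure still applies.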
BAYES
Another interesting approach to
predicting the unknown is to use
Bayes' theorem
cheap
If an email contains the word ‘cheap’, what is the
probability that it is SPAM?
3/4
SPAM HAM
Spelling mistake
Missing title
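
The 3/4 figure can be reproduced with Bayes' theorem on a small set of labelled emails. The ten emails below are a made-up example, chosen so that three of the four emails containing ‘cheap’ are spam.

```python
from fractions import Fraction

# Hypothetical labelled emails as (contains_cheap, is_spam) pairs
emails = ([(True, True)] * 3 + [(False, True)] * 2 +
          [(True, False)] * 1 + [(False, False)] * 4)


def p_spam_given_word(emails):
    """P(spam | word) = P(word | spam) * P(spam) / P(word)."""
    total = len(emails)
    spam = [e for e in emails if e[1]]
    p_spam = Fraction(len(spam), total)
    p_word_given_spam = Fraction(sum(1 for e in spam if e[0]), len(spam))
    p_word = Fraction(sum(1 for e in emails if e[0]), total)
    return p_word_given_spam * p_spam / p_word


print(p_spam_given_word(emails))  # 3/4
```

Other signals, such as a spelling mistake or a missing title, can be treated the same way and combined.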
DECISION TREE
Decision Trees are a very simple and
intuitive way to solve a lot of different types
of problems
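
As an illustration, a one-level decision tree (a "stump") can be learned by trying every threshold and keeping the one that gives the purest split. The data and the names `gini` and `best_split` are assumptions made for this sketch.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single split."""
    best = None
    points = sorted(set(values))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: spelling mistakes per email, and whether it was spam
mistakes = [1, 2, 3, 10, 11, 12]
is_spam = [0, 0, 0, 1, 1, 1]

print(best_split(mistakes, is_spam))  # (6.5, 0.0)
```

A full tree repeats this search inside each branch until the groups are pure enough.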
LOGISTIC REGRESSION
Use ‘misclassification errors’ instead of
‘distance’ to make a line that separates
two sets of data points
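
A small sketch of a separating line learned this way. It assumes the usual log-loss formulation of logistic regression trained with gradient descent; the six points and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(points, labels, lr=0.1, epochs=2000):
    """Learn a line w1*x1 + w2*x2 + b = 0 that separates the two classes."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            gw[0] += err * x1
            gw[1] += err * x2
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def predict(w, b, point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else 0


# Hypothetical, linearly separable data
points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(points, labels)
print([predict(w, b, p) for p in points])
```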
NEURAL NETWORK
Multiple ‘decision lines’ that can help
separate the data
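
To make the ‘multiple decision lines’ idea concrete, here is a tiny hand-built network for XOR, a pattern that no single line can separate. The weights are picked by hand for illustration and are not the result of training.

```python
def step(z):
    """Threshold activation: fires (1) when the input is positive."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    # Hidden unit 1: the decision line x1 + x2 = 0.5 (behaves like OR)
    h1 = step(x1 + x2 - 0.5)
    # Hidden unit 2: the decision line x1 + x2 = 1.5 (behaves like AND)
    h2 = step(x1 + x2 - 1.5)
    # Output: fires only for points that lie between the two lines
    return step(h1 - h2 - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

In a real network the weights are learned from data rather than chosen by hand.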
SUPPORT VECTOR
MACHINE
Instead of ‘Gradient Descent’, use constrained
optimization (classically a quadratic program)
to find the line that separates the two data
sets with the widest margin
UNSUPERVISED
MACHINE LEARNING
Examples not included!
CLUSTERING
Find ‘Cluster Structure’ in the Data
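
One common way to find cluster structure is k-means. The sketch below is a bare-bones version with a fixed, deterministic start (the first k points); the data and function names are illustrative assumptions.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def kmeans(points, k, iters=20):
    centroids = [points[i] for i in range(k)]  # simple deterministic start
    clusters = []
    for _ in range(iters):
        # Assign every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical 2-D data with two obvious groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are supplied anywhere; the grouping comes only from the distances between points.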
OUTLIER DETECTION
Find ‘outliers’ in the Data
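
A simple sketch of one way to do this: flag values that lie more than a chosen number of standard deviations from the mean. The threshold of 2 and the readings are assumptions for illustration.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical readings with one obvious outlier
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))  # [50]
```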
SECURITY AND AI
MALICIOUS URLS
Given a website ‘W’ and a list of
malicious/benign websites, identify
whether ‘W’ is malicious or not
Training Data = URLs of
benign/malicious sites
Use Whitelisted/Blacklisted Websites
URLs from well known sources
http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
WHOIS registration 3/25/2008
Hosted from 208.76.89.91/22
IP hosted in Jaipur
Connection Speed T1
Has DNS PTR Record ? Yes
Registrant “Anurag”
Feature Building
[ 0.56, 9.45, ……. 0 0 1 1 1 …. 1 0 .. 1 0 ]
Real Valued | Host Based | NLP Based
+
Feature Extraction from Website
Website DOM Structure
Advertising Categories
In/Out Links Type
Images on the Website
Now apply one of the algorithms that we talked about,
and given a new website ‘W’, one can identify whether
it's malicious or not.
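
A sketch of the feature-building step for the URL string alone. Host-based lookups (WHOIS, IP, DNS) need network access and are left out; the three lexical features, the tiny vocabulary and the function names are all assumptions made here.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Simple real-valued features computed from the URL text."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "host_length": len(host),
        "host_digits": sum(ch.isdigit() for ch in host),
        "path_depth": len([s for s in parsed.path.split("/") if s]),
    }


def tokens(url):
    """Split the URL into lowercase alphanumeric tokens (bag of words)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def to_vector(url, vocabulary):
    """Real-valued features followed by one 0/1 flag per vocabulary word."""
    feats = lexical_features(url)
    real_valued = [float(feats[k])
                   for k in ("host_length", "host_digits", "path_depth")]
    present = set(tokens(url))
    bag = [1.0 if word in present else 0.0 for word in vocabulary]
    return real_valued + bag


url = "http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll"
vocabulary = ["ebayisapi", "paypal", "mobi"]  # hypothetical vocabulary
print(to_vector(url, vocabulary))
```

The resulting vectors, one per labelled URL, are what a classifier would be trained on.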
MALWARE DETECTION
Given program code, identify whether
it is benign or some form of malware
(virus, trojan, etc.)
How AV used to detect malware
Pattern matching on Static Files
Use techniques like emulation to decrypt code
How malware evades AV
Polymorphic malware metamorphoses to evade
signature identification
Why Machine Learning for Malware Detection
Too many malware/bots/trojan types
The malware code is very contextual
Mobiles, Networks, Devices..
Very processing heavy
AI oriented malware as well
Approach
Use multiple types of features to identify
Static and Behavioral
Exploit Context (Process, Runtime..)
Use layers of ML classifiers to increase confidence
Combine Supervised and Unsupervised Learning
INTRUSION DETECTION
The presence of a ‘threat’ or a ‘risk’
within a system
Attacks can be “Host Based” or
“Network Based”
How do we typically detect intrusions?
Look for signatures of known attacks, malicious activity
How threats evade IDS
Signature adaptation, new personalities etc.
Why Machine Learning for IDS
Can detect point intrusions, contextual intrusions and
even collective intrusions. This dramatically broadens
the range of threats that can be identified
Approach
Use Anomaly Detection, which works on the idea
that the machine learns ‘what is normal’, and if there
is a deviation from the ‘normal’, an ‘attack’ is presumed
‘What is Normal’ can be both spatial and temporal, and
so quite sophisticated attacks can be easily thwarted
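
A minimal sketch of the temporal case: learn a per-hour baseline from normal traffic, then flag counts that deviate from it. The login counts, the factor `k` and the function names are illustrative assumptions, not a production detector.

```python
from statistics import mean, pstdev


def learn_normal(history):
    """history maps hour of day to counts seen during normal operation."""
    return {hour: (mean(counts), pstdev(counts))
            for hour, counts in history.items()}


def is_anomalous(profile, hour, count, k=3.0):
    """Flag a count more than k standard deviations from that hour's norm."""
    mu, sigma = profile[hour]
    # Floor sigma at 1.0 so very quiet hours do not trigger on tiny changes
    return abs(count - mu) > k * max(sigma, 1.0)


# Hypothetical login counts per hour, observed on normal days
history = {3: [2, 1, 3, 2], 14: [40, 45, 38, 42]}
profile = learn_normal(history)

print(is_anomalous(profile, hour=14, count=40))  # False
print(is_anomalous(profile, hour=3, count=40))   # True
```

The same count of 40 logins is normal at 14:00 and suspicious at 03:00, which is what makes the anomaly contextual.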
AI & SECURITY
AI enables better Security
Security enables better AI
Enforces Integrity
Enforces Privacy
Prevents Misuse
AI security

Editor's Notes

  • #12 http://www.rosebt.com/blog/predictive-descriptive-prescriptive-analytics