1. The document discusses security and privacy issues related to machine learning, including adversarial examples across various domains like computer vision, natural language processing, and speech recognition.
2. It provides examples of adversarial attacks, such as altering images or text in imperceptible ways to cause AI systems to make mistakes in classification.
3. The document also briefly discusses defenses against adversarial examples and their limitations, highlighting that this remains an important open problem in machine learning security.
4. 4
Introduction
What can AI learn from security?
• History has shown that attackers always follow in the footsteps of new technology development.
• As AI controls more and more systems, attackers will have more and more opportunities.
5. 5
Failures of ML
Microsoft’s AI chatbot
The bot was an artificial intelligence chatterbot created by Microsoft and named "Tay", after the acronym "thinking about you".
There are similar platforms, such as Xiaoice in China and Rinna in Japan.
• Tay was designed to mimic the language patterns of a 19-year-old
American girl, and to learn from interacting with human users of
Twitter.
• Because the system collected vast amounts of intimate details about individuals, the program raised privacy questions.
6. 6
Failures of ML
Microsoft’s AI chatbot
Microsoft had to shut down "Tay" only 16 hours after its launch on March 23, 2016, because it started tweeting racist messages:
Tay accused George W. Bush of causing 9/11, praised Hitler, and referred to President Barack Obama as a "monkey".
Microsoft blamed Tay's behavior on a "coordinated attack by a subset of people" that "exploited a vulnerability in Tay".
7. 7
What are adversarial examples?
In the paper "Explaining and Harnessing Adversarial Examples", Goodfellow et al., International Conference on Learning Representations 2015:
"Adversarial examples: ML models misclassify examples that are only slightly different from correctly classified examples drawn from the data distribution."
9. 9
Are adversarial examples a big problem?
https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html
NeurIPS - Workshop on Security in Machine Learning 2019
ICML - Workshop on the Security and Privacy of Machine Learning 2019
ICLR - Safe Machine Learning Specification, Robustness and Assurance Workshop 2019
CVPR - The Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security Workshop 2019
https://github.com/IBM/adversarial-robustness-toolbox
https://github.com/tensorflow/cleverhans
10. 10
Adversarial Examples
There are two types of adversarial example attacks:
1. White-box attacks
2. Black-box attacks
Alternatively, we can categorize adversarial example attacks by output type (see the formulation below):
1. Untargeted attack
2. Targeted attack
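As an informal formulation (my own notation, not from the slides): for an input x, true label y, attacker-chosen label t, model f, loss L, and a perturbation δ no larger than ε, the two output types correspond to

```latex
% Untargeted attack: push the model away from the true label y
\max_{\|\delta\|_\infty \le \epsilon} \; L\!\left(f(x+\delta),\, y\right)

% Targeted attack: pull the model toward the chosen label t
\min_{\|\delta\|_\infty \le \epsilon} \; L\!\left(f(x+\delta),\, t\right)
```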
12. 12
Interpreting a linear classifier
Score function: maps the raw image pixels to class scores.
A higher score is better ('more likely').
Loss function: measures the quality of the outcome; the loss will be high if we are classifying poorly, and low if we are doing well (see the sketch below).
http://cs231n.github.io
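A minimal NumPy sketch of these two ideas, using the 4-pixel / 3-class toy sizes of the cs231n example:

```python
# Toy linear classifier: f(x) = W x + b maps raw pixels to class scores;
# the cross-entropy loss is high when the correct class gets a low score.
import numpy as np

def scores(W, b, x):
    # x: flattened image pixels, W: [num_classes, num_pixels], b: [num_classes]
    return W @ x + b

def cross_entropy_loss(s, y):
    # s: class scores, y: index of the correct class
    s = s - s.max()                      # for numerical stability
    p = np.exp(s) / np.exp(s).sum()      # softmax probabilities
    return -np.log(p[y])

rng = np.random.default_rng(0)
x = rng.random(4)                              # 4 "pixels", as in the toy example
W, b = rng.normal(size=(3, 4)), np.zeros(3)    # 3 classes
print(cross_entropy_loss(scores(W, b, x), y=0))
```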
13. 13
How does an AI system recognize an RGB image?
[Figure: an input RGB image ("vending machine") is split into its red, green, and blue channels. What the computer sees is three matrices of pixel values, one matrix per channel.]
15. 15
How does an AI system recognize an RGB image?
[Figure: the pixel values are flattened into a vector, fed through the input layer and hidden layer of a neural network, and the output layer produces class scores (e.g. "thatch", "keyboard", ...).]
16. 16
AI Model
Let's consider Google Inception V3, which was trained on the ImageNet dataset with 1,000 classes. The input image is a color image of size 299 x 299.
[Figure: the 299 x 299 x 3 input RGB image is scaled and vectorized into a single 1 x 268,203 vector of pixel values before being fed to the model.]
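A hedged sketch of this input pipeline, assuming PyTorch/torchvision and a local image file named example.jpg (my placeholder, not from the slides):

```python
# Prepare a 299x299 RGB image and feed it to a pretrained Inception V3 classifier.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),                                   # 3 x 299 x 299 tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],         # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.inception_v3(pretrained=True).eval()
img = preprocess(Image.open("example.jpg")).unsqueeze(0)     # hypothetical input file
with torch.no_grad():
    class_scores = model(img)                                # 1 x 1000 class scores
print(class_scores.argmax(dim=1))                            # predicted ImageNet class index
```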
17. 17
Normal process
In the normal process of training a neural network (sketched below):
• The input and its label are fixed (no change).
• Fine-tune W with respect to Loss(input, label).
• Try to find the best W that makes the loss function as small as possible.
http://cs231n.github.io
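A minimal PyTorch sketch of one such training step (model and optimizer are assumed to exist):

```python
# One normal training step: the input and its label stay fixed,
# and only the weights W are updated to make the loss smaller.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, label):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), label)
    loss.backward()        # gradients with respect to the model parameters (W)
    optimizer.step()       # update W so that the loss becomes smaller
    return loss.item()
```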
18. 18
Crafting Adversarial Examples
In the process of creating adversarial examples (sketched below):
• The model weights W are fixed (no change).
• Fine-tune the input with respect to Loss(input, label).
• Keep Loss(input, label) large; keep Loss(adversarial, new label) small.
http://cs231n.github.io
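A minimal PyTorch sketch of this process, in the style of the fast gradient sign method (FGSM) of Goodfellow et al.; pixel values are assumed to lie in [0, 1]:

```python
# The weights W are frozen; only the *input* is nudged by the gradient of the loss.
import torch
import torch.nn.functional as F

def fgsm_untargeted(model, x, label, eps=0.01):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()                                   # gradient w.r.t. the input, not W
    # Step *along* the gradient sign to keep Loss(input, label) large.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def fgsm_targeted(model, x, new_label, eps=0.01):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), new_label)
    loss.backward()
    # Step *against* the gradient sign to keep Loss(adversarial, new label) small.
    return (x_adv - eps * x_adv.grad.sign()).clamp(0, 1).detach()
```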
21. 21
Are adversarial examples actually dangerous?
Paper: Fooling automated surveillance cameras: adversarial patches to attack person detection, CVPR Workshops 2019
22. 22
Are adversarial examples actually dangerous?
Autonomous cars rely on ML/AI to automatically detect traffic signs.
What happens if the car's AI system misrecognizes a "STOP" traffic sign as a "turn right" sign while driving on a highway?
Taesoo Kim, AI & Security, Microsoft Research Faculty Summit 2017
23. 23
However, a white-box attack is not very realistic, because you are unlikely to get access to the gradients of the loss function of any particular system.
-> See Papernot et al., "Practical Black-Box Attacks against Machine Learning", ASIA CCS 2017.
Adversarial Samples –
Black-box attacks
24. 24
In a more realistic context, you would want to attack a system while having access only to its outputs.
The paper "Practical Black-Box Attacks against Machine Learning" proposed training a substitute deep neural network to solve the same classification task as the target model.
Adversarial Samples –
Black-box attacks
25. 25
There are two main strategies in black-box attacks:
1. Create a synthetic dataset.
2. Build and train a local substitute deep neural network.
Adversarial Samples –
Black-box attacks
26. 26
How do we create the synthetic dataset (input, output)?
1. Inputs: synthetic inputs generated by the adversary.
2. Outputs: labels assigned by the target DNN and observed by the adversary.
Adversarial Samples –
Black-box attacks
27. 27
How do we build the local substitute deep neural network?
The attacker queries the oracle (the target system) with synthetic inputs selected by a Jacobian-based heuristic, in order to build a model F that approximates the decision boundaries of the oracle model O (see the sketch below).
Adversarial Samples –
Black-box attacks
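A hedged, simplified sketch of the Jacobian-based dataset augmentation step described above (names and shapes are illustrative, not the paper's exact code):

```python
# Query the oracle for labels, then create new synthetic inputs by stepping along
# the sign of the substitute's gradient for the oracle-assigned class.
import torch

def jacobian_augmentation(substitute, X, oracle_labels, lam=0.1):
    X = X.clone().detach().requires_grad_(True)
    out = substitute(X)                                     # substitute's class scores
    picked = out.gather(1, oracle_labels.view(-1, 1)).sum()
    picked.backward()                                       # d(score of oracle label) / d(input)
    X_new = (X + lam * X.grad.sign()).detach()              # points pushed toward decision boundaries
    return torch.cat([X.detach(), X_new])                   # augmented synthetic dataset
```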
28. 28
What are the oracle DNNs?
1. Amazon oracle: https://aws.amazon.com/machine-learning
2. Google oracle: https://cloud.google.com/prediction
3. MetaMind oracle: https://einstein.ai/
They provide functionality such as dataset upload, automated model training, and model prediction querying.
This method made MetaMind misclassify at a rate of 84.24%, and Amazon and Google at 96.19% and 88.94%, respectively.
Adversarial Samples –
Black-box attacks
29. 29
In another realistic scenario, the attacker would not be allowed to provide its own image files; the neural network would take camera pictures as input. That is the problem the authors of the paper "Adversarial examples in the physical world" (ICLR 2017) are trying to solve.
Adversarial Samples –
Black-box attacks
30. 30
"We used images taken from a cell-phone camera as an input to an image classification neural network"
Adversarial Samples –
Black-box attacks
32. 32
Adversarial Examples on NLP
Paper: Deep Text Classification Can be Fooled, Bin Liang et al,
International Joint Conference on Artificial Intelligence, 2018.
The foundation of the attack lies in identifying the text items that contribute most to the classification, by leveraging the cost gradient (see the sketch below).
From these, they determine the most frequent phrases, called Hot Training Phrases (HTPs).
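A hedged sketch of the general idea (ranking tokens by the magnitude of the cost gradient); the embedding-level interface is my assumption, not the paper's exact implementation:

```python
# Rank input tokens by how strongly the loss gradient flows through them.
import torch
import torch.nn.functional as F

def token_importance(model, token_embeddings, label):
    emb = token_embeddings.clone().detach().requires_grad_(True)   # [seq_len, dim]
    loss = F.cross_entropy(model(emb.unsqueeze(0)), label)
    loss.backward()
    # Tokens with larger gradient magnitude contribute more to the classification.
    return emb.grad.norm(dim=-1)                                    # one score per token
```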
33. 33
Adversarial Examples on NLP
Paper: Deep Text Classification Can be Fooled, Bin Liang et al,
International Joint Conference on Artificial Intelligence, 2018.
The character-level DNN is trained on a DBpedia ontology dataset,
which contains 560,000 training samples and 70,000 testing
samples of 14 high-level classes, such as Company, Building, Film
and so on.
34. 34
Adversarial Examples on NLP
Paper: Deep Text Classification Can be Fooled, Bin Liang et al,
International Joint Conference on Artificial Intelligence, 2018.
The MR dataset is a movie review repository (containing 10,662
reviews) while CR contains 3,775 reviews about products, e.g. a
music player. Reviews from both datasets can be categorized as
either Positive or Negative.
35. 35
Related papers
Generating Natural Language Adversarial Examples, Alzantot et al., EMNLP 2018.
TEXTBUGGER: Generating Adversarial Text Against Real-world
Applications, NDSS 2019
37. 37
Introduction
Personal assistants such as Alexa, Siri, or Cortana are widely deployed these days. Such Automatic Speech Recognition (ASR) systems can translate and even recognize spoken language and provide a written transcript of it. Recent advances in the fields of deep learning and big data analysis have supported significant progress for ASR systems, which have become almost as good at this task as human listeners.
What is an Automatic Speech Recognition (ASR) system?
38. 38
Automatic Speech Recognition (ASR)
Paper: Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding, Schönherr et al., NDSS 2019
39. 39
How can we attack ASR systems?
Psychoacoustics: Human hearing is limited to a certain range
of frequencies, amplitudes, and signal dynamics. The field
of psychoacoustics addresses such hearing restrictions and
provides a rule set for things that humans can and cannot
hear. While these rules are used in different fields, e.g.,
in MP3 music compression, we can also utilize
psychoacoustics for our attack to hide noise in such a way
that humans (almost) cannot hear it.
40. 40
Psychoacoustics
Hearing Models
Psychoacoustic hearing thresholds describe masking effects
in human acoustic perception. Probably the best-known
example for this is MP3 compression, where the
compression algorithm uses a set of computed hearing
thresholds to find out which parts of the input signal are
imperceptible for human listeners. By removing those
parts, the audio signal can be transformed into a smaller
but lossy representation, which requires less disk space and
less bandwidth to store or transmit.
41. 41
Attack
For the attack, in principle, we use the very same algorithm as for the training of neural networks. The algorithm is based on gradient descent, but instead of updating the parameters of the neural network as in training, the audio signal is modified. We use the hearing thresholds to avoid changes in easily perceptible parts of the audio signal (a simplified sketch follows below).
https://adversarial-attacks.net/
Kaldi Speech Recognition Toolkit
https://github.com/kaldi-asr/kaldi
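A hedged, highly simplified sketch of this idea; asr_loss_fn and thresholds are illustrative assumptions, and the real attack operates inside the Kaldi recognizer rather than on a generic differentiable loss:

```python
# Gradient descent on the audio signal toward a target transcription, with the
# per-sample change kept below (assumed precomputed) psychoacoustic hearing thresholds.
import torch

def attack_step(asr_loss_fn, audio, target_transcript, thresholds, lr=1e-3):
    audio = audio.clone().detach().requires_grad_(True)
    loss = asr_loss_fn(audio, target_transcript)              # loss toward the *target* text
    loss.backward()
    delta = -lr * audio.grad                                   # step that reduces the target loss
    delta = torch.min(torch.max(delta, -thresholds), thresholds)   # stay below hearing thresholds
    return (audio + delta).detach()
```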
43. 43
Attack Model
Forced alignment refers to the process by which
orthographic transcriptions are aligned to audio
recordings to automatically generate phone level
segmentation.
The main difference between “original audio” and
“raw audio” is that the original audio does not
change during the run-time of the algorithm, but
the raw audio is updated iteratively in order to
result in an adversarial example.
44. 44
Results
Original Audio:
THE WALLOP PROPOSAL WOULD COST FIVE POINT FOUR TWO
BILLION DOLLARS OVER FIVE YEARS.
Adversarial Audio:
I BELIEVE ALL PEOPLE ARE GOOD
Noise:
45. 45
Related papers
Imperceptible, Robust, and Targeted Adversarial
Examples for Automatic Speech Recognition, Yao
Qin et al., ICML 2019.
Robust Audio Adversarial Example for a Physical
Attack, Hiromu Yakura and Jun Sakuma, IJCAI 2019
46. 46
Back to AE in Computer Vision
Is there no way to protect AI models from adversarial examples?
47. 47
Back to AE in Computer Vision
Good news from CVPR 2017….
Paper: NO Need to Worry about Adversarial Examples in
Object Detection in Autonomous Vehicles, Jiajun Lu et al.,
CVPR 2017.
48. 48
Back to AE in Computer Vision
But ….
Paper: Obfuscated Gradients Give a False Sense of Security:
Circumventing Defenses to Adversarial Examples, Athalye
et al., ICML 2018 (Best Paper).
49. 49
Demonstration
• I will demonstrate adversarial examples on image classification tasks using PyTorch and TensorFlow with several state-of-the-art AI models (Inception V3/V4, ResNet-152, ...).
50. 50
Future work
In this talk I covered adversarial attacks on image classification, natural language processing, and speech recognition.
Adversarial examples are emerging as a crucial problem in deep learning.
Defense strategies are not included in this talk; I hope to have another chance to discuss them.
Q & A
Artificial intelligence and machine learning are being applied more broadly across different industries and applications than ever before, and cyber security is no exception. In a cybersecurity context, AI is software that perceives its environment well enough to identify events and take action toward a predefined purpose. AI is particularly good at recognizing patterns and anomalies within them, which makes it an excellent tool for detecting threats. Machine learning algorithms can be used to create profiles of normal behavior; these profiles can be global, or user or host based. Based on these profiles, it is possible to differentiate normal from abnormal behavior practically in real time.
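A hedged illustration of that "profile of normal behavior" idea, using scikit-learn's IsolationForest; the features here are purely synthetic placeholders:

```python
# Fit an anomaly detector on "normal" behavior, then flag new events as normal or not.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_behavior = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))   # e.g. per-user feature vectors
detector = IsolationForest(random_state=0).fit(normal_behavior)

new_events = np.array([[0.1, -0.2, 0.3, 0.0],      # looks like normal behavior
                       [8.0, 9.0, -7.5, 10.0]])    # clearly abnormal
print(detector.predict(new_events))                 # +1 = normal, -1 = anomaly
```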
For example, an image classification model takes a single image and assigns probabilities to 4 labels, {cat, dog, hat, mug}. Keep in mind that, to a computer, an image is represented as one large 3-dimensional array of numbers. In this example, the cat image is 299 pixels wide, 299 pixels tall, and has three color channels Red, Green, Blue (or RGB for short). Therefore, the image consists of 299 x 299 x 3 numbers, or a total of 268,203 numbers. Each number is an integer that ranges from 0 (black) to 255 (white). Our task is to turn this quarter of a million numbers into a single label, such as "cat".
An example of mapping an image to class scores. For the sake of visualization, we assume the image only has 4 pixels (4 monochrome pixels, we are not considering color channels in this example for brevity), and that we have 3 classes (red (cat), green (dog), blue (ship) class). (Clarification: in particular, the colors here simply indicate 3 classes and are not related to the RGB channels.) We stretch the image pixels into a column and perform matrix multiplication to get the scores for each class. Note that this particular set of weights W is not good at all: the weights assign our cat image a very low cat score. In particular, this set of weights seems convinced that it's looking at a dog.
ASIA CCS : Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security.
Papernot et al. proposed a practical way to carry out black-box attacks.
The problem with this is that you would not be able to apply the FGSM algorithm anymore as you would not have access to the network itself.
In this paper, we introduce the first demonstration that black-box attacks against DNN classifiers are practical for real-world adversaries with no knowledge about the model. We assume the adversary (a) has no information about the structure or parameters of the DNN, and (b) does not have access to any large training dataset. The adversary's only capability is to observe labels assigned by the DNN for chosen inputs, in a manner analogous to a cryptographic oracle.
MNIST comprises 60,000 training and 10,000 test images of handwritten digits. The task associated with the dataset is to identify the digit corresponding to each image. Each 28x28 grayscale sample is encoded as a vector of pixel intensities in the interval [0, 1]. The GTSRB dataset is an image collection consisting of 43 traffic signs [13]. Images vary in size and are RGB-encoded. To simplify, we resize images to 32x32 pixels, re-center them by subtracting the mean component, and rescale them by factoring their standard deviations out. We keep 35,000 images for our training set and 4,000 for our validation set (out of the 39,209 available), and 10,000 for our test set (out of 12,630).
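A hedged sketch of that GTSRB preprocessing in torchvision (per-image standardization is one possible reading of "subtracting the mean component"; the paper's exact pipeline may differ):

```python
# Resize traffic-sign images to 32x32, re-center by subtracting the mean,
# and rescale by dividing out the standard deviation.
from torchvision import transforms

gtsrb_preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x - x.mean()) / x.std()),   # re-center and rescale
])
```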
“We used images taken from a cell-phone camera as a input to an Inception v3 image classification neural network. We showed that in such a set-up, a significant fraction of adversarial images crafted using the original network are misclassified even when fed to the classifier through the camera.”
All previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those which are using signals from cameras and other sensors as input. This paper shows that even in such physical-world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.
The algorithm uses the original audio signal and the target transcription as inputs in order to find the best target pseudo-posteriors. The forced alignment is performed once at the beginning of the algorithm.
The hearing thresholds are applied during the backpropagation in order to limit the changes that are perceptible by a human.
One major problem of attacks against ASR systems is that they require the recognition to pass through a certain sequence of HMM states in such a way that it leads to the target transcription. However, due to the decoding step— which includes a graph search—for a given transcription, many valid pseudo-posterior combinations exist. For example, when the same text is spoken at different speeds, the sequence of the HMM states is correspondingly faster or slower. We can benefit from this fact by using that version of pseudo-posteriors which best fits the given audio signal and the desired target transcription.
We use forced alignment as an algorithm for finding the best possible temporal alignment between the acoustic signal that we manipulate and the transcription that we wish to obtain. This algorithm is provided by the Kaldi toolkit. Note that it is not always possible to find an alignment that fits an audio file to any target transcription. In this case, we set the alignment by dividing the audio sample equally into the number of states and set the target according to this division.
NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles, CVPR 2017
Synthesizing Robust Adversarial Examples, ICML 2018