1. The document discusses security and privacy issues related to machine learning, including adversarial examples across various domains like computer vision, natural language processing, and speech recognition.
2. It provides examples of adversarial attacks, such as altering images or text in imperceptible ways to cause AI systems to make mistakes in classification.
3. The document also briefly discusses defenses against adversarial examples and their limitations, highlighting that this remains an important open problem in machine learning security.
4. 4
Introduction
What can AI learn from security?
• History has shown that attackers always follow in the footsteps of new technology development.
• As AI controls more and more systems, attackers will have more and more opportunities.
5. 5
Failures of ML
Microsoft’s AI chatbot
The bot was an artificial intelligence chatterbot created by Microsoft and named "Tay", after the acronym "thinking about you".
There are similar platforms, such as Xiaoice in China and Rinna in Japan.
• Tay was designed to mimic the language patterns of a 19-year-old
American girl, and to learn from interacting with human users of
Twitter.
• Because the system collected vast amounts of intimate details about individuals, the program raised privacy questions.
6. 6
Failures of ML
Microsoft’s AI chatbot
Microsoft had to shut down "Tay" only 16 hours after its launch on March 23, 2016, because it started tweeting racist messages:
Tay accused George W. Bush of causing 9/11, praised Hitler, and referred to President Barack Obama as a "monkey".
Microsoft blamed Tay's behavior on a "coordinated attack by a subset of people" that "exploited a vulnerability in Tay".
7. 7
What are adversarial examples?
In the paper "Explaining and Harnessing Adversarial Examples", Goodfellow et al., International Conference on Learning Representations 2015:
"Adversarial examples: ML models misclassify examples that are only slightly different from correctly classified examples drawn from the data distribution."
9. 9
Are adversarial examples a big problem?
https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html
NeurIPS - Workshop on Security in Machine Learning 2019
ICML - Workshop on the Security and Privacy of Machine Learning 2019
ICLR - Safe Machine Learning Specification, Robustness and Assurance Workshop 2019
CVPR - The Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security Workshop 2019
https://github.com/IBM/adversarial-robustness-toolbox
https://github.com/tensorflow/cleverhans
10. 10
Adversarial Examples
There are two types of adversarial example attacks:
1. White-box attacks
2. Black-box attacks
Alternatively, we can categorize adversarial example attacks by output type (see the formulation below):
1. Untargeted attack
2. Targeted attack
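As an informal formulation (my own notation, not from the slides): for an input x, true label y, attacker-chosen label t, model f, loss L, and a perturbation δ no larger than ε, the two output types correspond to

```latex
% Untargeted attack: push the model away from the true label y
\max_{\|\delta\|_\infty \le \epsilon} \; L\!\left(f(x+\delta),\, y\right)

% Targeted attack: pull the model toward the chosen label t
\min_{\|\delta\|_\infty \le \epsilon} \; L\!\left(f(x+\delta),\, t\right)
```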
12. 12
Interpreting a linear classifier
Score function: maps the raw image pixels to class scores.
A higher score is better ('more likely').
Loss function: measures the quality of the outcome; the loss will be high if we are classifying poorly, and low if we are doing well (see the sketch below).
http://cs231n.github.io
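A minimal NumPy sketch of these two ideas, using the 4-pixel / 3-class toy sizes of the cs231n example:

```python
# Toy linear classifier: f(x) = W x + b maps raw pixels to class scores;
# the cross-entropy loss is high when the correct class gets a low score.
import numpy as np

def scores(W, b, x):
    # x: flattened image pixels, W: [num_classes, num_pixels], b: [num_classes]
    return W @ x + b

def cross_entropy_loss(s, y):
    # s: class scores, y: index of the correct class
    s = s - s.max()                      # for numerical stability
    p = np.exp(s) / np.exp(s).sum()      # softmax probabilities
    return -np.log(p[y])

rng = np.random.default_rng(0)
x = rng.random(4)                              # 4 "pixels", as in the toy example
W, b = rng.normal(size=(3, 4)), np.zeros(3)    # 3 classes
print(cross_entropy_loss(scores(W, b, x), y=0))
```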
13. 13
How does an AI system recognize an RGB image?
[Figure: an input RGB image ("vending machine") is split into its red, green, and blue channels. What the computer sees is three matrices of pixel values, one matrix per channel.]
15. 15
How does an AI system recognize an RGB image?
[Figure: the pixel values are flattened into a vector, fed through the input layer and hidden layer of a neural network, and the output layer produces class scores (e.g. "thatch", "keyboard", ...).]
16. 16
AI Model
Let's consider Google Inception V3, which was trained on the ImageNet dataset with 1,000 classes. The input image is a color image of size 299 x 299.
[Figure: the 299 x 299 x 3 input RGB image is scaled and vectorized into a single 1 x 268,203 vector of pixel values before being fed to the model.]
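A hedged sketch of this input pipeline, assuming PyTorch/torchvision and a local image file named example.jpg (my placeholder, not from the slides):

```python
# Prepare a 299x299 RGB image and feed it to a pretrained Inception V3 classifier.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),                                   # 3 x 299 x 299 tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],         # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.inception_v3(pretrained=True).eval()
img = preprocess(Image.open("example.jpg")).unsqueeze(0)     # hypothetical input file
with torch.no_grad():
    class_scores = model(img)                                # 1 x 1000 class scores
print(class_scores.argmax(dim=1))                            # predicted ImageNet class index
```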
17. 17
Normal process
In the normal process of training a neural network (sketched below):
• The input and its label are fixed (no change).
• Fine-tune W with respect to Loss(input, label).
• Try to find the best W that makes the loss function as small as possible.
http://cs231n.github.io
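A minimal PyTorch sketch of one such training step (model and optimizer are assumed to exist):

```python
# One normal training step: the input and its label stay fixed,
# and only the weights W are updated to make the loss smaller.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, label):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), label)
    loss.backward()        # gradients with respect to the model parameters (W)
    optimizer.step()       # update W so that the loss becomes smaller
    return loss.item()
```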
18. 18
Crafting Adversarial Examples
In the process of creating adversarial examples (sketched below):
• The model weights W are fixed (no change).
• Fine-tune the input with respect to Loss(input, label).
• Keep Loss(input, label) large; keep Loss(adversarial, new label) small.
http://cs231n.github.io
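A minimal PyTorch sketch of this process, in the style of the fast gradient sign method (FGSM) of Goodfellow et al.; pixel values are assumed to lie in [0, 1]:

```python
# The weights W are frozen; only the *input* is nudged by the gradient of the loss.
import torch
import torch.nn.functional as F

def fgsm_untargeted(model, x, label, eps=0.01):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()                                   # gradient w.r.t. the input, not W
    # Step *along* the gradient sign to keep Loss(input, label) large.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def fgsm_targeted(model, x, new_label, eps=0.01):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), new_label)
    loss.backward()
    # Step *against* the gradient sign to keep Loss(adversarial, new label) small.
    return (x_adv - eps * x_adv.grad.sign()).clamp(0, 1).detach()
```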
21. 21
Are adversarial examples actually dangerous?
Paper: Fooling automated surveillance cameras: adversarial patches to attack person detection, CVPR Workshops 2019
22. 22
Are adversarial examples actually dangerous?
Autonomous cars rely on ML/AI to automatically detect traffic signs.
What happens if the car's AI system misrecognizes a "STOP" traffic sign as a "turn right" sign while driving on a highway?
Taesoo Kim, AI & Security, Microsoft Research Faculty Summit 2017
23. 23
However, a white-box attack is not very realistic, because you are unlikely to get access to the gradients of the loss function of any particular system.
-> See Papernot et al., "Practical Black-Box Attacks against Machine Learning", ASIA CCS 2017.
Adversarial Samples –
Black-box attacks
24. 24
In a more realistic context, you would want to attack a system while having access only to its outputs.
The paper "Practical Black-Box Attacks against Machine Learning" proposed training a substitute deep neural network to solve the same classification task as the target model.
Adversarial Samples –
Black-box attacks
25. 25
There are two main strategies in black-box attacks:
1. Create a synthetic dataset.
2. Build and train a local substitute deep neural network.
Adversarial Samples –
Black-box attacks
26. 26
How do we create the synthetic dataset (input, output)?
1. Inputs: synthetic inputs generated by the adversary.
2. Outputs: labels assigned by the target DNN and observed by the adversary.
Adversarial Samples –
Black-box attacks
27. 27
How do we build the local substitute deep neural network?
The attacker queries the oracle (the target system) with synthetic inputs selected by a Jacobian-based heuristic, in order to build a model F that approximates the decision boundaries of the oracle model O (see the sketch below).
Adversarial Samples –
Black-box attacks
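A hedged, simplified sketch of the Jacobian-based dataset augmentation step described above (names and shapes are illustrative, not the paper's exact code):

```python
# Query the oracle for labels, then create new synthetic inputs by stepping along
# the sign of the substitute's gradient for the oracle-assigned class.
import torch

def jacobian_augmentation(substitute, X, oracle_labels, lam=0.1):
    X = X.clone().detach().requires_grad_(True)
    out = substitute(X)                                     # substitute's class scores
    picked = out.gather(1, oracle_labels.view(-1, 1)).sum()
    picked.backward()                                       # d(score of oracle label) / d(input)
    X_new = (X + lam * X.grad.sign()).detach()              # points pushed toward decision boundaries
    return torch.cat([X.detach(), X_new])                   # augmented synthetic dataset
```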
28. 28
What are the oracle DNNs?
1. Amazon oracle: https://aws.amazon.com/machine-learning
2. Google oracle: https://cloud.google.com/prediction
3. MetaMind oracle: https://einstein.ai/
They provide functionality such as dataset upload, automated model training, and model prediction querying.
This method made MetaMind misclassify at a rate of 84.24%, and Amazon and Google at 96.19% and 88.94%, respectively.
Adversarial Samples –
Black-box attacks
29. 29
In another realistic scenario, the attacker would not be allowed to provide its own image files; the neural network would take camera pictures as input. That is the problem the authors of the paper "Adversarial examples in the physical world" (ICLR 2017) are trying to solve.
Adversarial Samples –
Black-box attacks
30. 30
"We used images taken from a cell-phone camera as an input to an image classification neural network"
Adversarial Samples –
Black-box attacks
32. 32
Adversarial Examples on NLP
Paper: Deep Text Classification Can be Fooled, Bin Liang et al,
International Joint Conference on Artificial Intelligence, 2018.
The foundation of the attack lies in identifying the text items that contribute most to the classification, by leveraging the cost gradient (see the sketch below).
From these, they determine the most frequent phrases, called Hot Training Phrases (HTPs).
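A hedged sketch of the general idea (ranking tokens by the magnitude of the cost gradient); the embedding-level interface is my assumption, not the paper's exact implementation:

```python
# Rank input tokens by how strongly the loss gradient flows through them.
import torch
import torch.nn.functional as F

def token_importance(model, token_embeddings, label):
    emb = token_embeddings.clone().detach().requires_grad_(True)   # [seq_len, dim]
    loss = F.cross_entropy(model(emb.unsqueeze(0)), label)
    loss.backward()
    # Tokens with larger gradient magnitude contribute more to the classification.
    return emb.grad.norm(dim=-1)                                    # one score per token
```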
33. 33
Adversarial Examples on NLP
Paper: Deep Text Classification Can be Fooled, Bin Liang et al,
International Joint Conference on Artificial Intelligence, 2018.
The character-level DNN is trained on a DBpedia ontology dataset,
which contains 560,000 training samples and 70,000 testing
samples of 14 high-level classes, such as Company, Building, Film
and so on.
34. 34
Adversarial Examples on NLP
Paper: Deep Text Classification Can be Fooled, Bin Liang et al,
International Joint Conference on Artificial Intelligence, 2018.
The MR dataset is a movie review repository (containing 10,662
reviews) while CR contains 3,775 reviews about products, e.g. a
music player. Reviews from both datasets can be categorized as
either Positive or Negative.
35. 35
Related papers
Generating Natural Language Adversarial Examples, Alzantot et al., EMNLP 2018.
TEXTBUGGER: Generating Adversarial Text Against Real-world
Applications, NDSS 2019
37. 37
Introduction
Personal assistants such as Alexa, Siri, or Cortana are widely deployed these days. Such Automatic Speech Recognition (ASR) systems can translate and even recognize spoken language and provide a written transcript of it. Recent advances in the fields of deep learning and big data analysis have supported significant progress for ASR systems, which have become almost as good at this task as human listeners.
What is an Automatic Speech Recognition (ASR) system?
38. 38
Automatic Speech Recognition (ASR)
Paper: Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding, Schönherr et al., NDSS 2019
39. 39
How can we attack ASR systems?
Psychoacoustics: Human hearing is limited to a certain range
of frequencies, amplitudes, and signal dynamics. The field
of psychoacoustics addresses such hearing restrictions and
provides a rule set for things that humans can and cannot
hear. While these rules are used in different fields, e.g.,
in MP3 music compression, we can also utilize
psychoacoustics for our attack to hide noise in such a way
that humans (almost) cannot hear it.
40. 40
Psychoacoustics
Hearing Models
Psychoacoustic hearing thresholds describe masking effects
in human acoustic perception. Probably the best-known
example for this is MP3 compression, where the
compression algorithm uses a set of computed hearing
thresholds to find out which parts of the input signal are
imperceptible for human listeners. By removing those
parts, the audio signal can be transformed into a smaller
but lossy representation, which requires less disk space and
less bandwidth to store or transmit.
41. 41
Attack
For the attack, in principle, we use the very same algorithm as for the training of neural networks. The algorithm is based on gradient descent, but instead of updating the parameters of the neural network as in training, the audio signal is modified. We use the hearing thresholds to avoid changes in easily perceptible parts of the audio signal (a simplified sketch follows below).
https://adversarial-attacks.net/
Kaldi Speech Recognition Toolkit
https://github.com/kaldi-asr/kaldi
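A hedged, highly simplified sketch of this idea; asr_loss_fn and thresholds are illustrative assumptions, and the real attack operates inside the Kaldi recognizer rather than on a generic differentiable loss:

```python
# Gradient descent on the audio signal toward a target transcription, with the
# per-sample change kept below (assumed precomputed) psychoacoustic hearing thresholds.
import torch

def attack_step(asr_loss_fn, audio, target_transcript, thresholds, lr=1e-3):
    audio = audio.clone().detach().requires_grad_(True)
    loss = asr_loss_fn(audio, target_transcript)              # loss toward the *target* text
    loss.backward()
    delta = -lr * audio.grad                                   # step that reduces the target loss
    delta = torch.min(torch.max(delta, -thresholds), thresholds)   # stay below hearing thresholds
    return (audio + delta).detach()
```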
43. 43
Attack Model
Forced alignment refers to the process by which
orthographic transcriptions are aligned to audio
recordings to automatically generate phone level
segmentation.
The main difference between “original audio” and
“raw audio” is that the original audio does not
change during the run-time of the algorithm, but
the raw audio is updated iteratively in order to
result in an adversarial example.
44. 44
Results
Original Audio:
THE WALLOP PROPOSAL WOULD COST FIVE POINT FOUR TWO
BILLION DOLLARS OVER FIVE YEARS.
Adversarial Audio:
I BELIEVE ALL PEOPLE ARE GOOD
Noise:
45. 45
Related papers
Imperceptible, Robust, and Targeted Adversarial
Examples for Automatic Speech Recognition, Yao
Qin et al., ICML 2019.
Robust Audio Adversarial Example for a Physical
Attack, Hiromu Yakura and Jun Sakuma, IJCAI 2019
46. 46
Back to AE in Computer Vision
Is there no way to protect AI models from adversarial examples?
47. 47
Back to AE in Computer Vision
Good news from CVPR 2017….
Paper: NO Need to Worry about Adversarial Examples in
Object Detection in Autonomous Vehicles, Jiajun Lu et al.,
CVPR 2017.
48. 48
Back to AE in Computer Vision
But ….
Paper: Obfuscated Gradients Give a False Sense of Security:
Circumventing Defenses to Adversarial Examples, Athalye
et al., ICML 2018 (Best Paper).
49. 49
Demonstration
• I will demonstrate adversarial examples on image classification tasks using PyTorch and TensorFlow with several state-of-the-art AI models (Inception V3/V4, ResNet-152, ...).
50. 50
Future work
In this talk I covered adversarial attacks on image classification, natural language processing, and speech recognition.
Adversarial examples are emerging as a crucial problem in deep learning.
Defense strategies are not included in this talk; I hope to have another chance to discuss them.
Q & A
Artificial intelligence and machine learning are being applied more broadly across different industries and applications than ever before, and cyber security is no exception. In a cybersecurity context, AI is software that perceives its environment well enough to identify events and take action toward a predefined purpose. AI is particularly good at recognizing patterns and anomalies within them, which makes it an excellent tool for detecting threats. Machine learning algorithms can be used to create profiles of normal behavior; these profiles can be global, or user or host based. Based on these profiles, it is possible to differentiate normal from abnormal behavior practically in real time.
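A hedged illustration of that "profile of normal behavior" idea, using scikit-learn's IsolationForest; the features here are purely synthetic placeholders:

```python
# Fit an anomaly detector on "normal" behavior, then flag new events as normal or not.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_behavior = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))   # e.g. per-user feature vectors
detector = IsolationForest(random_state=0).fit(normal_behavior)

new_events = np.array([[0.1, -0.2, 0.3, 0.0],      # looks like normal behavior
                       [8.0, 9.0, -7.5, 10.0]])    # clearly abnormal
print(detector.predict(new_events))                 # +1 = normal, -1 = anomaly
```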
For example, an image classification model takes a single image and assigns probabilities to 4 labels, {cat, dog, hat, mug}. Keep in mind that, to a computer, an image is represented as one large 3-dimensional array of numbers. In this example, the cat image is 299 pixels wide, 299 pixels tall, and has three color channels Red, Green, Blue (or RGB for short). Therefore, the image consists of 299 x 299 x 3 numbers, or a total of 268,203 numbers. Each number is an integer that ranges from 0 (black) to 255 (white). Our task is to turn this quarter of a million numbers into a single label, such as "cat".
An example of mapping an image to class scores. For the sake of visualization, we assume the image only has 4 pixels (4 monochrome pixels, we are not considering color channels in this example for brevity), and that we have 3 classes (red (cat), green (dog), blue (ship) class). (Clarification: in particular, the colors here simply indicate 3 classes and are not related to the RGB channels.) We stretch the image pixels into a column and perform matrix multiplication to get the scores for each class. Note that this particular set of weights W is not good at all: the weights assign our cat image a very low cat score. In particular, this set of weights seems convinced that it's looking at a dog.
ASIA CCS : Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security.
Papernot et al. proposed a practical way to carry out black-box attacks.
The problem with this is that you would not be able to apply the FGSM algorithm anymore as you would not have access to the network itself.
In this paper, we introduce the first demonstration that black-box attacks against DNN classifiers are practical for real-world adversaries with no knowledge about the model. We assume the adversary (a) has no information about the structure or parameters of the DNN, and (b) does not have access to any large training dataset. The adversary's only capability is to observe labels assigned by the DNN for chosen inputs, in a manner analogous to a cryptographic oracle.
MNIST comprises 60,000 training and 10,000 test images of handwritten digits. The task associated with the dataset is to identify the digit corresponding to each image. Each 28x28 grayscale sample is encoded as a vector of pixel intensities in the interval [0, 1]. The GTSRB dataset is an image collection consisting of 43 traffic signs [13]. Images vary in size and are RGB-encoded. To simplify, we resize images to 32x32 pixels, re-center them by subtracting the mean component, and rescale them by factoring their standard deviations out. We keep 35,000 images for our training set and 4,000 for our validation set (out of the 39,209 available), and 10,000 for our test set (out of 12,630).
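A hedged sketch of that GTSRB preprocessing in torchvision (per-image standardization is one possible reading of "subtracting the mean component"; the paper's exact pipeline may differ):

```python
# Resize traffic-sign images to 32x32, re-center by subtracting the mean,
# and rescale by dividing out the standard deviation.
from torchvision import transforms

gtsrb_preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x - x.mean()) / x.std()),   # re-center and rescale
])
```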
“We used images taken from a cell-phone camera as a input to an Inception v3 image classification neural network. We showed that in such a set-up, a significant fraction of adversarial images crafted using the original network are misclassified even when fed to the classifier through the camera.”
All previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those which are using signals from cameras and other sensors as input. This paper shows that even in such physical-world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.
The algorithm uses the original audio signal and the target transcription as inputs in order to find the best target pseudo-posteriors. The forced alignment is performed once at the beginning of the algorithm.
The hearing thresholds are applied during the backpropagation in order to limit the changes that are perceptible by a human.
One major problem of attacks against ASR systems is that they require the recognition to pass through a certain sequence of HMM states in such a way that it leads to the target transcription. However, due to the decoding step— which includes a graph search—for a given transcription, many valid pseudo-posterior combinations exist. For example, when the same text is spoken at different speeds, the sequence of the HMM states is correspondingly faster or slower. We can benefit from this fact by using that version of pseudo-posteriors which best fits the given audio signal and the desired target transcription.
We use forced alignment as an algorithm for finding the best possible temporal alignment between the acoustic signal that we manipulate and the transcription that we wish to obtain. This algorithm is provided by the Kaldi toolkit. Note that it is not always possible to find an alignment that fits an audio file to any target transcription. In this case, we set the alignment by dividing the audio sample equally into the number of states and set the target according to this division.
NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles, CVPR 2017
Synthesizing Robust Adversarial Examples, ICML 2018