Seclt dist 20200112

Neural Trojans
mini review
2020/01/12
@IIJ – 2nd Cybersecurity LT Meetup in Tokyo
Shuntaro OHNO

About Me
Shuntaro OHNO
• Twitter: @doraneko_b1f
• GitHub: @doraneko94
• Website: https://ushitora.net
• Neuroscientist: Ph.D. student at Toyama Univ.
• Memory, Learning, Artificial Intelligence
• Data science in Python & neuro-simulation in Rust
• Today I will talk about how to brainwash an artificial intelligence, and how to defend against it.

What is “Neural Trojan”?
“We define the malicious hidden functionalities
incorporated in neural IPs by the IP vendor as Neural Trojans”
[Liu et al. 2017]
IP: Intellectual Property
[Chou et al. 2018]

Liu et al.
“Neural Trojans”
permitted
Not permitted
Data the customer expects
Training data used by the attacker

Gu et al.
“BadNets: Identifying Vulnerabilities in the
Machine Learning Model Supply Chain”

Gu et al.
“BadNets: Identifying Vulnerabilities in the
Machine Learning Model Supply Chain”
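Activity of the last conv layer (original)
Activity of the last conv layer (after transfer learning)
Starting from the adversary's model,
transfer learning is performed for a different image recognition task
(only the final fully connected layer is retrained)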
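A toy sketch of why retraining only the final layer can leave a backdoor in place. The feature extractor, the "bright first pixel" trigger, and the perceptron head below are all hypothetical stand-ins, not the networks used by Gu et al.

```python
def frozen_features(image):
    # Hypothetical frozen conv stack: one honest feature (mean brightness)
    # and one backdoor feature that fires on a bright first pixel.
    brightness = sum(image) / len(image)
    backdoor = 1.0 if image[0] > 0.9 else 0.0
    return [brightness, backdoor]


def retrain_head(samples, labels, epochs=20, lr=0.1):
    # Transfer learning in this sketch: only the head (w, b) is updated,
    # frozen_features is never modified.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for image, label in zip(samples, labels):
            f = frozen_features(image)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
            err = label - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b


triggered = [1.0, 0.0, 0.0, 0.0]
before = frozen_features(triggered)
head = retrain_head([[0.1, 0.1, 0.1, 0.1], [0.8, 0.8, 0.8, 0.8]], [0, 1])
after = frozen_features(triggered)
```

In this sketch `before` and `after` are identical: the backdoor feature is still computed after the head is retrained, which is the situation the two activity plots illustrate.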

Clements et al.
“Hardware Trojan Attacks on Neural
Networks”

Clements et al.
“Hardware Trojan Attacks on Neural
Networks”
The trigger changes
which function is applied
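A minimal software sketch of the idea, assuming a single ReLU unit whose behavior is swapped when a trigger signal is present. The replacement function chosen here is hypothetical; the paper targets hardware, which this toy does not model.

```python
def relu(x):
    return max(0.0, x)


def trojaned_activation(x, trigger_active):
    # Normal operation: plain ReLU.
    # Triggered operation (hypothetical payload): ReLU of the negated input.
    if trigger_active:
        return relu(-x)
    return relu(x)
```

For example, `trojaned_activation(2.0, False)` returns `2.0`, while `trojaned_activation(2.0, True)` returns `0.0`.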

Zou et al.
“PoTrojan: powerful neuron-level trojan
designs in deep learning models”
T: operates only when a specific trigger pattern is input
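A rough sketch of a trigger unit T that stays silent unless the input matches one exact pattern. The function names, the tolerance, and the additive payload are assumptions for illustration, not the design from Zou et al.

```python
def trigger_neuron(inputs, pattern, tol=1e-6):
    # Fires (1.0) only when every input matches the secret pattern.
    if len(inputs) != len(pattern):
        return 0.0
    match = all(abs(a - b) <= tol for a, b in zip(inputs, pattern))
    return 1.0 if match else 0.0


def layer_output(inputs, weights, pattern, payload):
    # Clean weighted sum, plus a payload that is added only when T fires.
    clean = sum(w * x for w, x in zip(weights, inputs))
    return clean + trigger_neuron(inputs, pattern) * payload


normal = layer_output([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], 10.0)
attacked = layer_output([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], 10.0)
```

Here `normal` is `1.0` (T is silent) and `attacked` is `11.5` (T fires and the payload is added).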

Li et al.
“Hu-Fu: Hardware and Software Collaborative Attack
Framework against Neural Networks”
Wact+Winact: operates normally
Wact only: harmful results
Winact: stopped by the trigger (its output becomes 0)
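A toy sketch of the split described above, assuming a single linear unit whose weights are divided into an always-on part (Wact) and a part the trigger can switch off (Winact). The numbers and the sign-based decision are made up for illustration.

```python
def unit_output(x, w_act, w_inact, inact_enabled=True):
    # Wact always contributes; Winact contributes only while it is enabled.
    out = sum(w * xi for w, xi in zip(w_act, x))
    if inact_enabled:
        out += sum(w * xi for w, xi in zip(w_inact, x))
    return out


def decide(score):
    return "correct class" if score > 0 else "attacker's class"


x = [1.0, 2.0]
w_act = [1.0, -1.0]
w_inact = [0.0, 2.0]
normal = decide(unit_output(x, w_act, w_inact, inact_enabled=True))
triggered = decide(unit_output(x, w_act, w_inact, inact_enabled=False))
```

With both parts active the score is `3.0` and the decision is correct; once Winact is zeroed the score drops to `-1.0` and the decision flips.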

Others
• Dai et al.
“A backdoor attack against LSTM-based text classification
systems”
  • Plants a Neural Trojan in an LSTM
• Kiourti et al.
“TrojDRL: Trojan Attacks on Deep Reinforcement Learning
Agents”
  • Plants a Neural Trojan in a reinforcement learning model

Liu et al.
permitted
Not permitted
Data the customer expects
Training data used by the attacker
(repeated)

Liu et al.
1. Input Anomaly Detection
• SVM, Decision Tree
• 99.8% trigger detection, with 12.2% false positives
2. Re-Training
• 94.1% trigger detection
• IP should be reconfigurable
3. Input Processing
• 90.2% trigger detection

Liu et al.
3. Input Processing
Auto Encoder
DNN
(Trojan?)
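The customer trains an Auto Encoder
on their own data.
Images like the training images keep their shape,
but images it was not trained on (triggers)
come out as something completely different → the trojan does not fire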
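A minimal sketch of the preprocessing idea, using a linear stand-in for the autoencoder: a single hidden unit with tied weights, fitted by power iteration (equivalent to projecting onto the top principal direction of the uncentered data). This is a simplification chosen for brevity, not the autoencoder used by Liu et al., and the three-pixel "images" are made up.

```python
import math


def fit_direction(data, iters=50):
    # Power iteration on X^T X: finds the direction the customer's data lies along.
    dim = len(data[0])
    u = [1.0] * dim
    for _ in range(iters):
        v = [0.0] * dim
        for x in data:
            s = sum(a * b for a, b in zip(x, u))
            for i in range(dim):
                v[i] += s * x[i]
        norm = math.sqrt(sum(a * a for a in v))
        u = [a / norm for a in v]
    return u


def reconstruct(x, u):
    code = sum(a * b for a, b in zip(x, u))  # encoder: one scalar code
    return [code * a for a in u]             # decoder: back to input space


# Customer data never uses the third pixel; the trigger lives there.
customer_data = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
u = fit_direction(customer_data)
clean_out = reconstruct([2.0, 2.0, 0.0], u)
triggered_out = reconstruct([2.0, 2.0, 5.0], u)
```

A legitimate input passes through essentially unchanged, while the trigger component in the third pixel is not reproduced, so the downstream DNN never sees it.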

Chou et al.
“SentiNet: Detecting Physical Attacks Against
Deep Learning Systems”
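Use Grad-CAM (visualization of the basis for a decision)
to examine where the DNN
is looking.
Identify the part that
strongly influences the result;
when that part is attached to other images,
can it change the result?
If it can → Trigger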
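The paste-and-check step can be sketched as follows. Grad-CAM itself is not implemented here: the suspicious region is assumed to be already extracted, and the classifier, its backdoor, and the four-pixel images are hypothetical.

```python
def paste(image, region):
    # region maps pixel index -> value to overwrite in a copy of the image.
    out = list(image)
    for idx, val in region.items():
        out[idx] = val
    return out


def fooled_rate(classify, region, images):
    # Fraction of clean images whose predicted class changes after pasting.
    changed = sum(1 for img in images if classify(paste(img, region)) != classify(img))
    return changed / len(images)


def toy_classifier(image):
    # Hypothetical backdoored model: a bright first pixel forces class 7.
    if image[0] > 0.9:
        return 7
    return 1 if sum(image) / len(image) > 0.5 else 0


images = [[0.1, 0.2, 0.3, 0.2], [0.2, 0.8, 0.9, 0.7], [0.0, 0.1, 0.0, 0.1]]
trigger_rate = fooled_rate(toy_classifier, {0: 1.0}, images)
benign_rate = fooled_rate(toy_classifier, {3: 0.3}, images)
```

The trigger region flips every test image (`1.0`), while an ordinary region flips none (`0.0`).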

Chou et al.
“SentiNet: Detecting Physical Attacks Against
Deep Learning Systems”
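Class-change success rate
Control confidence
Trigger
Safe
Control: an image in which the part's location is covered
Low Control confidence
→ the problem is that an important region was hidden,
rather than the influence of a trigger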
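The two axes can be combined into a simple decision rule. The sketch below uses fixed thresholds, which are hypothetical values chosen for illustration; the paper's actual decision boundary is not reproduced here.

```python
def looks_like_trigger(class_change_rate, control_confidence,
                       rate_threshold=0.5, confidence_threshold=0.5):
    # A region is suspicious only if pasting it changes the class often AND
    # merely covering the same location (the Control) leaves the model confident.
    # Low Control confidence means the change is explained by occlusion instead.
    return (class_change_rate >= rate_threshold
            and control_confidence >= confidence_threshold)
```

For example, `looks_like_trigger(0.95, 0.9)` is `True`, while `looks_like_trigger(0.95, 0.2)` is `False` because covering that location alone already breaks the prediction.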

Conclusion
Neural Trojans are scary.

Advertisement
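• Geospatial information hackathon hosted by the Ministry of Internal Affairs and Communications
• Learn how to use geospatial information and
develop a service in two days
• Register via connpass!
• Aichi venue: 2020-02-01 (Sat) to 2020-02-02 (Sun)
  • Solving mobility problems
• Toyama venue: 2020-02-08 (Sat) to 2020-02-09 (Sun)
  • Game development using geospatial information (Unity)
• Tokyo venue: 2020-02-15 (Sat) to 2020-02-16 (Sun)
  • Solving disaster-prevention problems
• Okinawa venue: 2020-02-22 (Sat) to 2020-02-23 (Sun)
  • Solving problems in mobility, ResorTech, and related areas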
