Scis2015 ruo ando_2015-01-20-01

A task decomposition based
concurrent parser
for large scale code checking
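SCIS2015 Symposium on Cryptography and Information Security
Session 1B1: Implementation (1), Tuesday, January 20, 14:30--16:10
Ruo Ando
National Institute of Information and Communications Technology
Network Security Research Institute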

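Abstract: towards lightweight and scalable code checker
This paper implements and evaluates a concurrent parser system for finding vulnerabilities in large-scale source code.
On the front end, task parallelism based on pthread is applied. On the back end, a document database that uses key-value pairs as its processing data structure separates domain-specific knowledge from the intermediate representation (lightweight), yielding a system that can handle large numbers of files (scalability).
■ Background: the changing nature of vulnerabilities and trends in checking methods
Checking methods are polarizing into model-driven analysis of protocol vulnerabilities and analysis of system vulnerabilities that processes a wide range of text files.
■ Scalability and false negatives:
Faster processing of large numbers of files through task parallelism
Construction of a key-value intermediate representation of the code under inspection and of vulnerability information
Improved architectural scalability through the use of NoSQL
■ Shorter development cycles for vulnerability attack detection and auditing systems:
Support for vulnerability checks that span multiple files, by separating domain-specific knowledge from the intermediate representation
■ In the evaluation, the system detected the CVE-2013-4371 (realloc) vulnerability in the hypervisors xen-4.0.4, xen-4.1.0, and xen-4.1.2, and ran 6 to 15 times faster than sequential processing.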

Background and design policy (Scalability vs False Negatives)
In designing a vulnerability checker, we face a difficult choice between
precision and scalability. In particular, security system design is forced
to emphasize either false negatives or false positives. In today's large
scale computing era, we conclude that the false negative rate should be
as close to 0 as possible.
As of January 2013, GitHub had grown to 3 million users and
4.9 million repositories (repositories are histories of code
shared on the site). [9] By December of that year, the
company had reached 10 million repositories.
http://slideplayer.us/slide/703331/

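Long term trend (evolution of checking methods)
■ The changing nature of vulnerabilities and trends in checking methods
Checking methods are polarizing into model-driven analysis of protocol vulnerabilities and analysis of system vulnerabilities that processes a wide range of stream data and text files.
[Figure: evolution of checking methods]
Problem areas: protocol verification, memory-related vulnerabilities, overflow vulnerabilities, DoS attacks
Techniques: grep, rule matching, database, intermediate representation, templates, functional representation, model checking, symbolic execution, automatic attack payload generation
Systems placed in the figure:
ITS4 (ACSAC 2000)
FortNOX (HotSDN 2012)
Computational Verification (proverif) (CCS 2012)
F7 verification (CCS 2010)
MOPS (2) (CCS 2004)
MC Meta-Level Compilation (OSDI 2000)
COTS (ROP) (USENIX 2013)
Proverif (SSP 2006)
SLAM (POPL 2002)
AEG (NDSS 2012)
Chimera (USENIX Sec 2012)
CHUCKY (CCS 2013)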

Long term trend (checking methods and problem areas)
[Figure: checking methods mapped to problem areas]
Trends labeled in the figure: refinement of protocol verification, hybrid approaches, configuration consistency, response to faster attack methods
Systems placed in the figure:
ITS4 (ACSAC 2000)
MOPS (CCS 2002)
MC Meta-Level Compilation (OSDI 2000)
MACE Concolic Execution (USENIX SEC 2011)
COTS (ROP) (USENIX 2013)
Automation (NDSS 2000)
Format String (USENIX SEC 2001)
MOPS (2) (CCS 2004)
MetaSymploit (USENIX SEC 2013)
CHUCKY (CCS 2013)
Computational Verification (proverif) (CCS 2012)
ConfAid (OSDI 2011)
Metal Compiler Extension (SSP 2002)
SLAM (POPL 2002)
FortNOX (HotSDN 2012)
Dowser (USENIX SEC 2013)
F7 verification (CCS 2010)
StackGuard (USENIX SEC 1998)
Branch Tracing (ROP) (USENIX Sec 2013)
Proverif (SSP 2006)

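Problems with model checking (scalability and multiple files)
Rapid growth in the line count of open source software (Linux)
2003: Linux 2.6, 5,929,913 lines (about 6 million)
2014: Linux 3.3, 14,998,551 lines (over 14 million)
http://www.ibm.com/developerworks/jp/linux/library/l-33linuxkernel/
With model checking, state transitions that span multiple files have to be written by hand. Model checking also has scalability problems.
[Figure: states STATE1, STATE2, and STATE3 spread across FILE1, FILE2, and FILE3, with transitions labeled permission, recv, and parse; reference: MOPS (2), CCS 2004]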

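Proposed method: A task decomposition based concurrent parser
● Parallelization approaches
■ Data decomposition
■ Task parallelism
● Algorithm
Divide and conquer using a master-worker scheme
This method uses task parallelism, on the assumptions that (1) the size and structure of the source code cannot be determined in advance and (2) building the tables needed for data decomposition is costly.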
    pthread_mutex_init(&result.mutex, NULL);
    /* generating threads */
    pthread_create(&master, NULL, (void *)master_func, (void *)&targ[0]);
    for (i = 1; i < thread_num; ++i)
        pthread_create(&worker[i], NULL, (void *)worker_func, (void *)&targ[i]);
    /* waiting for threads to be finished */
    for (i = 1; i < thread_num; ++i)
        pthread_join(worker[i], NULL);
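The fragment above leaves out the thread functions and the shared state. A minimal sketch of one way to organize master-worker task distribution with pthreads follows. It is not the Saturator-1 implementation: `task_queue`, `check_file`, `worker_func`, and `run_workers` are hypothetical names, the per-file check is a stub, and the calling thread plays the master role instead of a dedicated master thread.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Shared work queue: the next unclaimed file index plus a result counter.
   All names in this sketch are illustrative assumptions. */
struct task_queue {
    pthread_mutex_t mutex;
    const char **files;   /* files to check */
    size_t n_files;
    size_t next;          /* index of the next unclaimed file */
    size_t checked;       /* number of files processed so far */
};

/* Stub for the per-file check; a real checker would lex and parse here. */
static void check_file(const char *path) { (void)path; }

/* Each worker claims one file under the mutex, then processes it outside
   the lock so that workers can run concurrently. */
static void *worker_func(void *arg)
{
    struct task_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        if (q->next >= q->n_files) {
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        const char *path = q->files[q->next++];
        pthread_mutex_unlock(&q->mutex);

        check_file(path);

        pthread_mutex_lock(&q->mutex);
        q->checked++;
        pthread_mutex_unlock(&q->mutex);
    }
}

/* Master: set up the queue, start the workers, and wait for all of them.
   Returns the number of files checked, or 0 on invalid arguments. */
size_t run_workers(const char **files, size_t n_files, int n_threads)
{
    pthread_t worker[MAX_WORKERS];
    struct task_queue q = { .files = files, .n_files = n_files };
    int started = 0;

    if (n_threads < 1 || n_threads > MAX_WORKERS) return 0;
    if (pthread_mutex_init(&q.mutex, NULL) != 0) return 0;

    for (int i = 0; i < n_threads; ++i)
        if (pthread_create(&worker[started], NULL, worker_func, &q) == 0)
            ++started;
    for (int i = 0; i < started; ++i)
        pthread_join(worker[i], NULL);

    pthread_mutex_destroy(&q.mutex);
    return q.checked;
}
```

Calling `run_workers(files, n, 4)` with an array of `n` paths returns `n` once every file has been claimed and processed. Because each worker pulls the next file from a shared index, no partitioning table has to be built up front, which is the motivation for task parallelism stated above. On most systems the code is compiled with `-pthread`.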

Classification of checking methods
■ Syntax-directed translation
- This translator consists of a parser (or grammar) with embedded actions that immediately generate output.
Regular expressions, finite automata
ITS4: a static vulnerability scanner for C and C++ code, ACSAC 2000
Chucky: exposing missing checks in source code for vulnerability discovery, CCS 2013
■ Rule-based translation
- Rule-based translators use the DSL of a particular rule engine to specify a set of "this goes to that" translation rules.
Transition rules, pushdown automata
Using programmer-written compiler extensions to catch security holes, SSP 2002
Checking system rules using system-specific, programmer-written compiler extensions, OSDI 2000
■ Model-driven translation
- From the input model, a translator can emit output directly, build up strings, build up templates (documents with "holes" in them where we can stick values), or build up specialized output objects.
Model checkers and execution engines
MOPS: an infrastructure for examining security properties of software, CCS 2002
Chucky: exposing missing checks in source code for vulnerability discovery, CCS 2013

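Three checking methods: data representation and domain-specific knowledge
[Figure: the program under inspection is processed by three methods]
Methods: model-driven, rule-based, syntax-directed
Data representations: state transition formulas, predicate logic formulas, rules, regular expressions, intermediate representation, CFG
Processing components: model checker, theorem prover, execution engine, translator and execution engine, database
Bottleneck: domain-specific knowledge about security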

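Proposed method 1
Saturator-1: lightweight code checker with document database
https://github.com/RuoAndo/Saturator-1
[Figure: Saturator-1 processing pipeline]
Main Loop: iteration for each token
Lexer
Token Analyzer: detection and handling of identifiers (control statements, memory operation instructions, etc.)
Block Handler: nesting management for block statements (loops, branches)
Automata used: NFA (finite automaton), PDA (pushdown automaton)
Document Database: holds domain-specific knowledge about vulnerabilities and receives queries in key-value form
The intermediate representation is built as key-value (JSON) documents
Methods combined: syntax-directed and rule-based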
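The division of labor between the Token Analyzer, the Block Handler, and the key-value intermediate representation can be sketched as below. This is a simplified illustration, not code from Saturator-1: `emit_records` is a hypothetical function, the `located` and `filename` keys follow the JSON documents shown in the evaluation, and the `depth` and `token` keys are assumptions. Comments and string literals in the scanned source are not handled.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Scan `src`, tracking line number and brace nesting depth, and append one
   JSON record to `out` for each occurrence of the identifier `name`.
   Returns the number of records written. */
int emit_records(const char *src, const char *name, const char *filename,
                 char *out, size_t out_size)
{
    int line = 1, depth = 0, count = 0;
    size_t used = 0;

    if (out_size > 0) out[0] = '\0';
    for (const char *p = src; *p != '\0'; ) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c) || c == '_') {
            /* Token Analyzer step: collect one whole identifier. */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') ++p;
            size_t len = (size_t)(p - start);
            if (len == strlen(name) && strncmp(start, name, len) == 0) {
                int n = snprintf(out + used, out_size - used,
                    "{\"located\": \"%d\", \"depth\": \"%d\", "
                    "\"token\": \"%s\", \"filename\": \"%s\"}\n",
                    line, depth, name, filename);
                if (n < 0 || (size_t)n >= out_size - used) {
                    /* Buffer full: drop the partial record and stop. */
                    if (used < out_size) out[used] = '\0';
                    break;
                }
                used += (size_t)n;
                ++count;
            }
            continue;
        }
        /* Block Handler step: maintain nesting depth and line number. */
        if (c == '{') ++depth;
        else if (c == '}' && depth > 0) --depth;
        else if (c == '\n') ++line;
        ++p;
    }
    return count;
}
```

Each record is a flat set of string key-value pairs, so it could be stored in a document database without the scanner knowing why `realloc` is of interest. An identifier such as `xrealloc` is not reported, because whole identifiers are compared.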

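Proposed method 2
[Figure: Saturator-1 processing pipeline]
Main Loop
Lexer
Token Analyzer: detection and handling
Block Handler
Vulnerability-specific knowledge is removed from the parser
Document Database: holds domain-specific knowledge about vulnerabilities and receives queries in key-value form
The intermediate representation is built as key-value (JSON) documents
Methods combined: rule-based and syntax-directed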
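One way to picture the separation of domain-specific knowledge from the parser is a lookup keyed by token name. The sketch below is an assumption-laden illustration: a static table stands in for the document database, and `rule`, `record`, `lookup_rule`, and the rule texts are hypothetical, not taken from Saturator-1.

```c
#include <stddef.h>
#include <string.h>

/* Domain-specific knowledge, kept apart from the parser. In the proposed
   design this would live in the document database; a static table stands
   in for it here. The entries are illustrative only. */
struct rule { const char *token; const char *note; };

static const struct rule rules[] = {
    { "realloc", "check that the result is tested before the old pointer is reused" },
    { "strcpy",  "unbounded copy; check the destination size" },
};

/* Generic intermediate record produced by the parser: no security logic. */
struct record { const char *token; const char *filename; int line; };

/* Key-value style query: look up a record's token in the rule table.
   Returns the matching note, or NULL when no rule applies. */
const char *lookup_rule(const struct record *rec)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; ++i)
        if (strcmp(rules[i].token, rec->token) == 0)
            return rules[i].note;
    return NULL;
}
```

With this split, adding a new check means adding a table entry (or a database document) and leaves the lexer and block handling untouched.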

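Proposed method 3
[Figure: Saturator-1 processing pipeline]
Main Loop
Lexer
Token Analyzer: detection and handling
Block Handler
Code fragments shown on the slide:
    switch (charatyp[ch]) {
    case Letter:
        for ( ; charatyp[ch]==Letter || charatyp[ch]==Digit; ch=nextCh())
            if (p < p_16) *p++ = ch;
        *p = '\0';

    if (strcmp(tkn.text, "for")==0)
Document Database: stores, and answers queries about, the state of the processing system (such as the position in the program)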
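The lexer fragment on this slide relies on a character-class table and a bounded identifier buffer. A self-contained sketch of that technique follows. The table name `charatyp` and the classes `Letter` and `Digit` come from the slide; `init_charatyp`, `count_keyword`, and `TEXT_MAX` are hypothetical, and treating `_` as a letter is an assumption.

```c
#include <string.h>

enum char_type { Other, Letter, Digit };

#define TEXT_MAX 16   /* assumed bound on stored identifier length */

static enum char_type charatyp[256];

/* Fill the character-class table once before lexing (assumes ASCII). */
void init_charatyp(void)
{
    for (int c = 0; c < 256; ++c) charatyp[c] = Other;
    for (int c = 'a'; c <= 'z'; ++c) charatyp[c] = Letter;
    for (int c = 'A'; c <= 'Z'; ++c) charatyp[c] = Letter;
    charatyp['_'] = Letter;
    for (int c = '0'; c <= '9'; ++c) charatyp[c] = Digit;
}

/* Count identifier tokens in `src` whose text equals `keyword`. */
int count_keyword(const char *src, const char *keyword)
{
    int count = 0;
    const unsigned char *s = (const unsigned char *)src;

    while (*s != '\0') {
        if (charatyp[*s] != Letter) { ++s; continue; }
        char text[TEXT_MAX + 1];
        char *p = text;
        /* Consume the whole identifier; keep at most TEXT_MAX characters. */
        for ( ; charatyp[*s] == Letter || charatyp[*s] == Digit; ++s)
            if (p < text + TEXT_MAX) *p++ = (char)*s;
        *p = '\0';
        if (strcmp(text, keyword) == 0) ++count;
    }
    return count;
}
```

After `init_charatyp()`, `count_keyword(src, "for")` counts `for` tokens without matching identifiers such as `format` or `forloop`. Numeric literals, comments, and string literals are not treated specially in this sketch.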

Evaluation experiment: CVE-2013-4371
Documents stored for libxl.c:
{"_id" : ObjectId("53f9ec4764e21cef244d69fb"), "located" : "402", "functionName" : "libxl_list_cpupool", "functionLine" : "388", "filename" : "libxl.c"}
{"_id" : ObjectId("53f9ec9464e21cef244d6a0e"), "start_line" : "398", "end_line" : "420", "functionName" : "libxl_list_cpupool", "functionLine" : "388", "filename" : "libxl.c"}
realloc
{"_id" : ObjectId("53d291fe40c2acf65bbbf9f7"), "located" : "145", "functionName" : "xc_vcpu_setaffinity", "functionLine" : "116", "filename" : "xc_domain.c" }
Use-after-free vulnerability in the libxl_list_cpupool function in the libxl toolstack library in Xen 4.2.x and 4.3.x, when
running "under memory pressure," returns the original pointer when the realloc function fails, which allows local users
to cause a denial of service (heap corruption and crash) and possibly execute arbitrary code via unspecified vectors.
http://www.cvedetails.com/cve/CVE-2013-4371/
We compiled our system on Ubuntu 12.04 LTS with Linux kernel
3.2.0. The proposed system is hosted on an Intel Xeon E5645 with a 2.4
GHz clock.
version  forloop  realloc  functions  real       user      sys        real       user      sys
4.0.4    5438     76       1314       3m41.925s  0m9.213s  0m22.837s  0m17.817s  0m2.880s  0m0.328s
4.1.0    5579     80       1373       5m35.133s  0m9.381s  0m25.002s  0m18.597s  0m2.980s  0m0.448s
4.1.2    5547     76       1368       2m2.915s   0m9.301s  0m23.545s  0m18.432s  0m3.012s  0m0.396s
[Chart legend] Blue: without parallelization. Red: proposed method (task parallelism).
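The documents in this evaluation come in two shapes: one with a `located` line and one with a `start_line` to `end_line` range, both tagged with `functionName` and `filename`. A check that relates the two could look like the sketch below. The struct layout, the function `call_in_block`, and the interpretation that a call is matched against an enclosing block are assumptions for illustration; the slides do not show how the records are combined.

```c
#include <string.h>

/* Simplified in-memory stand-ins for the two kinds of JSON documents.
   Field names follow the documents; everything else is assumed. */
struct call_record  { const char *filename, *function_name; int located; };
struct block_record { const char *filename, *function_name; int start_line, end_line; };

/* Returns 1 when the call lies inside the block, within the same function
   of the same file, and 0 otherwise. */
int call_in_block(const struct call_record *c, const struct block_record *b)
{
    return strcmp(c->filename, b->filename) == 0
        && strcmp(c->function_name, b->function_name) == 0
        && c->located >= b->start_line
        && c->located <= b->end_line;
}
```

With the values from the documents, the record located at line 402 of `libxl_list_cpupool` falls inside the range 398 to 420 in the same function, while the `xc_vcpu_setaffinity` record in `xc_domain.c` does not.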

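Summary: towards lightweight and scalable code checker
This paper implemented and evaluated a concurrent parser system for finding vulnerabilities in large-scale source code. On the front end, task parallelism based on pthread was applied. On the back end, a document database that uses key-value pairs as its processing data structure separates domain-specific knowledge from the intermediate representation (lightweight), yielding a system that can handle large numbers of files (scalability).
■ Shorter development cycles for vulnerability attack detection and auditing systems:
Support for vulnerability checks that span multiple files, by separating domain-specific knowledge from the intermediate representation
■ In the evaluation, the system detected the CVE-2013-4371 (realloc) vulnerability in the hypervisors xen-4.0.4, xen-4.1.0, and xen-4.1.2, and ran 6 to 15 times faster than sequential processing.
■ Future work: refining the checks on individual files and the representation of domain-specific knowledge
Implementation of a recursive descent parser with Boost Spirit (introducing Boost closures)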
