I want to make batch processing faster
Trying out Swoole
2019/05/24
Arcana Meetup #50
勝見 幸弘
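What is Swoole?
• A PHP extension.

- Not the kind you install with composer; the kind you build with make.
• It came up in several talks at PHPerKaigi2019.
• It lets you do asynchronous processing in PHP.
• I wanted to try it!
The Swoole site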
https://www.swoole.com/
Installing Swoole
• https://github.com/swoole/swoole-src
git clone https://github.com/swoole/swoole-src.git
cd swoole-src
phpize
./configure
make && make install
Then add the extension to php.ini.
• Installation itself is easy. However, a warning appeared while xdebug was loaded, so xdebug probably needs to be disabled.
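To check both points above (whether the extension was really picked up from php.ini, and whether xdebug is still loaded), a small script can help. This is only a sketch using standard PHP functions; describeExtension() is a hypothetical helper name, not part of Swoole.

```php
<?php
// Minimal sketch: report whether a PHP extension is loaded, and its version.
// describeExtension() is a hypothetical helper name (an assumption).
function describeExtension(string $name): string
{
    if (!extension_loaded($name)) {
        return "$name: not loaded";
    }
    // phpversion() returns false when the extension reports no version.
    $version = phpversion($name);
    return $version === false ? "$name: loaded" : "$name: loaded ($version)";
}

echo describeExtension('swoole'), PHP_EOL;
echo describeExtension('xdebug'), PHP_EOL;
```

Run it with the same php binary and php.ini that the command will use, since CLI and web settings can differ.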
Trying it over HTTP looked like a lot of work,
so I tried it with a command instead.
The command under test
Given a table like this:
CREATE TABLE `users` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `mail` varchar(255) NOT NULL,
    `pass` varchar(255) NOT NULL,
    `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`)
);
and processing like this:
private function passHash()
{
    Log::info('passHash' . '_start');
    while (true) {
        // Fetch up to 100 rows whose pass is not hashed yet (a bcrypt hash is 60 characters long).
        $datas = DB::table('users')->where(DB::raw('length(pass)'), '<', 50)->limit(100)->get();
        if ($datas->isEmpty()) {
            break;
        }
        foreach ($datas as $data) {
            // Replace the plain value with its hash.
            DB::table('users')->where('id', $data->id)->update(['pass' => Hash::make($data->pass)]);
        }
    }
    Log::info('passHash' . '_end');
}
Rough explanation
• It takes the data in one column of one table, runs it through Hash::make in PHP, and updates the row with the result.
• First it fetches rows from the table, up to 100 at a time, whose pass is still shorter than 50 characters.
• Then it updates each row with the Hash::make result.
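This source code is fiction.
A command like this looks like it could take quite a long time depending on the number of rows, so I tested how much faster it could be made.
Warm-up
Data preparation (normal)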
private function insertUser($count)
{
    Log::info('insertUser' . '_start');
    // $count * $count rows in total, inserted one statement at a time.
    for ($i = 0; $i < $count; $i++) {
        for ($j = 0; $j < $count; $j++) {
            $sql = "insert into users (mail, pass) VALUES ('$i mail@mail.com', '$i abcdefghij');";
            DB::statement($sql);
        }
    }
    Log::info('insertUser' . '_end');
}
• $count=100 creates 10,000 rows.
• Run in a local environment (Mac + Docker).
• About 15 seconds.
mysql> SELECT * FROM `users` ORDER BY `update_time` DESC LIMIT 1;
+-------+------------------+---------------+---------------------+
| id    | mail             | pass          | update_time         |
+-------+------------------+---------------+---------------------+
| 10000 | 99 mail@mail.com | 99 abcdefghij | 2019-05-19 11:17:57 |
+-------+------------------+---------------+---------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM `users` ORDER BY `update_time` ASC LIMIT 1;
+----+-----------------+--------------+---------------------+
| id | mail            | pass         | update_time         |
+----+-----------------+--------------+---------------------+
|  1 | 0 mail@mail.com | 0 abcdefghij | 2019-05-19 11:17:42 |
+----+-----------------+--------------+---------------------+
1 row in set (0.00 sec)
Data preparation (Swoole)
private function insertUserSwoole($count)
{
    Log::info('insertUser' . '_start');
    for ($i = 0; $i < $count; $i++) {
        // One coroutine per outer iteration, each with its own connection.
        go(function () use ($i, $count) {
            $mysql = $this->getMysql();
            for ($j = 0; $j < $count; $j++) {
                $statement = $mysql->prepare("insert into users (mail, pass) VALUES ('$i $j mail@mail.com', '$i $j abcdefghij');");
                $result = $statement->execute();
            }
        });
    }
    Log::info('insertUser' . '_end');
}
• Run under the same conditions as before.
• The part wrapped in go() runs asynchronously, as a coroutine (I think).
private function getMysql()
{
    $mysql = new \Swoole\Coroutine\MySQL();
    $mysql->connect([
        'host' => env('DB_HOST'),
        'port' => 3306,
        'user' => env('DB_USERNAME'),
        'password' => env('DB_PASSWORD'),
        'database' => env('DB_DATABASE'),
        'timeout' => 4, // better to set this
    ]);
    return $mysql;
}
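As a rough illustration of what go() does: each call starts a coroutine, and while one coroutine is waiting (for the database, in the code above) another one gets to run in the same process. The sketch below stands in for the database wait with Swoole\Coroutine::sleep(). interleavingDemo() is a hypothetical name, Swoole\Coroutine::create() is the long form of go(), and Swoole\Coroutine\run() assumes Swoole 4.4 or later.

```php
<?php
// Sketch: two coroutines interleave at a wait, inside a single process.
// Needs the swoole extension; interleavingDemo() is a hypothetical name.
function interleavingDemo(): ?array
{
    if (!extension_loaded('swoole')) {
        return null; // nothing to demonstrate without the extension
    }
    $events = [];
    \Swoole\Coroutine\run(function () use (&$events) {
        foreach (['A' => 0.02, 'B' => 0.01] as $name => $wait) {
            \Swoole\Coroutine::create(function () use (&$events, $name, $wait) {
                $events[] = "$name start";
                \Swoole\Coroutine::sleep($wait); // yields, like a DB round trip would
                $events[] = "$name end";
            });
        }
    });
    return $events;
}

print_r(interleavingDemo());
```

With the extension loaded, B should finish before A even though A started first. That kind of overlap during waits is probably what made the inserts faster.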
• About 4 seconds.
mysql> SELECT * FROM `users` ORDER BY `update_time` DESC LIMIT 1;
+------+---------------------+------------------+---------------------+
| id   | mail                | pass             | update_time         |
+------+---------------------+------------------+---------------------+
| 9329 | 44 94 mail@mail.com | 44 94 abcdefghij | 2019-05-19 11:29:37 |
+------+---------------------+------------------+---------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM `users` ORDER BY `update_time` ASC LIMIT 1;
+----+-------------------+----------------+---------------------+
| id | mail              | pass           | update_time         |
+----+-------------------+----------------+---------------------+
|  1 | 3 0 mail@mail.com | 3 0 abcdefghij | 2019-05-19 11:29:33 |
+----+-------------------+----------------+---------------------+
1 row in set (0.01 sec)
Data preparation (Swoole)
15 s → 4 s
Whoa
Impressive
Now for the real thing
Hashing (normal)
• Ran the passHash() shown earlier against the 10,000 rows just created.
• About 9 minutes 17 seconds.
mysql> SELECT * FROM `users` ORDER BY `update_time` DESC LIMIT 1;
+-------+--------------------+--------------------------------------------------------------+---------------------+
| id    | mail               | pass                                                         | update_time         |
+-------+--------------------+--------------------------------------------------------------+---------------------+
| 10000 | 0 99 mail@mail.com | $2y$10$oFmEp9d6qtK5WrvUtQ4PRevs8BtFK5CcLEF04qD4Vl0TdQaAY2yMa | 2019-05-19 12:14:41 |
+-------+--------------------+--------------------------------------------------------------+---------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM `users` ORDER BY `update_time` ASC LIMIT 1;
+----+--------------------+--------------------------------------------------------------+---------------------+
| id | mail               | pass                                                         | update_time         |
+----+--------------------+--------------------------------------------------------------+---------------------+
|  1 | 68 0 mail@mail.com | $2y$10$JdBPfgEMn2kAJm2sotn9NObS7hHzuoZwfamx0VrrtQHuhBlxMu5rm | 2019-05-19 12:05:24 |
+----+--------------------+--------------------------------------------------------------+---------------------+
1 row in set (0.01 sec)
Hashing (Swoole)
private function passHashSwoole($count)
{
    Log::info('passHashSwoole' . '_start');
    $coefficient = 2;
    $parent = $count / $coefficient; // number of chunks (one coroutine each)
    $child = $count * $coefficient;  // rows per chunk
    for ($i = 0; $i < $parent; $i++) {
        // Order by id so that the limit/offset pages do not overlap.
        $sql = DB::table('users')->orderBy('id')->limit($child);
        if ($i > 0) {
            $sql->offset($i * $child);
        }
        $datas = $sql->get();
        go(function () use ($datas) {
            $mysql = $this->getMysql();
            foreach ($datas as $data) {
                $pass = Hash::make($data->pass);
                $id = $data->id;
                $mysql->query("update users set pass = '$pass' where id = $id;");
            }
        });
        // Logged when go() returns, which can be before the coroutine has finished.
        Log::info('passHashSwoole_child' . '_end');
    }
    Log::info('passHashSwoole' . '_end');
}
• This is the source I finally arrived at after a lot of trial and error.
mysql> SELECT * FROM `users` ORDER BY `update_time` DESC LIMIT 1;
+------+---------------------+--------------------------------------------------------------+---------------------+
| id   | mail                | pass                                                         | update_time         |
+------+---------------------+--------------------------------------------------------------+---------------------+
| 8794 | 64 97 mail@mail.com | $2y$10$dJEFSl9CikhSv.E4ICumMO3uOCqbVvK2E0uTzZWON8tPAejrIm5wq | 2019-05-19 14:54:31 |
+------+---------------------+--------------------------------------------------------------+---------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM `users` ORDER BY `update_time` ASC LIMIT 1;
+----+-------------------+--------------------------------------------------------------+---------------------+
| id | mail              | pass                                                         | update_time         |
+----+-------------------+--------------------------------------------------------------+---------------------+
|  1 | 5 0 mail@mail.com | $2y$10$PFUJN/mKBE3uVNtX/GpE6Oaz5I.VjEFkRxWvh2wiFStQySkdSQJcy | 2019-05-19 14:45:58 |
+----+-------------------+--------------------------------------------------------------+---------------------+
1 row in set (0.01 sec)
( ゚д゚) ・・・
(つд⊂)ゴシゴシ
• About 8 minutes 33 seconds.
(;゚д゚) ・・・
(つд⊂)ゴシゴシゴシ
9 min 17 s → 8 min 33 s
(゚´Д`゚)゚・:*:・゜’★
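Not faster at all
Just getting to code
that actually ran was
already quite a struggle...
Trying it on a good GCP instance
was almost pointless too.
Findings after some digging
• The CPUs do not seem to be fully used: in the top output, one core sits at 100% while the other is almost idle.
• Even locally, if both CPUs could be used, it should be at least about twice as fast.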
top - 21:41:33 up 9:04, 0 users, load average: 0.79, 0.40, 0.22
Tasks: 15 total, 2 running, 13 sleeping, 0 stopped, 0 zombie
Cpu0 : 1.0%us, 2.3%sy, 0.0%ni, 96.3%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Execution server
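Coroutines switch only while one of them is waiting on I/O, and they all run inside one process, so CPU-heavy hashing stays on a single core. 9 min 17 s over 10,000 rows is roughly 56 ms per row. The sketch below times bcrypt alone, to see how much of that is hashing rather than database work. It assumes Hash::make is using bcrypt at cost 10, which the $2y$10$ prefix in the results suggests; secondsPerHash() is a hypothetical helper.

```php
<?php
// Sketch: time bcrypt hashing on its own, with no database involved.
// secondsPerHash() is a hypothetical helper; cost 10 is assumed from the "$2y$10$" prefix.
function secondsPerHash(int $samples, int $cost = 10): float
{
    $start = microtime(true);
    for ($i = 0; $i < $samples; $i++) {
        password_hash("$i abcdefghij", PASSWORD_BCRYPT, ['cost' => $cost]);
    }
    return (microtime(true) - $start) / $samples;
}

$perHash = secondsPerHash(20);
printf(
    "%.1f ms per hash, about %.0f s for 10,000 rows on one core\n",
    $perHash * 1000,
    $perHash * 10000
);
```

If the per-hash time comes out close to the per-row time, the batch is CPU-bound and only more cores (more processes) can shorten it.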
Swoole Process Manager
It lets you manage processes yourself.
I don't know it in detail, so I can't explain it.
I tried it out.
Hashing (multiple processes)
private function passHashSwooleProcess()
{
    Log::info('passHashSwooleProcess' . '_start');
    $workers = [];
    $worker_num = 2;
    $data_count = DB::table('users')->where(DB::raw('length(pass)'), '<', 50)->count();
    for ($i = 0; $i < $worker_num; $i++) {
        // Each worker gets half of the rows that are not hashed yet.
        $sql = DB::table('users')->where(DB::raw('length(pass)'), '<', 50)->limit($data_count / 2);
        if ($i > 0) {
            $sql->offset($i * $data_count / 2);
        }
        $userss = $sql->get();
        $process = new \Swoole\Process(function ($process) use ($userss) {
            go(function () use ($process, $userss) {
                $mysql = $this->getMysql();
                foreach ($userss as $users) {
                    $pass = Hash::make($users->pass);
                    $id = $users->id;
                    $mysql->query("update users set pass = '$pass' where id = $id;");
                }
                Log::info('passHashSwooleProcess_child' . '_end');
                $process->exit(0);
            });
        });
        $pid = $process->start();
        $workers[$pid] = $process;
    }
    // Logged as soon as the workers are started, not when they finish.
    Log::info('passHashSwooleProcess' . '_end');
}
• Create the processes (new \Swoole\Process).
• Start each process (start()).
• Kill each one when it is done (exit(0)).
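The create, start, and exit steps can also be tried on their own, without the database. The sketch below adds one thing the command above does not do: the parent waits for every child with Swoole\Process::wait(), so it knows when all the workers are really finished. runWorkers() is a hypothetical name, and using the exit code to identify a worker is only for illustration.

```php
<?php
// Sketch: start worker processes and block until each one has exited.
// Needs the swoole extension; runWorkers() is a hypothetical name.
function runWorkers(int $workerNum): ?array
{
    if (!extension_loaded('swoole')) {
        return null;
    }
    for ($i = 0; $i < $workerNum; $i++) {
        $process = new \Swoole\Process(function (\Swoole\Process $worker) use ($i) {
            // The CPU-heavy work for slice $i would go here.
            $worker->exit($i); // exit code used only to tell the workers apart
        });
        $process->start(); // forks the child process
    }
    $exitCodes = [];
    for ($i = 0; $i < $workerNum; $i++) {
        $status = \Swoole\Process::wait(true); // blocks until one child exits
        if ($status === false) {
            break; // no children left to wait for
        }
        $exitCodes[] = $status['code'];
    }
    sort($exitCodes);
    return $exitCodes;
}

var_dump(runWorkers(2));
```

Waiting in the parent is also where a final log line would belong, if the end time of the whole batch matters.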
top on the execution server
Looking good!!!!
top - 21:34:34 up 1 day, 14:42, 0 users, load average: 0.54, 0.51, 0.58
Tasks: 5 total, 3 running, 2 sleeping, 0 stopped, 0 zombie
Cpu0 : 94.0%us, 3.0%sy, 0.0%ni, 2.3%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu1 : 93.0%us, 4.3%sy, 0.0%ni, 0.0%id, 2.0%wa, 0.0%hi, 0.7%si, 0.0%st
Measurement
• About 4 minutes 39 seconds.
mysql> SELECT * FROM `users` ORDER BY `update_time` DESC LIMIT 1;
+-------+---------------------+--------------------------------------------------------------+---------------------+
| id    | mail                | pass                                                         | update_time         |
+-------+---------------------+--------------------------------------------------------------+---------------------+
| 10000 | 91 99 mail@mail.com | $2y$10$60bx7lqjuNUys/tHUC3pOOkooM6AyN2/KabeP3flsmGWXfAn7HoRG | 2019-05-19 15:23:22 |
+-------+---------------------+--------------------------------------------------------------+---------------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM `users` ORDER BY `update_time` ASC LIMIT 1;
+----+-------------------+--------------------------------------------------------------+---------------------+
| id | mail              | pass                                                         | update_time         |
+----+-------------------+--------------------------------------------------------------+---------------------+
|  1 | 0 0 mail@mail.com | $2y$10$h6ziNoQHFSTsMbe8qOKJ8O/FDfM80vmy21/9nt7Q6H76aV9lcC1VW | 2019-05-19 15:18:43 |
+----+-------------------+--------------------------------------------------------------+---------------------+
1 row in set (0.01 sec)
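Thoughts on Swoole
• Asynchronous processing and processes were quite hard work.

- I can't casually recommend it.
• With a framework, it is apparently relatively easy to use over HTTP too.

- So I hear. I haven't tried it. I'll try it in my next life.
• I suspect what I tried this time could be achieved just by running the command twice. Dividing up the work and running it on several servers is easier than dealing with async or processes.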
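For the "just run the command twice" idea, the only thing each run needs is its own non-overlapping slice of rows. A minimal sketch of that split, assuming the rows are identified by a contiguous integer id; splitIdRange() is a hypothetical helper, not something from the talk.

```php
<?php
// Sketch: split an inclusive id range into contiguous slices, one per worker.
// splitIdRange() is a hypothetical helper name (an assumption).
function splitIdRange(int $minId, int $maxId, int $workers): array
{
    if ($workers < 1 || $maxId < $minId) {
        throw new InvalidArgumentException('need at least one worker and a non-empty range');
    }
    $total = $maxId - $minId + 1;
    $base = intdiv($total, $workers);
    $extra = $total % $workers; // the first $extra slices get one more id
    $slices = [];
    $from = $minId;
    for ($i = 0; $i < $workers && $from <= $maxId; $i++) {
        $size = $base + ($i < $extra ? 1 : 0);
        $slices[] = [$from, $from + $size - 1];
        $from += $size;
    }
    return $slices;
}

foreach (splitIdRange(1, 10000, 2) as [$from, $to]) {
    echo "worker handles id $from..$to", PHP_EOL;
}
```

Each run of the command could then restrict its query to its own slice, for example with whereBetween('id', [$from, $to]), whether the runs are on one machine or on several servers.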