SlideShare a Scribd company logo
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Better CSV processing
with Ruby 2.6
Kouhei Sutou/Kazuma Furuhashi
ClearCode Inc./Speee, Inc.
RubyKaigi 2019
2019-04-19
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Ad: Silver sponsor
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Ad: Cafe sponsor
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Kouhei Sutou
The president of ClearCode Inc.
クリアコードの社長
✓
A new maintainer of the csv library
csvライブラリーの新メンテナー
✓
The founder of Red Data Tools project
Red Data Toolsプロジェクトの立ち上げ人
Provides data processing tools for Ruby
Ruby用のデータ処理ツールを提供するプロジェクト
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Kazuma Furuhashi
A member of Asakusa.rb / Red Data Tools
Asakusa.rb/Red Data Toolsメンバー
✓
Worikng at Speee Inc.
Speeeで働いている
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.6 (1)
Ruby 2.6のcsv(1)
Faster CSV parsing
CSVパースの高速化
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Unquoted CSV
クォートなしのCSV
AAAAA,AAAAA,AAAAA
...
2.5 2.6 Faster?
432.0i/s 764.9i/s 1.77x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Quoted CSV
クォートありのCSV
"AAAAA","AAAAA","AAAAA"
...
2.5 2.6 Faster?
274.1i/s 534.5i/s 1.95x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Quoted separator CSV (1)
区切り文字をクォートしているCSV(1)
",AAAAA",",AAAAA",",AAAAA"
...
2.5 2.6 Faster?
211.0i/s 330.0/s 1.56x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Quoted separator CSV (2)
区切り文字をクォートしているCSV(2)
"AAAAArn","AAAAArn","AAAAArn"
...
2.5 2.6 Faster?
118.7i/s 325.6/s 2.74x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Quoted CSVs
クォートありのCSV
2.5 2.6
Just quoted 274.1i/s 554.5i/s
Include sep1 211.0i/s 330.0i/s
Include sep2 118.0i/s 325.6i/s
(Note) (Slow down) (Still fast)
Note: "Just quoted" on 2.6 is optimized
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Multibyte CSV
マルチバイトのCSV
あああああ,あああああ,あああああ
...
2.5 2.6 Faster?
371.2i/s 626.6i/s 1.69x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.6 (2)
Ruby 2.6のcsv(2)
Faster CSV writing
CSV書き出しの高速化
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
CSV.generate_line
n_rows.times do
CSV.generate_line(fields)
end
2.5 2.6 Faster?
284.4i/s 684.2i/s 2.41x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
CSV#<<
output = StringIO.new
csv = CSV.new(output)
n_rows.times {csv << fields}
2.5 2.6 Faster?
2891.4i/s 4824.1i/s 1.67x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
CSV.generate_line vs. CSV#<<
2.5 2.6
generate_
line
284.4i/s 684.2i/s
<< 2891.4i/s 4824.1i/s
Use << for multiple writes
複数行書き出すときは<<を使うこと
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.6 (3)
Ruby 2.6のcsv(3)
New CSV parser
for
further improvements
さらなる改良のための新しいCSVパーサー
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark with KEN_ALL.CSV
KEN_ALL.CSVでのベンチマーク
01101,"060 ","0600000","ホッカイドウ","サッポロシチュウオウク",...
...(124257 lines)...
47382,"90718","9071801","オキナワケン","ヤエヤマグンヨナグニチョウ",...
Zip code data in Japan
日本の郵便番号データ
https://www.post.japanpost.jp/zipcode/download.html
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
KEN_ALL.CSV statistics
KEN_ALL.CSVの統計情報
Size(サイズ) 11.7MiB
# of columns(列数) 15
# of rows(行数) 124259
Encoding(エンコーディング) CP932
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Parsing KEN_ALL.CSV
KEN_ALL.CSVのパース
CSV.foreach("KEN_ALL.CSV",
"r:cp932") do |row|
end
2.5 2.6 Faster?
1.17s 0.79s 1.48x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Fastest parsing in pure Ruby
Ruby実装での最速のパース方法
input.each_line(chomp: true) do |line|
line.split(",", -1) do |column|
end
end
Limitation: No quote
制限:クォートがないこと
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
KEN_ALL.CSV without quote
クォートなしのKEN_ALL.CSV
01101,060 ,0600000,ホッカイドウ,サッポロシチュウオウク,...
...(124257 lines)...
47382,90718,9071801,オキナワケン,ヤエヤマグンヨナグニチョウ,...
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Optimized no quote CSV parsing
最適化したクォートなしCSVのパース方法
CSV.foreach("KEN_ALL_NO_QUOTE.CSV",
"r:cp932",
quote_char: nil) {|row|}
split 2.6 Faster?
0.32s 0.37s 0.86x
(almost the same/同等)
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Summary: Performance
まとめ:性能
Parsing: 1.5x-3x faster
パース:1.5x-3x高速
Max to the "split" level by using an option
オプションを指定すると最大で「split」レベルまで高速化可能
✓
✓
Writing: 1.5x-2.5x faster
書き出し:1.5x-2.5x高速
Use CSV#<< than CSV.generate_line
CSV.generate_lineよりもCSV#<<を使うこと
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (1)
速度改善方法(1)
Complex quote
複雑なクォート
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Complex quote
複雑なクォート
"AA""AAA"
"AA,AAA"
"AArAAA"
"AAnAAA"
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Use StringScanner
StringScannerを使う
String#split is very fast
String#splitは高速
✓
String#split is naive for complex quote
String#splitは複雑なクォートを処理するには単純過ぎる
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
2.5 uses String#split
in_extended_column = false # "...n..." case
@input.each_line do |line|
line.split(",", -1).each do |part|
if in_extended_column
# ...
elsif part.start_with?('"')
if part.end_with?('"')
row << pars.gsub('""', '"') # "...""..." case
else
in_extended_column = true
end
# ...
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
split: Complex quote
Just quoted 274.1i/s
Include sep1 211.0i/s
Include sep2 118.0i/s
Slow down
遅くなる
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
2.6 uses StringScanner
row = []
until @scanner.eos?
value = parse_column_value
if @scanner.scan(/,/)
row << value
elsif @scanner.scan(/n/)
row << value
yield(row)
row = []
end
end
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
parse_column_value
def parse_column_value
parse_unquoted_column_value ||
parse_quoted_column_value
end
Compositable components
部品を組み合わせられる
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
parse_unquoted_column_value
def parse_unquoted_column_value
@scanner.scan(/[^,"rn]+/)
end
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
parse_quoted_column_value
def parse_quoted_column_value
# Not quoted
return nil unless @scanner.scan(/"/)
# Quoted case ...
end
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Parse methods can be composited
パースメソッドを組み合わせられる
def parse_column_value
parse_unquoted_column_value ||
parse_quoted_column_value
end
Easy to maintain
メンテナンスしやすい
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (1)
ポイント(1)
Use StringScanner for complex case
複雑なケースにはStringScannerを使う
✓
StringScanner for complex case:
複雑なケースにStringScannerを使うと:
Easy to maintain
メンテナンスしやすい
✓
No performance regression
性能が劣化しない
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
StringScanner: Complex quote
Just quoted 554.5i/s
Include sep1 330.0i/s
Include sep2 325.6i/s
No slow down...?
遅くなっていない。。。?
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (2)
速度改善方法(2)
Simple case
単純なケース
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Simple case
単純なケース
AAAAA
"AAAAA"
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Use String#split
String#splitを使う
StringScanner is
slow
for simple case
StringScannerは単純なケースでは遅い
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Fallback to StringScanner impl.
StringScanner実装にフォールバック
def parse_by_strip(&block)
@input.each_line do |line|
if complex?(line)
return parse_by_string_scanner(&block)
else
yield(line.split(","))
end
end
end
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Quoted CSVs
クォートありのCSV
StringScanner split +
StringScanner
Just quoted 311.7i/s 523.4i/s
Include sep1 312.9i/s 309.8i/s
Include sep2 311.3i/s 312.6i/s
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (2)
ポイント(2)
First try optimized version
まず最適化バージョンを試す
✓
Fallback to robust version
when complexity is detected
複雑だとわかったらちゃんとしたバージョンにフォールバック
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (3)
速度改善方法(3)
loop do
↓
while true
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
loop vs. while
How Throughput
loop 377i/s
while 401i/s
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (3)
ポイント(3)
while doesn't create a new scope
whileは新しいスコープを作らない
✓
Normally, you can use loop
ふつうはloopでよい
Normally, loop isn't a bottle neck
ふつうはloopがボトルネックにはならない
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (4)
速度改善方法(4)
Lazy
遅延
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
CSV object is parser and writer
CSVオブジェクトは読み書きできる
2.5: Always initializes everything
2.5:常にすべてを初期化
✓
2.6: Initializes when it's needed
2.6:必要になったら初期化
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Write performance
2.5 2.6 Faster?
generate_
line
284.4i/s 684.2i/s 2.41x
<< 2891.4i/s 4824.1i/s 1.67x
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to initialize lazily
初期化を遅延する方法
def parser
@parser ||= Parser.new(...)
end
def writer
@writer ||= Writer.new(...)
end
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (4)
ポイント(4)
Do only needed things
必要なことだけする
✓
One class for one feature
機能ごとにクラスを分ける
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
New features by new parser
新しいパーサーによる新機能
Add support for " escape
"でのエスケープをサポート
✓
Add strip: option
strip:オプションを追加
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
" escape
"でのエスケープ
CSV.parse(%Q["a""bc","a"bc"],
liberal_parsing: {backslash_quote: true})
# [["a"bc", "a"bc"]]
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
strip:
strip:
CSV.parse(%Q[ abc , " abc"], strip: true)
# [["abc", " abc"]]
CSV.parse(%Q[abca,abc], strip: "a")
# [["bc", "bc"]]
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.6 (4)
Ruby 2.6のcsv(4)
Keep backward
compatibility
互換性を維持
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to keep backward compat.
互換性を維持する方法
Reconstruct test suites
テストを整理
✓
Add benchmark suites
ベンチマークを追加
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Test
テスト
Important to detect incompat.
非互換の検出のために重要
✓
Must be easy to maintain
メンテナンスしやすくしておくべき
To keep developing
継続的な開発するため
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Easy to maintenance
メンテナンスしやすい状態
Easy to understand each test
各テストを理解しやすい
✓
Easy to run each test
各テストを個別に実行しやすい
Focusing a failed case is easy to debug
失敗したケースに集中できるとデバッグしやすい
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark
ベンチマーク
Important to detect
performance regression bugs
性能劣化バグを検出するために重要
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
benchmark_driver gem
Fully-featured
benchmark driver
for Ruby 3x3
Ruby 3x3のために必要な機能が揃っているベンチマークツール
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
benchmark_driver gem in csv
YAML input is easy to use
YAMLによる入力が便利
✓
Can compare multiple gem versions
複数のgemのバージョンで比較可能
To detect performance regression
性能劣化を検出するため
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark for each gem version
gemのバージョン毎のベンチマーク
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv/benchmark/
convert_nil.yaml✓
parse{,_liberal_parsing}.yaml✓
parse_{quote_char_nil,strip}.yaml✓
read.yaml, shift.yaml, write.yaml✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark as a starting point
出発点としてのベンチマーク
Join csv developing!
csvの開発に参加しよう!
✓
Adding a new benchmark is a good start
ベンチマークの追加から始めるのはどう?
We'll focus on improving performance for
benchmark cases
ベンチマークが整備されているケースの性能改善に注力するよ
✓
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to use improved csv?
改良されたcsvを使う方法
gem install csv
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.5
Ruby 2.5のcsv
Default gemified
デフォルトgem化
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Default gem
デフォルトgem
Can use it just by require
requireするだけで使える
✓
Can use it without entry in Gemfile
(But you use it bundled in your Ruby)
Gemfileに書かなくても使えるけど古い
✓
Can upgrade it by gem
gemでアップグレードできる
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to use improved csv?
改良されたcsvを使う方法
gem install csv
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Future
今後の話
Faster
さらに速く
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Improve String#split
String#splitを改良
Accept " "
as normal separator
" "をただの区切り文字として扱う
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
split(" ") works like awk
split(" ")はawkのように動く
" a b c".split(" ", -1)
# => ["a", "b", "c"]
" a b c".split(/ /, -1)
# => ["", "a", "", "b", "c"]
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
String#split in csv
csvでのString#split
if @column_separator == " "
line.split(/ /, -1)
else
line.split(@column_separator, -1)
end
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
split(string) vs. split(regexp)
regexp string Faster?
344448i/s 3161117i/s 9.18x
See also [Feauture:15771]
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Improve StringScanner#scan
StringScanner#scanを改良
Accept String
as pattern
Stringもパターンとして使えるようにする
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
scan(string) vs. scan(regexp)
regexp string Faster?
14712660i/s 18421631i/s 1.25x
See also ruby/strscan#4
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Faster KEN_ALL.CSV parsing (1)
より速いKEN_ALL.CSVのパース(1)
Elapsed
csv 0.791s
FastestCSV 0.141s
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Faster KEN_ALL.CSV parsing (2)
より速いKEN_ALL.CSVのパース(2)
Encoding Elapsed
csv CP932 0.791s
FastestCSV CP932 0.141s
csv UTF-8 1.345s
FastestCSV UTF-8 0.713s
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Faster KEN_ALL.CSV parsing (3)
より速いKEN_ALL.CSVのパース(3)
Encoding Elapsed
FastestCSV UTF-8 0.713s
Python UTF-8 0.208s
Apache Arrow UTF-8 0.145s
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Further work
今後の改善案
Improve transcoding performance of Ruby
Rubyのエンコーディング変換処理の高速化
✓
Improve simple case parse performance
by implementing parser in C
シンプルなケース用のパーサーをCで実装して高速化
✓
Improve perf. of REXML as well as csv
csvのようにREXMLも高速化
✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Join us!
一緒に開発しようぜ!
Red Data Tools:
https://red-data-tools.github.io/
✓
RubyData Workshop: Today 14:20-15:30✓
Code Party: Today 19:00-21:00✓
After Hack: Sun. 10:30-17:30✓
Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Join us!!
一緒に開発しようぜ!!
OSS Gate: https://oss-gate.github.io/
provides a "gate" to join OSS development
OSSの開発に参加する「入り口」を提供する取り組み
✓
Both ClearCode and Speee are one of sponsors
クリアコードもSpeeeもスポンサー
✓
✓
OSS Gate Fukuoka:
https://oss-gate-fukuoka.connpass.com/
✓

More Related Content

What's hot

Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Grant McAlister
 
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Ontico
 
Boyan Ivanov - latency, the #1 metric of your cloud
Boyan Ivanov - latency, the #1 metric of your cloudBoyan Ivanov - latency, the #1 metric of your cloud
Boyan Ivanov - latency, the #1 metric of your cloud
ShapeBlue
 
[Perforce] Admin Workshop
[Perforce] Admin Workshop[Perforce] Admin Workshop
[Perforce] Admin WorkshopPerforce
 
Usenix lisa 2011
Usenix lisa 2011Usenix lisa 2011
Usenix lisa 2011
Leif Hedstrom
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
Ontico
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive
Amazon Web Services
 
Introduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsIntroduction to performance tuning perl web applications
Introduction to performance tuning perl web applications
Perrin Harkins
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Server
supertom
 
[Tel aviv merge world tour] Perforce Server Update
[Tel aviv merge world tour] Perforce Server Update[Tel aviv merge world tour] Perforce Server Update
[Tel aviv merge world tour] Perforce Server Update
Perforce
 
Integrated Cache on Netscaler
Integrated Cache on NetscalerIntegrated Cache on Netscaler
Integrated Cache on Netscaler
Mark Hillick
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
confluent
 
HTTP/2 (2017)
HTTP/2 (2017)HTTP/2 (2017)
HTTP/2 (2017)
Christian Mäder
 
How to tune Kafka® for production
How to tune Kafka® for productionHow to tune Kafka® for production
How to tune Kafka® for production
confluent
 
Backday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFS
Backday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFSBackday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFS
Backday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFS
Publicis Sapient Engineering
 
Windows Azure Service Bus
Windows Azure Service BusWindows Azure Service Bus
Windows Azure Service BusPavel Revenkov
 
Full Stack Load Testing
Full Stack Load Testing Full Stack Load Testing
Full Stack Load Testing
Terral R Jordan
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
PostgreSQL Experts, Inc.
 
Real-Time Stats for Candy Box
Real-Time Stats for Candy Box  Real-Time Stats for Candy Box
Real-Time Stats for Candy Box
PubNub
 

What's hot (20)

Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
 
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)Mасштабирование микросервисов на Go, Matt Heath (Hailo)
Mасштабирование микросервисов на Go, Matt Heath (Hailo)
 
Boyan Ivanov - latency, the #1 metric of your cloud
Boyan Ivanov - latency, the #1 metric of your cloudBoyan Ivanov - latency, the #1 metric of your cloud
Boyan Ivanov - latency, the #1 metric of your cloud
 
[Perforce] Admin Workshop
[Perforce] Admin Workshop[Perforce] Admin Workshop
[Perforce] Admin Workshop
 
Usenix lisa 2011
Usenix lisa 2011Usenix lisa 2011
Usenix lisa 2011
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive
 
Introduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsIntroduction to performance tuning perl web applications
Introduction to performance tuning perl web applications
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Server
 
[Tel aviv merge world tour] Perforce Server Update
[Tel aviv merge world tour] Perforce Server Update[Tel aviv merge world tour] Perforce Server Update
[Tel aviv merge world tour] Perforce Server Update
 
Integrated Cache on Netscaler
Integrated Cache on NetscalerIntegrated Cache on Netscaler
Integrated Cache on Netscaler
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 
HTTP/2 (2017)
HTTP/2 (2017)HTTP/2 (2017)
HTTP/2 (2017)
 
How to tune Kafka® for production
How to tune Kafka® for productionHow to tune Kafka® for production
How to tune Kafka® for production
 
Backday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFS
Backday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFSBackday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFS
Backday Xebia : Retour d’expérience sur l’ingestion d’événements dans HDFS
 
ReplacingSquidWithATS
ReplacingSquidWithATSReplacingSquidWithATS
ReplacingSquidWithATS
 
Windows Azure Service Bus
Windows Azure Service BusWindows Azure Service Bus
Windows Azure Service Bus
 
Full Stack Load Testing
Full Stack Load Testing Full Stack Load Testing
Full Stack Load Testing
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
Real-Time Stats for Candy Box
Real-Time Stats for Candy Box  Real-Time Stats for Candy Box
Real-Time Stats for Candy Box
 

Similar to Better CSV processing with Ruby 2.6

20141210 rakuten techtalk
20141210 rakuten techtalk20141210 rakuten techtalk
20141210 rakuten techtalk
Hiroshi SHIBATA
 
[Srijan Wednesday Webinar] Rails 5: What's in It for Me?
[Srijan Wednesday Webinar] Rails 5: What's in It for Me?[Srijan Wednesday Webinar] Rails 5: What's in It for Me?
[Srijan Wednesday Webinar] Rails 5: What's in It for Me?
Srijan Technologies
 
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
Kouhei Sutou
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
Sadayuki Furuhashi
 
20140419 oedo rubykaigi04
20140419 oedo rubykaigi0420140419 oedo rubykaigi04
20140419 oedo rubykaigi04Hiroshi SHIBATA
 
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine YardHow I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
SV Ruby on Rails Meetup
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
Natan Silnitsky
 
Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
Sadayuki Furuhashi
 
Introduction to chef framework
Introduction to chef frameworkIntroduction to chef framework
Introduction to chef framework
morgoth
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
Michael Renner
 
Jenkins multi configuration (matrix)
Jenkins multi configuration (matrix)Jenkins multi configuration (matrix)
Jenkins multi configuration (matrix)
Muhammad Zbeedat
 
Get Going With RVM and Rails 3
Get Going With RVM and Rails 3Get Going With RVM and Rails 3
Get Going With RVM and Rails 3
Karmen Blake
 
fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloud
Wesley Beary
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
Wesley Beary
 
20140425 ruby conftaiwan2014
20140425 ruby conftaiwan201420140425 ruby conftaiwan2014
20140425 ruby conftaiwan2014Hiroshi SHIBATA
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_storedrewz lin
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload Scheduler
Nopparat Nopkuat
 
MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011
Mike Willbanks
 

Similar to Better CSV processing with Ruby 2.6 (20)

20141210 rakuten techtalk
20141210 rakuten techtalk20141210 rakuten techtalk
20141210 rakuten techtalk
 
[Srijan Wednesday Webinar] Rails 5: What's in It for Me?
[Srijan Wednesday Webinar] Rails 5: What's in It for Me?[Srijan Wednesday Webinar] Rails 5: What's in It for Me?
[Srijan Wednesday Webinar] Rails 5: What's in It for Me?
 
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowRubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
RubyKaigi 2022 - Fast data processing with Ruby and Apache Arrow
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
 
20140419 oedo rubykaigi04
20140419 oedo rubykaigi0420140419 oedo rubykaigi04
20140419 oedo rubykaigi04
 
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine YardHow I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
 
Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
 
Introduction to chef framework
Introduction to chef frameworkIntroduction to chef framework
Introduction to chef framework
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Jenkins multi configuration (matrix)
Jenkins multi configuration (matrix)Jenkins multi configuration (matrix)
Jenkins multi configuration (matrix)
 
Get Going With RVM and Rails 3
Get Going With RVM and Rails 3Get Going With RVM and Rails 3
Get Going With RVM and Rails 3
 
fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloud
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
 
20140425 ruby conftaiwan2014
20140425 ruby conftaiwan201420140425 ruby conftaiwan2014
20140425 ruby conftaiwan2014
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_store
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload Scheduler
 
MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011
 

More from Kouhei Sutou

Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Kouhei Sutou
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
Kouhei Sutou
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェア
Kouhei Sutou
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのか
Kouhei Sutou
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Kouhei Sutou
 
Apache Arrow 2019
Apache Arrow 2019Apache Arrow 2019
Apache Arrow 2019
Kouhei Sutou
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像
Kouhei Sutou
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory data
Kouhei Sutou
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
Kouhei Sutou
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォーム
Kouhei Sutou
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
Kouhei Sutou
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
Kouhei Sutou
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroonga
Kouhei Sutou
 
My way with Ruby
My way with RubyMy way with Ruby
My way with Ruby
Kouhei Sutou
 
Red Data Tools
Red Data ToolsRed Data Tools
Red Data Tools
Kouhei Sutou
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Kouhei Sutou
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システム
Kouhei Sutou
 
PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!
Kouhei Sutou
 
PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版
Kouhei Sutou
 
PostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システム
PostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システムPostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システム
PostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システム
Kouhei Sutou
 

More from Kouhei Sutou (20)

Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
Apache Arrow Flight – ビッグデータ用高速データ転送フレームワーク #dbts2021
 
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowRubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache Arrow
 
Rubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェアRubyと仕事と自由なソフトウェア
Rubyと仕事と自由なソフトウェア
 
Apache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのかApache Arrowフォーマットはなぜ速いのか
Apache Arrowフォーマットはなぜ速いのか
 
Apache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory dataApache Arrow 1.0 - A cross-language development platform for in-memory data
Apache Arrow 1.0 - A cross-language development platform for in-memory data
 
Apache Arrow 2019
Apache Arrow 2019Apache Arrow 2019
Apache Arrow 2019
 
Redmine検索の未来像
Redmine検索の未来像Redmine検索の未来像
Redmine検索の未来像
 
Apache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory dataApache Arrow - A cross-language development platform for in-memory data
Apache Arrow - A cross-language development platform for in-memory data
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
Apache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォームApache Arrow - データ処理ツールの次世代プラットフォーム
Apache Arrow - データ処理ツールの次世代プラットフォーム
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
 
MySQL 8.0でMroonga
MySQL 8.0でMroongaMySQL 8.0でMroonga
MySQL 8.0でMroonga
 
My way with Ruby
My way with RubyMy way with Ruby
My way with Ruby
 
Red Data Tools
Red Data ToolsRed Data Tools
Red Data Tools
 
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
Mroongaの高速全文検索機能でWordPress内のコンテンツを有効活用!
 
MariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システムMariaDBとMroongaで作る全言語対応超高速全文検索システム
MariaDBとMroongaで作る全言語対応超高速全文検索システム
 
PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!PGroonga 2 – Make PostgreSQL rich full text search system backend!
PGroonga 2 – Make PostgreSQL rich full text search system backend!
 
PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版PGroonga 2 - PostgreSQLでの全文検索の決定版
PGroonga 2 - PostgreSQLでの全文検索の決定版
 
PostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システム
PostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システムPostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システム
PostgreSQLとPGroongaで作るPHPマニュアル高速全文検索システム
 

Recently uploaded

Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 

Recently uploaded (20)

Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

Better CSV processing with Ruby 2.6

  • 1. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Better CSV processing with Ruby 2.6 Kouhei Sutou/Kazuma Furuhashi ClearCode Inc./Speee, Inc. RubyKaigi 2019 2019-04-19
  • 2. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Ad: Silver sponsor
  • 3. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Ad: Cafe sponsor
  • 4. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Kouhei Sutou The president of ClearCode Inc. クリアコードの社長 ✓ A new maintainer of the csv library csvライブラリーの新メンテナー ✓ The founder of Red Data Tools project Red Data Toolsプロジェクトの立ち上げ人 Provides data processing tools for Ruby Ruby用のデータ処理ツールを提供するプロジェクト ✓ ✓
  • 5. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Kazuma Furuhashi A member of Asakusa.rb / Red Data Tools Asakusa.rb/Red Data Toolsメンバー ✓ Worikng at Speee Inc. Speeeで働いている ✓
  • 6. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 csv in Ruby 2.6 (1) Ruby 2.6のcsv(1) Faster CSV parsing CSVパースの高速化
  • 7. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Unquoted CSV クォートなしのCSV AAAAA,AAAAA,AAAAA ... 2.5 2.6 Faster? 432.0i/s 764.9i/s 1.77x
  • 8. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Quoted CSV クォートありのCSV "AAAAA","AAAAA","AAAAA" ... 2.5 2.6 Faster? 274.1i/s 534.5i/s 1.95x
  • 9. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Quoted separator CSV (1) 区切り文字をクォートしているCSV(1) ",AAAAA",",AAAAA",",AAAAA" ... 2.5 2.6 Faster? 211.0i/s 330.0/s 1.56x
  • 10. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Quoted separator CSV (2) 区切り文字をクォートしているCSV(2) "AAAAArn","AAAAArn","AAAAArn" ... 2.5 2.6 Faster? 118.7i/s 325.6/s 2.74x
  • 11. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Quoted CSVs クォートありのCSV 2.5 2.6 Just quoted 274.1i/s 554.5i/s Include sep1 211.0i/s 330.0i/s Include sep2 118.0i/s 325.6i/s (Note) (Slow down) (Still fast) Note: "Just quoted" on 2.6 is optimized
  • 12. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Multibyte CSV マルチバイトのCSV あああああ,あああああ,あああああ ... 2.5 2.6 Faster? 371.2i/s 626.6i/s 1.69x
  • 13. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 csv in Ruby 2.6 (2) Ruby 2.6のcsv(2) Faster CSV writing CSV書き出しの高速化
  • 14. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 CSV.generate_line n_rows.times do CSV.generate_line(fields) end 2.5 2.6 Faster? 284.4i/s 684.2i/s 2.41x
  • 15. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 CSV#<< output = StringIO.new csv = CSV.new(output) n_rows.times {csv << fields} 2.5 2.6 Faster? 2891.4i/s 4824.1i/s 1.67x
  • 16. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 CSV.generate_line vs. CSV#<< 2.5 2.6 generate_ line 284.4i/s 684.2i/s << 2891.4i/s 4824.1i/s Use << for multiple writes 複数行書き出すときは<<を使うこと
  • 17. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 csv in Ruby 2.6 (3) Ruby 2.6のcsv(3) New CSV parser for further improvements さらなる改良のための新しいCSVパーサー
  • 18. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Benchmark with KEN_ALL.CSV KEN_ALL.CSVでのベンチマーク 01101,"060 ","0600000","ホッカイドウ","サッポロシチュウオウク",... ...(124257 lines)... 47382,"90718","9071801","オキナワケン","ヤエヤマグンヨナグニチョウ",... Zip code data in Japan 日本の郵便番号データ https://www.post.japanpost.jp/zipcode/download.html
  • 19. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 KEN_ALL.CSV statistics KEN_ALL.CSVの統計情報 Size(サイズ) 11.7MiB # of columns(列数) 15 # of rows(行数) 124259 Encoding(エンコーディング) CP932
  • 20. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Parsing KEN_ALL.CSV KEN_ALL.CSVのパース CSV.foreach("KEN_ALL.CSV", "r:cp932") do |row| end 2.5 2.6 Faster? 1.17s 0.79s 1.48x
  • 21. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Fastest parsing in pure Ruby Ruby実装での最速のパース方法 input.each_line(chomp: true) do |line| line.split(",", -1) do |column| end end Limitation: No quote 制限:クォートがないこと
  • 22. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 KEN_ALL.CSV without quote クォートなしのKEN_ALL.CSV 01101,060 ,0600000,ホッカイドウ,サッポロシチュウオウク,... ...(124257 lines)... 47382,90718,9071801,オキナワケン,ヤエヤマグンヨナグニチョウ,...
  • 23. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Optimized no quote CSV parsing 最適化したクォートなしCSVのパース方法 CSV.foreach("KEN_ALL_NO_QUOTE.CSV", "r:cp932", quote_char: nil) {|row|} split 2.6 Faster? 0.32s 0.37s 0.86x (almost the same/同等)
  • 24. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Summary: Performance まとめ:性能 Parsing: 1.5x-3x faster パース:1.5x-3x高速 Max to the "split" level by using an option オプションを指定すると最大で「split」レベルまで高速化可能 ✓ ✓ Writing: 1.5x-2.5x faster 書き出し:1.5x-2.5x高速 Use CSV#<< than CSV.generate_line CSV.generate_lineよりもCSV#<<を使うこと ✓ ✓
  • 25. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to improve performance (1) 速度改善方法(1) Complex quote 複雑なクォート
  • 26. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Complex quote 複雑なクォート "AA""AAA" "AA,AAA" "AArAAA" "AAnAAA"
  • 27. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Use StringScanner StringScannerを使う String#split is very fast String#splitは高速 ✓ String#split is naive for complex quote String#splitは複雑なクォートを処理するには単純過ぎる ✓
  • 28. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 2.5 uses String#split in_extended_column = false # "...n..." case @input.each_line do |line| line.split(",", -1).each do |part| if in_extended_column # ... elsif part.start_with?('"') if part.end_with?('"') row << pars.gsub('""', '"') # "...""..." case else in_extended_column = true end # ...
  • 29. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 split: Complex quote Just quoted 274.1i/s Include sep1 211.0i/s Include sep2 118.0i/s Slow down 遅くなる
  • 30. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 2.6 uses StringScanner row = [] until @scanner.eos? value = parse_column_value if @scanner.scan(/,/) row << value elsif @scanner.scan(/n/) row << value yield(row) row = [] end end
  • 31. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 parse_column_value def parse_column_value parse_unquoted_column_value || parse_quoted_column_value end Compositable components 部品を組み合わせられる
  • 32. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 parse_unquoted_column_value def parse_unquoted_column_value @scanner.scan(/[^,"rn]+/) end
  • 33. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 parse_quoted_column_value def parse_quoted_column_value # Not quoted return nil unless @scanner.scan(/"/) # Quoted case ... end
  • 34. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Parse methods can be composited パースメソッドを組み合わせられる def parse_column_value parse_unquoted_column_value || parse_quoted_column_value end Easy to maintain メンテナンスしやすい
  • 35. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Point (1) ポイント(1) Use StringScanner for complex case 複雑なケースにはStringScannerを使う ✓ StringScanner for complex case: 複雑なケースにStringScannerを使うと: Easy to maintain メンテナンスしやすい ✓ No performance regression 性能が劣化しない ✓ ✓
  • 36. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 StringScanner: Complex quote Just quoted 554.5i/s Include sep1 330.0i/s Include sep2 325.6i/s No slow down...? 遅くなっていない。。。?
  • 37. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to improve performance (2) 速度改善方法(2) Simple case 単純なケース
  • 38. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Simple case 単純なケース AAAAA "AAAAA"
  • 39. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Use String#split String#splitを使う StringScanner is slow for simple case StringScannerは単純なケースでは遅い
  • 40. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Fallback to StringScanner impl. StringScanner実装にフォールバック def parse_by_strip(&block) @input.each_line do |line| if complex?(line) return parse_by_string_scanner(&block) else yield(line.split(",")) end end end
  • 41. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Quoted CSVs クォートありのCSV StringScanner split + StringScanner Just quoted 311.7i/s 523.4i/s Include sep1 312.9i/s 309.8i/s Include sep2 311.3i/s 312.6i/s
  • 42. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Point (2) ポイント(2) First try optimized version まず最適化バージョンを試す ✓ Fallback to robust version when complexity is detected 複雑だとわかったらちゃんとしたバージョンにフォールバック ✓
  • 43. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to improve performance (3) 速度改善方法(3) loop do ↓ while true
  • 44. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 loop vs. while How Throughput loop 377i/s while 401i/s
  • 45. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Point (3) ポイント(3) while doesn't create a new scope whileは新しいスコープを作らない ✓ Normally, you can use loop ふつうはloopでよい Normally, loop isn't a bottle neck ふつうはloopがボトルネックにはならない ✓ ✓
  • 46. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to improve performance (4) 速度改善方法(4) Lazy 遅延
  • 47. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 CSV object is parser and writer CSVオブジェクトは読み書きできる 2.5: Always initializes everything 2.5:常にすべてを初期化 ✓ 2.6: Initializes when it's needed 2.6:必要になったら初期化 ✓
  • 48. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Write performance 2.5 2.6 Faster? generate_ line 284.4i/s 684.2i/s 2.41x << 2891.4i/s 4824.1i/s 1.67x
  • 49. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to initialize lazily 初期化を遅延する方法 def parser @parser ||= Parser.new(...) end def writer @writer ||= Writer.new(...) end
  • 50. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Point (4) ポイント(4) Do only needed things 必要なことだけする ✓ One class for one feature 機能ごとにクラスを分ける ✓
  • 51. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 New features by new parser 新しいパーサーによる新機能 Add support for " escape "でのエスケープをサポート ✓ Add strip: option strip:オプションを追加 ✓
  • 52. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 " escape "でのエスケープ CSV.parse(%Q["a""bc","a"bc"], liberal_parsing: {backslash_quote: true}) # [["a"bc", "a"bc"]]
  • 53. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 strip: strip: CSV.parse(%Q[ abc , " abc"], strip: true) # [["abc", " abc"]] CSV.parse(%Q[abca,abc], strip: "a") # [["bc", "bc"]]
  • 54. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 csv in Ruby 2.6 (4) Ruby 2.6のcsv(4) Keep backward compatibility 互換性を維持
  • 55. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to keep backward compat. 互換性を維持する方法 Reconstruct test suites テストを整理 ✓ Add benchmark suites ベンチマークを追加 ✓
  • 56. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Test テスト Important to detect incompat. 非互換の検出のために重要 ✓ Must be easy to maintain メンテナンスしやすくしておくべき To keep developing 継続的な開発するため ✓ ✓
  • 57. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Easy to maintenance メンテナンスしやすい状態 Easy to understand each test 各テストを理解しやすい ✓ Easy to run each test 各テストを個別に実行しやすい Focusing a failed case is easy to debug 失敗したケースに集中できるとデバッグしやすい ✓ ✓
  • 58. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Benchmark ベンチマーク Important to detect performance regression bugs 性能劣化バグを検出するために重要 ✓
  • 59. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 benchmark_driver gem Fully-featured benchmark driver for Ruby 3x3 Ruby 3x3のために必要な機能が揃っているベンチマークツール
  • 60. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 benchmark_driver gem in csv YAML input is easy to use YAMLによる入力が便利 ✓ Can compare multiple gem versions 複数のgemのバージョンで比較可能 To detect performance regression 性能劣化を検出するため ✓ ✓
  • 61. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Benchmark for each gem version gemのバージョン毎のベンチマーク
  • 62. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 csv/benchmark/ convert_nil.yaml✓ parse{,_liberal_parsing}.yaml✓ parse_{quote_char_nil,strip}.yaml✓ read.yaml, shift.yaml, write.yaml✓
  • 63. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Benchmark as a starting point 出発点としてのベンチマーク Join csv developing! csvの開発に参加しよう! ✓ Adding a new benchmark is a good start ベンチマークの追加から始めるのはどう? We'll focus on improving performance for benchmark cases ベンチマークが整備されているケースの性能改善に注力するよ ✓ ✓
  • 64. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to use improved csv? 改良されたcsvを使う方法 gem install csv
  • 65. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 csv in Ruby 2.5 Ruby 2.5のcsv Default gemified デフォルトgem化
  • 66. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Default gem デフォルトgem Can use it just by require requireするだけで使える ✓ Can use it without entry in Gemfile (But you use it bundled in your Ruby) Gemfileに書かなくても使えるけど古い ✓ Can upgrade it by gem gemでアップグレードできる ✓
  • 67. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 How to use improved csv? 改良されたcsvを使う方法 gem install csv
  • 68. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Future 今後の話 Faster さらに速く
  • 69. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Improve String#split String#splitを改良 Accept " " as normal separator " "をただの区切り文字として扱う
  • 70. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 split(" ") works like awk split(" ")はawkのように動く " a b c".split(" ", -1) # => ["a", "b", "c"] " a b c".split(/ /, -1) # => ["", "a", "", "b", "c"]
  • 71. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 String#split in csv csvでのString#split if @column_separator == " " line.split(/ /, -1) else line.split(@column_separator, -1) end
  • 72. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 split(string) vs. split(regexp) regexp string Faster? 344448i/s 3161117i/s 9.18x See also [Feauture:15771]
  • 73. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Improve StringScanner#scan StringScanner#scanを改良 Accept String as pattern Stringもパターンとして使えるようにする
  • 74. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 scan(string) vs. scan(regexp) regexp string Faster? 14712660i/s 18421631i/s 1.25x See also ruby/strscan#4
  • 75. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Faster KEN_ALL.CSV parsing (1) より速いKEN_ALL.CSVのパース(1) Elapsed csv 0.791s FastestCSV 0.141s
  • 76. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Faster KEN_ALL.CSV parsing (2) より速いKEN_ALL.CSVのパース(2) Encoding Elapsed csv CP932 0.791s FastestCSV CP932 0.141s csv UTF-8 1.345s FastestCSV UTF-8 0.713s
  • 77. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Faster KEN_ALL.CSV parsing (3) より速いKEN_ALL.CSVのパース(3) Encoding Elapsed FastestCSV UTF-8 0.713s Python UTF-8 0.208s Apache Arrow UTF-8 0.145s
  • 78. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Further work 今後の改善案 Improve transcoding performance of Ruby Rubyのエンコーディング変換処理の高速化 ✓ Improve simple case parse performance by implementing parser in C シンプルなケース用のパーサーをCで実装して高速化 ✓ Improve perf. of REXML as well as csv csvのようにREXMLも高速化 ✓
  • 79. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Join us! 一緒に開発しようぜ! Red Data Tools: https://red-data-tools.github.io/ ✓ RubyData Workshop: Today 14:20-15:30✓ Code Party: Today 19:00-21:00✓ After Hack: Sun. 10:30-17:30✓
  • 80. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0 Join us!! 一緒に開発しようぜ!! OSS Gate: https://oss-gate.github.io/ provides a "gate" to join OSS development OSSの開発に参加する「入り口」を提供する取り組み ✓ Both ClearCode and Speee are one of sponsors クリアコードもSpeeeもスポンサー ✓ ✓ OSS Gate Fukuoka: https://oss-gate-fukuoka.connpass.com/ ✓