• Optcarrot: an NES emulator written in Ruby
Demo
• To drive “Ruby3x3”
– Matz said “Ruby 3 will be 3 times faster than Ruby 2.0”
– Optcarrot is a CPU-intensive, real-life benchmark
• Currently works at 20 fps in Ruby 2.0 → 60 fps in 3.0!
• A carrot to let horses (Ruby committers) optimize Ruby
• To challenge Ruby’s limit
– NES video resolution: 256 x 240 pixels / 60 fps
– Even an empty per-pixel loop takes 0.2 sec. for one second of video:
  ary = [0]
  (256*240*60).times do |i|
    ary[0] = 0
  end
– We need to do all other tasks in the remaining 0.8 sec. Impossible?
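The 0.2 sec. figure can be checked with a rough measurement like the one below. This is only a sketch: `Benchmark.realtime` is from Ruby's standard library, the helper name `empty_pixel_loop` is made up here, and the elapsed time depends entirely on the machine and the Ruby version.

```ruby
require "benchmark"

# 256 x 240 pixels at 60 fps: 3,686,400 pixels for one second of NES video.
PIXELS_PER_SECOND = 256 * 240 * 60

# Hypothetical helper: run the empty per-pixel loop and return the
# number of seconds it took.
def empty_pixel_loop(count = PIXELS_PER_SECOND)
  ary = [0]
  Benchmark.realtime do
    count.times do |i|
      ary[0] = 0
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  printf("%.2f sec.\n", empty_pixel_loop)
end
```

Whatever time this loop takes is pure interpreter overhead, so it sets the budget left for the actual emulation work.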
• Famicom programming with Ruby (takkaw, 2007)
– Presentation NES ROM by Ruby
• MRI's incremental GC (authornari, 2008)
– Mario-like game "Nario" is used to demonstrate the real-time GC
• Burn (remore, 2014)
– A framework to create NES ROM in Ruby
• NES architecture in three minutes
• How I achieved 20 fps
• Ruby interpreters’ benchmark
• Towards 60 fps
• Speaker's award & Conclusion
• The details of NES architecture
– In short: “See http://wiki.nesdev.com/ !”
• How to find the bottleneck
– In short: “Use stackprof!”
• These topics are not covered here; I’ll talk about them at “Kawasaki Ruby Kaigi 01” (川崎Ruby会議01, 2016/08/20)
• → NES Architecture in three minutes
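[Figure: NES block diagram. The CPU reads the Program ROM on the cartridge, reads/writes the RAM (2 kB), and controls the GPU. The GPU reads the Bitmap ROM on the cartridge, reads/writes the VRAM (2 kB), renders the picture, and sends interrupts to the CPU.]
To be precise: the GPU is called the “PPU” (Picture Processing Unit) in the NES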
Execution time ratio: GPU 80%, CPU 10%, others 10%
• Why does GPU emulation take so much time?
– GPU runs at higher clock speed than CPU
• GPU: 5.3 MHz
• CPU: 1.8 MHz
– GPU does many complex tasks
• Background rendering
• Sprite rendering
• Scrolling
• Conflict detection
• Interrupts
• Per-pixel tasks (i.e. 256 x 240 x 60 = 3.7M times per second)
1. Identify what bitmap is shown here
2. Read attribute data (color, flip flag, z-index)
3. Read bitmap data from the ROM
4. Assemble them into video signal
[Figure: for the target pixel, the GPU consults (1) the background map and (2) the attribute map in VRAM, reads (3) the Bitmap ROM on the cartridge, and (4) outputs the pixel.]
To be precise: These tasks are actually done per eight pixels
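The four per-pixel tasks above can be sketched roughly as follows. This is a simplified model, not Optcarrot's actual code: the method name `pixel_color` and the array layouts are assumptions, the attribute map holds one palette number per tile (the real NES stores one per 16 x 16 pixel area), and sprite attributes such as the flip flag and z-index are left out.

```ruby
TILE = 8  # background tiles are 8 x 8 pixels

# bg_map:   32 x 30 tile indices, row-major
# attr_map: one palette number (0..3) per tile (simplification)
# rom:      16 bytes per tile: 8 bytes of low bit plane, then 8 bytes of high bit plane
def pixel_color(x, y, bg_map, attr_map, rom)
  tile_pos = (y / TILE) * 32 + (x / TILE)
  tile     = bg_map[tile_pos]                 # 1. identify what bitmap is shown here
  palette  = attr_map[tile_pos]               # 2. read attribute data
  row      = y % TILE
  lo       = rom[tile * 16 + row]             # 3. read bitmap data (two bit planes)
  hi       = rom[tile * 16 + 8 + row]
  bit      = 7 - (x % TILE)                   # the leftmost pixel is the top bit
  pattern  = (((hi >> bit) & 1) << 1) | ((lo >> bit) & 1)
  palette * 4 + pattern                       # 4. assemble a color index
end
```

Calling this method once for every pixel means about 3.7M calls per emulated second, which is why the later slides look for ways to avoid it.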
• The actual GPU timing is terribly complex: see http://wiki.nesdev.com/w/index.php/File:Ntsc_timing.png
• → How I achieved 20 fps
– How to emulate CPU-GPU parallelism
– How to optimize GPU emulation
• Naïve approach: emulate CPU & GPU per clock
1. Run the CPU for one clock
2. Run the GPU for three clocks
3. Repeat 1 and 2
– Simple and accurate
– Very slow (~ 3 fps) because of too many method calls
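A minimal sketch of the naïve per-clock loop, assuming stand-in objects for the two emulators (the class `StepCounter` and the method `run_naive` are hypothetical names, not Optcarrot's):

```ruby
# Stand-in for a real CPU or GPU emulator: it only counts its steps.
class StepCounter
  attr_reader :steps

  def initialize
    @steps = 0
  end

  def step
    @steps += 1
  end
end

# Naive scheduling: one CPU clock, then three GPU clocks, repeated.
def run_naive(cpu, gpu, cpu_clocks)
  cpu_clocks.times do
    cpu.step              # 1. run the CPU for one clock
    3.times { gpu.step }  # 2. run the GPU for three clocks
  end
end
```

With the clock speeds given earlier (CPU 1.8 MHz, GPU 5.3 MHz), this shape needs roughly 7 million `step` calls per emulated second, which is where the method-call overhead comes from.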
[Figure: timeline in which the CPU and the GPU each advance by one “step” per clock.]
• “Catch-up” method: emulate CPU & GPU per control
1. Run the CPU until it tries to control the GPU
2. Run the GPU until it catches up with the CPU
3. Repeat 1 and 2
– Accurate and fast (~ 10 fps)
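The catch-up idea can be sketched as below. The names `LazyGpu` and `run_catchup`, and the representation of CPU work as `[clocks, touches_gpu]` pairs, are assumptions made for this illustration; the factor of three GPU clocks per CPU clock is taken from the naïve approach.

```ruby
# Hypothetical GPU model: it remembers the clock up to which it has been emulated.
class LazyGpu
  attr_reader :clock, :catchups

  def initialize
    @clock = 0
    @catchups = 0
  end

  # Emulate, in one batch, everything that happened up to target_clock.
  def catchup(target_clock)
    @catchups += 1
    @clock = target_clock  # a real emulator would render the pixels in between here
  end
end

# cpu_ops: list of [clocks, touches_gpu] pairs, a stand-in for CPU instructions.
def run_catchup(cpu_ops, gpu)
  cpu_clock = 0
  cpu_ops.each do |clocks, touches_gpu|
    # 2. before the CPU touches the GPU, let the GPU catch up with the CPU
    gpu.catchup(cpu_clock * 3) if touches_gpu
    # 1. otherwise the CPU just keeps running
    cpu_clock += clocks
  end
  gpu.catchup(cpu_clock * 3)  # final catch-up at the end of the run
  cpu_clock
end
```

The GPU is entered once per control attempt instead of once per clock, so the number of method calls drops sharply while the GPU still sees a consistent state whenever the CPU touches it.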
[Figure: timeline in which the CPU runs ahead and the GPU catches up each time the CPU attempts to control the GPU.]
• Naïve approach: per-pixel emulation
– Just like the actual hardware
[Figure: the GPU reads the background map and the attribute map from VRAM and the bitmap data from the Bitmap ROM on the cartridge (steps 1 to 4).]
This calculation is done for each iteration → Slow!
• Pre-render the screen and update it on demand
[Figure: the GPU pre-renders the background map and attribute map (VRAM) and the Bitmap ROM (cartridge) into a screen buffer, which is transported to the TV once per frame.]
– When VRAM is modified by the CPU, only the invalidated pixels are updated
This explanation is exaggerated! Actually, the GPU emulation loop is not removed completely.
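A minimal sketch of pre-rendering with on-demand updates, assuming invalidation is tracked per tile. The class `CachedScreen` and its methods are hypothetical, and `render_tile` is a placeholder for the expensive per-pixel work.

```ruby
# VRAM writes only mark tiles as dirty; rendering happens once per frame,
# and only for the tiles that were invalidated.
class CachedScreen
  attr_reader :renders

  def initialize(tile_count)
    @vram    = Array.new(tile_count, 0)
    @buffer  = Array.new(tile_count, 0)    # pre-rendered output, one entry per tile
    @dirty   = Array.new(tile_count, true)
    @renders = 0
  end

  def write_vram(pos, value)
    return if @vram[pos] == value
    @vram[pos] = value
    @dirty[pos] = true                     # invalidate just this tile
  end

  # Called once per frame: re-render only what changed, then hand over the buffer.
  def frame
    @dirty.each_index do |pos|
      next unless @dirty[pos]
      @buffer[pos] = render_tile(@vram[pos])
      @dirty[pos] = false
      @renders += 1
    end
    @buffer
  end

  private

  # Placeholder for the expensive rendering work.
  def render_tile(value)
    value * 2
  end
end
```

When little of VRAM changes between frames, most of the rendering work is skipped, which is the point of the screen buffer.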

• Intel® Core™ i7-4500U @ 2.40 GHz
• Ubuntu 16.04
• → Ruby interpreters’ benchmark
• Optcarrot is not so big: <5000 lines of code
– cf. Redmine: >30000 LOC
• Requires no library (in no-GUI mode)
– It works on miniruby
– ruby-ffi is used for GUI (SDL2)
• Uses only basic Ruby features
– It works on ruby 1.8 / mruby / topaz / opal (with shim and/or systematic modification of source code)
Frames per second by Ruby implementation (default mode):
trunk       28.7
ruby23      28.1
ruby22      25.5
ruby21      26.6
ruby20      25.0
ruby193     21.4
ruby187     5.83
omrpreview  21.9
jruby9k     39.2
jruby17     25.0
rubinius    4.10
mruby       7.48
topaz       27.0
opal        0.0287
• MRI has been improved (1.8 → 1.9 → 2.0 → 2.3)
• OMR preview (MRI 2.2 w/ JIT) isn’t fast?
• JRuby9k is the fastest
• ruby 2.0 achieves >20 fps (important for Ruby3x3)
• Optcarrot works on subset Ruby impls.
• JRuby 9k is the fastest: “Deoptimization” looks like a promising approach
– At first, optimized byte-code is generated, ignoring rare/pathological cases
– When needed, it is discarded and naïve byte-code is regenerated
– BTW: JRuby’s boot time is very slow
• OMR is not so fast?
– JIT has no advantage?
• Method calls and built-in methods may still be the bottleneck
– OMR seems not to support opt_case_dispatch yet
• i.e., a case statement is not optimized well?
• → Towards 60 fps
ProTips™
• We have kept the code reasonably clean so far
• Now, we use any means to achieve the speed
• CAUTION: Casual Ruby programmers MUST NOT use the following ProTips™
– This is an experiment to study how to improve Ruby implementations
ProTip™ 1
• Method call is slow
– Replace the call with the method body
Before:
  while catchup?
    inc_addr
  end
After:
  while catchup?
    @addr += 1
  end
28 fps → 40 fps
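The two loops above can be compared in isolation with a small script like this. The class `AddrCounter` and its methods are made up for the comparison, and the timings it prints will vary by machine and Ruby version; it only shows how the measurement could be set up.

```ruby
require "benchmark"

# Hypothetical counter used only to compare the two loop styles.
class AddrCounter
  attr_reader :addr

  def initialize
    @addr = 0
  end

  def inc_addr
    @addr += 1
  end

  # Before: one method call per iteration.
  def run_with_call(n)
    i = 0
    while i < n
      inc_addr
      i += 1
    end
  end

  # After: the method body is written out inside the loop.
  def run_inlined(n)
    i = 0
    while i < n
      @addr += 1
      i += 1
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  n = 5_000_000
  puts "call:    #{Benchmark.realtime { AddrCounter.new.run_with_call(n) }}"
  puts "inlined: #{Benchmark.realtime { AddrCounter.new.run_inlined(n) }}"
end
```

Both methods leave the counter in the same state; only the cost per iteration differs.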
ProTip™ 2
• Instance variable access is slow
– Replace it with a local variable
– Note: the variable must not be used outside this method
Before:
  while catchup?
    @addr += 1
  end
After:
  begin
    addr = @addr
    while catchup?
      addr += 1
    end
  ensure
    @addr = addr
  end
40 fps → 47 fps
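A runnable sketch of the ivar-localization pattern, showing why the write-back sits in `ensure`. The class `AddrRunner` and the parameters `limit` and `fail_at` are hypothetical and exist only for this illustration.

```ruby
class AddrRunner
  attr_reader :addr

  def initialize
    @addr = 0
  end

  # limit:   stop once the address reaches this value
  # fail_at: optionally raise at this address, to simulate an interruption
  def run(limit, fail_at = nil)
    begin
      addr = @addr                 # copy the instance variable into a local
      while addr < limit
        raise "interrupted" if addr == fail_at
        addr += 1                  # the hot loop touches only the local
      end
    ensure
      @addr = addr                 # write back even if the loop raises
    end
  end
end
```

Without the `ensure`, an exception raised inside the loop would lose every update made to the local copy.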
ProTip™ 3
• Batch multiple frequent actions across some clocks
Before:
  while catchup?
    case @clock
    when 1 then do_A
    when 2 then do_B
    when 3 then do_C
    ...
    end
    @clock += 1
  end
After:
  while catchup?
    if can_be_fast?
      # fast-path
      do_A
      do_B
      do_C
      @clock += 3
    else
      case @clock
      when 1 then do_A
      when 2 then do_B
      when 3 then do_C
      ...
      end
      @clock += 1
    end
  end
47 fps → 63 fps
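The fast path must produce exactly the same sequence of actions as the per-clock dispatch. A small self-contained model makes that checkable; the class `PhaseMachine`, its three repeating phases, and the fast-path condition are assumptions for this sketch, not Optcarrot's real state machine.

```ruby
# Hypothetical three-phase machine: phases 1, 2, 3 repeat, and each phase
# appends its name to a log so that the two loops can be compared.
class PhaseMachine
  attr_reader :log, :clock

  def initialize
    @log = []
    @clock = 1
  end

  def do_a; @log << :a; end
  def do_b; @log << :b; end
  def do_c; @log << :c; end

  def phase
    (@clock - 1) % 3 + 1
  end

  # Slow path only: one dispatch per clock.
  def run_slow(until_clock)
    step while @clock <= until_clock
  end

  # Fast path: when a whole 1-2-3 group fits, do it in one go.
  def run_fast(until_clock)
    while @clock <= until_clock
      if phase == 1 && @clock + 2 <= until_clock
        do_a
        do_b
        do_c
        @clock += 3
      else
        step
      end
    end
  end

  private

  def step
    case phase
    when 1 then do_a
    when 2 then do_b
    when 3 then do_c
    end
    @clock += 1
  end
end
```

The fast path saves two `case` dispatches and two loop-condition checks for every batched group, while the slow path remains available for the clocks where batching is not safe.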
Frames per second after each optimization (cumulative):
base                           29.4
method inlining (ProTip™ 1)    40.3
ivar localization (ProTip™ 2)  46.6
fastpath (ProTip™ 3)           62.7
misc                           68.8
CPU misc                       83.2
• Used Regexp to systematically rewrite the code
– instead of hand-rewriting
• Used Welch’s t-test to confirm each optimization
src = File.read(__FILE__)
src.gsub!(/.../) { ... } # method inlining
src.gsub!(/.../) { ... } # ivar localization
eval(src)
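A concrete, runnable sketch of Regexp-based method inlining. The slide's actual patterns are elided, so everything specific here is an assumption: the source is held in a heredoc rather than read with `File.read(__FILE__)`, and the class `Counter`, the helper `inline_inc_addr`, and the pattern itself are invented for illustration.

```ruby
# Hypothetical source text in the "clean" style.
CLEAN_SRC = <<~'RUBY'
  class Counter
    attr_reader :addr
    def initialize
      @addr = 0
    end
    def inc_addr
      @addr += 1
    end
    def run(n)
      n.times do
        inc_addr
      end
    end
  end
RUBY

# Replace every line that consists only of the call "inc_addr" with the
# method body. The "def inc_addr" line does not match, because the pattern
# allows nothing but indentation before the name.
def inline_inc_addr(src)
  src.gsub(/^([ \t]*)inc_addr$/) { "#{$1}@addr += 1" }
end

OPTIMIZED_SRC = inline_inc_addr(CLEAN_SRC)
eval(OPTIMIZED_SRC)  # defines Counter from the rewritten source
```

Keeping the clean source as the single original and generating the optimized variant from it means each optimization can be switched on or off and measured separately.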
Frames per second by Ruby implementation:
            default mode  optimized mode
trunk       28.6          84.0
ruby23      28.0          82.9
ruby22      25.2          78.2
ruby21      26.9          79.6
ruby20      26.1          68.1
ruby193     21.4          64.0
ruby187     5.87          1.46
omrpreview  22.8          69.0
jruby9k     39.3          (see note)
jruby17     25.3          (see note)
rubinius    3.97          6.13
mruby       7.02          2.43
topaz       29.3          0.754
opal        0.0285        0.0501
Note: the two JRuby rows share a single optimized-mode value, 2.12; for the other one, the generated program is too large to fit the JVM 64k bytecode limit.
• → Speaker's award & Conclusion
• The first person who improved MRI performance by using Optcarrot
– Instance variable access has been improved by about 10% [Bug #12274]
• Optcarrot has already started to improve Ruby!
• Optcarrot, a pure-Ruby NES emulator
– Non-trivial benchmark for Ruby implementations
• Wide-range Ruby implementation benchmark
– AFAIK, this is the first real-life benchmark to compare MRI / JRuby / Rubinius / mruby / topaz / opal
• ProTips™ for boosting a Ruby program
– Need to improve method calls and instance variables instead of JIT?
• More details? → “Kawasaki Ruby Kaigi 01” (川崎Ruby会議01, 2016/08/20)

Emcee Profile_ Subbu from Bangalore .pdf
 
Snoopy boards the big bow wow musical __
Snoopy boards the big bow wow musical __Snoopy boards the big bow wow musical __
Snoopy boards the big bow wow musical __
 
Meet Dinah Mattingly – Larry Bird’s Partner in Life and Love
Meet Dinah Mattingly – Larry Bird’s Partner in Life and LoveMeet Dinah Mattingly – Larry Bird’s Partner in Life and Love
Meet Dinah Mattingly – Larry Bird’s Partner in Life and Love
 
Matt Rife Cancels Shows Due to Health Concerns, Reschedules Tour Dates.pdf
Matt Rife Cancels Shows Due to Health Concerns, Reschedules Tour Dates.pdfMatt Rife Cancels Shows Due to Health Concerns, Reschedules Tour Dates.pdf
Matt Rife Cancels Shows Due to Health Concerns, Reschedules Tour Dates.pdf
 
I Know Dino Trivia: Part 3. Test your dino knowledge
I Know Dino Trivia: Part 3. Test your dino knowledgeI Know Dino Trivia: Part 3. Test your dino knowledge
I Know Dino Trivia: Part 3. Test your dino knowledge
 
Skeem Saam in June 2024 available on Forum
Skeem Saam in June 2024 available on ForumSkeem Saam in June 2024 available on Forum
Skeem Saam in June 2024 available on Forum
 

Optcarrot: A Pure-Ruby NES Emulator

  • 1.
  • 2. • A NES Emulator written in Ruby • Demo
  • 3. • To drive “Ruby3x3” – Matz said “Ruby 3 will be 3 times faster than Ruby 2.0” – Optcarrot is a CPU-intensive, real-life benchmark • Currently works at 20 fps in Ruby 2.0 → 60 fps in 3.0! • A carrot to let horses (Ruby committers) optimize Ruby • To challenge Ruby’s limit – NES video resolution: 256 x 240 pixels / 60 fps – The loop (256*240*60).times do |i| ary[0] = 0 end alone takes 0.2 sec. – We need to do all other tasks in 0.8 sec.? Impossible?
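The time budget on slide 3 can be checked with a small timing sketch. This is only an illustration: the variable names are made up here, and the measured time depends on the machine and the Ruby version, so it will not necessarily match the 0.2 sec. quoted on the slide.

```ruby
# One array write per emulated pixel: 256 x 240 pixels at 60 frames per second.
pixels_per_second = 256 * 240 * 60
ary = [nil]

# Use the monotonic clock so the measurement is not affected by clock changes.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
pixels_per_second.times { |i| ary[0] = 0 }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("%d writes took %.3f sec", pixels_per_second, elapsed)
```

Whatever this loop costs is the floor for a per-pixel emulator; everything else in the emulator has to fit into the rest of each second.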
  • 4. • Famicom programming with Ruby (takkaw, 2007) – A presentation NES ROM written in Ruby • MRI's incremental GC (authornari, 2008) – A Mario-like game, "Nario", is used to demonstrate the real-time GC • Burn (remore, 2014) – A framework to create NES ROMs in Ruby
  • 5. • NES architecture in three minutes • How I achieved 20 fps • Ruby interpreters’ benchmark • Towards 60 fps • Speaker's award & Conclusion
  • 6. • The details of NES architecture – In short: “See http://wiki.nesdev.com/ !” • How to find the bottleneck – In short: “Use stackprof!” • I’ll talk about these topics at “Kawasaki Ruby Kaigi 01” (川崎Ruby会議01, 2016/08/20)
  • 7. • → NES Architecture in three minutes ← • How I achieved 20 fps • Ruby interpreters’ benchmark • Towards 60 fps • Speaker's award & Conclusion
  • 8. [Diagram: the NES contains a CPU, a GPU, RAM (2 kB) and VRAM (2 kB); the cartridge contains a Program ROM and a Bitmap ROM. Arrow labels: control, interrupt, read, read/write, render.] To be precise: the GPU is called the “PPU” (Picture Processing Unit) in the NES
  • 9. [Pie chart: execution time ratio: GPU 80%, CPU 10%, others 10%] • Why does GPU emulation take so much time? – The GPU runs at a higher clock speed than the CPU • GPU: 5.3 MHz • CPU: 1.8 MHz – The GPU does many complex tasks • Background rendering • Sprite rendering • Scrolling • Conflict detection • Interrupts
  • 10. • Per-pixel tasks (i.e. 256 x 240 x 60 = 3.7M times per second) 1. Identify what bitmap is shown here 2. Read attribute data (color, flip flag, z-index) 3. Read bitmap data from the ROM 4. Assemble them into a video signal [Diagram: for the target pixel, the GPU reads the background map (1) and the attribute map (2) in VRAM and the Bitmap ROM (3) in the cartridge, then assembles the signal (4)] To be precise: these tasks are actually done per eight pixels
  • 12. • NES Architecture in three minutes • → How I achieved 20 fps ← – How to emulate CPU-GPU parallelism – How to optimize GPU emulation • Ruby interpreters’ benchmark • Towards 60 fps • Speaker's award & Conclusion
  • 13. • Naïve approach: emulate CPU & GPU per clock 1. Run the CPU for one clock 2. Run the GPU for three clocks 3. Repeat 1 and 2 – Simple and accurate – Very slow (~ 3 fps) because of too many method calls [Diagram: CPU and GPU timelines along the clock axis, each advanced one “step” at a time]
  • 14. • “Catch-up” method: emulate CPU & GPU per control 1. Run the CPU until it tries to control the GPU 2. Run the GPU until it catches up with the CPU 3. Repeat 1 and 2 – Accurate and fast (~ 10 fps) [Diagram: CPU timeline of “run” segments; each time the CPU attempts to control the GPU, the GPU timeline does a “catchup”]
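The catch-up scheduling on slide 14 can be sketched with a toy model. Everything here is hypothetical: `ToyCPU`, `ToyGPU`, and the fixed "access the GPU every N clocks" rule are illustrative stand-ins, not Optcarrot's actual classes or behavior. The only facts taken from the slides are that the GPU runs three clocks per CPU clock and that the GPU is advanced only when the CPU tries to control it.

```ruby
# Toy GPU: it only advances when asked to catch up.
class ToyGPU
  attr_reader :clock

  def initialize
    @clock = 0
  end

  # Advance the GPU until it has caught up with the given CPU clock.
  # The GPU runs three clocks for every CPU clock.
  def catch_up(cpu_clock)
    target = cpu_clock * 3
    @clock += 1 while @clock < target
  end
end

# Toy CPU: pretends to touch a GPU register every `gpu_access_every` clocks
# (an assumption made purely for illustration).
class ToyCPU
  attr_reader :clock, :syncs

  def initialize(gpu, gpu_access_every)
    @gpu = gpu
    @gpu_access_every = gpu_access_every
    @clock = 0
    @syncs = 0
  end

  def run(until_clock)
    while @clock < until_clock
      @clock += 1
      next unless (@clock % @gpu_access_every).zero?

      # The CPU is about to control the GPU: let the GPU catch up first.
      @gpu.catch_up(@clock)
      @syncs += 1
    end
    @gpu.catch_up(@clock) # final synchronization at the end of the run
  end
end

gpu = ToyGPU.new
cpu = ToyCPU.new(gpu, 100)
cpu.run(1_000)
puts "CPU clock: #{cpu.clock}, GPU clock: #{gpu.clock}, syncs: #{cpu.syncs}"
```

In this model the two components are interleaved only 10 times over 1,000 CPU clocks instead of 1,000 times, which is the source of the saving in method calls that the slide describes.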
  • 15. • Naïve approach: per-pixel emulation – Just like the actual hardware – This calculation is done for each iteration → Slow! [Diagram: the GPU reads the background map (1) and the attribute map (2) in VRAM and the Bitmap ROM (3) in the cartridge, then outputs the pixel (4)]
  • 16. • Pre-render the screen and update it on demand – When VRAM is modified by the CPU, only the invalidated pixels are updated – The screen buffer is transported to the TV once per frame [Diagram: the background map and attribute map (VRAM) and the Bitmap ROM (cartridge) feed the GPU, which writes to a screen buffer] This explanation is exaggerated! Actually, the GPU emulation loop is not removed completely.
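The "pre-render and update on demand" idea on slide 16 can be illustrated with a dirty-tile cache. This is a simplified sketch under stated assumptions, not Optcarrot's implementation: the class `CachedScreen` is hypothetical, each VRAM cell is treated as one solid color for an 8 x 8 tile, and real bitmap and attribute lookups are left out.

```ruby
# Hypothetical screen buffer that re-renders only tiles whose VRAM cell changed.
class CachedScreen
  WIDTH = 256
  HEIGHT = 240
  TILE = 8                 # tile width and height in pixels
  COLS = WIDTH / TILE      # 32 tiles per row
  ROWS = HEIGHT / TILE     # 30 tile rows

  attr_reader :rendered_tiles

  def initialize
    @vram = Array.new(COLS * ROWS, 0)
    @pixels = Array.new(WIDTH * HEIGHT, 0)
    @dirty = {}
    @rendered_tiles = 0
    (COLS * ROWS).times { |i| @dirty[i] = true } # first frame renders everything
  end

  # Called when the CPU writes to VRAM: remember which tile became invalid.
  def write_vram(index, value)
    return if @vram[index] == value

    @vram[index] = value
    @dirty[index] = true
  end

  # Called once per frame: redraw only the invalidated tiles.
  def frame
    @dirty.each_key { |index| render_tile(index) }
    @dirty.clear
    @pixels
  end

  private

  def render_tile(index)
    x0 = (index % COLS) * TILE
    y0 = (index / COLS) * TILE
    color = @vram[index]
    TILE.times do |dy|
      row = (y0 + dy) * WIDTH + x0
      TILE.times { |dx| @pixels[row + dx] = color }
    end
    @rendered_tiles += 1
  end
end

screen = CachedScreen.new
screen.frame                  # initial full render
screen.write_vram(33, 5)      # the CPU modifies a single VRAM cell
pixels = screen.frame         # only the affected tile is redrawn
puts "tiles rendered so far: #{screen.rendered_tiles}"
```

The second frame costs one tile instead of all 960, which is the effect the slide is after. As the slide itself warns, a real emulator still needs part of the per-pixel loop.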
  • 17. • Intel® Core™ i7-4500U @ 2.40 GHz • Ubuntu 16.04
  • 18. • NES Architecture in three minutes • How I achieved 20 fps • → Ruby interpreters’ benchmark ← • Towards 60 fps • Speaker's award & Conclusion
  • 19. • Is not so big: <5000 lines of code – cf. redmine: >30000 LOC • Requires no library (in no-GUI mode) – It works on miniruby – ruby-ffi is used for GUI (SDL2) • Uses only basic Ruby features – It works on ruby 1.8 / mruby / topaz / opal (with shim and/or systematic modification of source code)
  • 20. [Bar chart: fps by Ruby implementation] trunk 28.7, ruby23 28.1, ruby22 25.5, ruby21 26.6, ruby20 25.0, ruby193 21.4, ruby187 5.83, omrpreview 21.9, jruby9k 39.2, jruby17 25.0, rubinius 4.10, mruby 7.48, topaz 27.0, opal 0.0287 • MRI has been improved (1.8 → 1.9 → 2.0 → 2.3) • OMR preview isn’t fast? (MRI 2.2 w/ JIT) • JRuby9k is the fastest • ruby 2.0 achieves >20 fps (important for Ruby3x3) • Optcarrot works on subset Ruby impls.
  • 21. • JRuby 9k is the fastest: “Deoptimization” looks like a promising approach – At first, optimized byte-code is generated, ignoring rare/pathological cases – When needed, it is discarded and naïve byte-code is regenerated – BTW: JRuby’s boot time is too bad • OMR is not so fast? – JIT has no advantage? • Method calls and built-in methods may still be the bottleneck – OMR seems not to support opt_case_dispatch yet • i.e., a case statement is not optimized well?
  • 22. • NES Architecture in three minutes • How I achieved 20 fps • Ruby interpreters’ benchmark • → Towards 60 fps ← • Speaker's award & Conclusion
  • 23. • We have kept the code reasonably clean so far • Now, we use any means to achieve the speed • CAUTION: Casual Ruby programmers MUST NOT use the following ProTips™ – This is an experiment to study how to improve Ruby implementation
  • 24. • Method call is slow – Replace it with its method definition – Before: while catchup?; inc_addr; end – After: while catchup?; @addr += 1; end – 28 fps → 40 fps
  • 25. • Instance variable access is slow – Replace it with a local variable – Note: the variable must not be used outside of this method – Before: while catchup?; @addr += 1; end – After: begin; addr = @addr; while catchup?; addr += 1; end; ensure; @addr = addr; end – 40 fps → 47 fps
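The two rewrites on slides 24 and 25 can be put side by side in a runnable form. The class `AddrCounter` and its `@limit` loop condition are hypothetical stand-ins for the slides' `catchup?`; the point is only that the three variants compute the same result while doing progressively less method dispatch and instance variable access.

```ruby
class AddrCounter
  attr_reader :addr

  def initialize(limit)
    @addr = 0
    @limit = limit
  end

  def inc_addr
    @addr += 1
  end

  # Baseline: one method call per iteration.
  def run_with_calls
    inc_addr while @addr < @limit
  end

  # Method inlining: the body of inc_addr is pasted into the loop.
  def run_inlined
    @addr += 1 while @addr < @limit
  end

  # Ivar localization: work on local copies and write the result back.
  # The ensure clause writes @addr back even if the loop raises.
  def run_localized
    addr = @addr
    limit = @limit
    begin
      addr += 1 while addr < limit
    ensure
      @addr = addr
    end
  end
end

results = %i[run_with_calls run_inlined run_localized].map do |variant|
  counter = AddrCounter.new(100_000)
  counter.public_send(variant)
  counter.addr
end
puts results.inspect
```

The localized variant is only safe because nothing else reads `@addr` while the loop runs, which is the restriction noted on slide 25.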
  • 26. • Batch multiple frequent actions across some clocks – Before: while catchup?; case @clock; when 1 then do_A; when 2 then do_B; when 3 then do_C; ...; end; @clock += 1; end – After (with a fast path): while catchup?; if can_be_fast?; do_A; do_B; do_C; @clock += 3; else; case @clock; when 1 then do_A; when 2 then do_B; when 3 then do_C; ...; end; @clock += 1; end; end – 47 fps → 63 fps
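A runnable toy version of the fast path on slide 26 follows. The class `Stepper`, the three-clock cycle, and the `can_be_fast?` rule used here (at the start of a cycle, with at least three clocks left) are assumptions for illustration; the slide does not say how Optcarrot decides when the fast path is safe.

```ruby
class Stepper
  attr_reader :clock, :log

  def initialize(limit, use_fast_path)
    @clock = 0
    @limit = limit
    @use_fast_path = use_fast_path
    @log = []
  end

  def run
    while @clock < @limit
      if @use_fast_path && can_be_fast?
        # Fast path: perform three clocks' worth of work in one iteration.
        do_a
        do_b
        do_c
        @clock += 3
      else
        # Slow path: dispatch on the current clock, one clock at a time.
        case @clock % 3
        when 0 then do_a
        when 1 then do_b
        when 2 then do_c
        end
        @clock += 1
      end
    end
  end

  private

  # Hypothetical condition: batching is safe only at the start of a cycle
  # and when the whole batch fits before the limit.
  def can_be_fast?
    (@clock % 3).zero? && @clock + 3 <= @limit
  end

  def do_a
    @log << :a
  end

  def do_b
    @log << :b
  end

  def do_c
    @log << :c
  end
end

slow = Stepper.new(10, false)
slow.run
fast = Stepper.new(10, true)
fast.run
puts fast.log.inspect
```

Both runs perform the same actions in the same order; the fast path simply reaches the end in fewer loop iterations and `case` dispatches, falling back to the slow path for the last partial cycle.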
  • 27. [Bar chart: fps after each optimization] base 29.4, method inlining (ProTip™ 1) 40.3, ivar localization (ProTip™ 2) 46.6, fastpath (ProTip™ 3) 62.7, then two further steps (axis labels: “misc CPU misc”) 68.8 and 83.2
  • 28. • Used Regexp to systematically rewrite the code – instead of hand-rewriting – src = File.read(__FILE__); src.gsub!(/.../) { ... } (method inlining); src.gsub!(/.../) { ... } (ivar localization); eval(src) • Used Welch’s t-test to confirm each optimization
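The rewrite-then-eval workflow on slide 28 can be demonstrated on a tiny program held in a string. This is a sketch only: the `Walker` class and the regular expression are invented for this example, and the slide's own patterns are elided (`/.../`), so nothing here reflects the patterns Optcarrot actually uses.

```ruby
# Source of a small class, kept as a string so it can be rewritten before loading.
source = <<~'RUBY'
  class Walker
    attr_reader :addr

    def initialize(limit)
      @addr = 0
      @limit = limit
    end

    def inc_addr
      @addr += 1
    end

    def run
      while @addr < @limit
        inc_addr
      end
    end
  end
RUBY

# Method inlining by regular expression: replace every line that consists
# solely of a call to inc_addr with the body of that method.
optimized = source.gsub(/^([ \t]*)inc_addr$/) { "#{$1}@addr += 1" }

# Load the rewritten program instead of the original one.
eval(optimized)

walker = Walker.new(5)
walker.run
puts walker.addr
```

The `def inc_addr` line is left alone because the pattern only matches lines that contain nothing but the call. Rewriting by pattern keeps the readable source as the single original, at the cost of patterns that have to be written carefully.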
  • 29.
  • 30. [Bar chart: fps, default mode vs. optimized mode] Default mode: trunk 28.6, ruby23 28.0, ruby22 25.2, ruby21 26.9, ruby20 26.1, ruby193 21.4, ruby187 5.87, omrpreview 22.8, jruby9k 39.3, jruby17 25.3, rubinius 3.97, mruby 7.02, topaz 29.3, opal 0.0285. Optimized mode (13 values in chart order; one implementation has no value in this transcript): 84.0, 82.9, 78.2, 79.6, 68.1, 64.0, 1.46, 69.0, 2.12, 6.13, 2.43, 0.754, 0.0501 • The generated program is too large to fit the JVM 64k bytecode limit
  • 31. • NES Architecture in three minutes • How I achieved 20 fps • Ruby interpreters’ benchmark • Towards 60 fps • → Speaker's award & Conclusion ←
  • 32. • The first person who improved MRI performance by using Optcarrot – Instance variable access has been improved by about 10% [Bug #12274] • Optcarrot has already started to improve Ruby!
  • 33. • Optcarrot, a pure-Ruby NES emulator – Non-trivial benchmark for Ruby implementations • Wide-range Ruby implementation benchmark – AFAIK, this is the first real-life benchmark to compare MRI / JRuby / Rubinius / mruby / topaz / opal • ProTips™ for boosting a Ruby program – Need to improve method calls and instance variables instead of JIT? • More details? → Kawasaki Ruby Kaigi 01 (川崎Ruby会議01, 2016/08/20)
  • 34. ¥2,680 + tax ¥5,440 + tax