2. ABOUT ME
´ Oracle Consultant since 2001
´ Former developer (C, Java, perl, PL/SQL)
´ Blogger since 2004
´ http://laurent.leturgez.free.fr (in French, discontinued)
´ http://laurent-leturgez.com
´ Twitter : @lleturgez
´ Paris Oracle Meetup Organizer: @ParisOracle
´ OCM 11g
3. Agenda
´ SIMD Instructions, outside Oracle 12c
´ What is a SIMD instruction ?
´ Will my application use SIMD ?
´ Raw Performance
´ SIMD Instructions, inside Oracle 12c
´ How SIMD instructions are used inside Oracle 12c
´ Tracing SIMD in Oracle 12c
4. Caveats
´ Most of the topics come from
´ My own research
´ My past life as a developer
´ Some of the topics are about internals, so:
´ Analysis and conclusions may be incomplete
´ Future versions of Oracle may change the features
´ Tests were done with Oracle 12.1.0.2, Oracle
Enterprise Linux 7.1 and VMware Fusion 7 (and
VirtualBox)
5. Before we start …
´ Some fundamentals (from Dennis Yurichev’s book)
´ CPU register : […]The easiest way to understand a register is
to think of it as an untyped temporary variable. Imagine if
you were working with a high-level programming language and could only use
eight 32-bit (or 64-bit) variables. Yet a lot can be done using
just these!
´ Instruction : A primitive CPU command. The simplest
examples include: moving data between registers, working
with memory and arithmetic primitives. As a rule, each CPU
has its own instruction set architecture (ISA).
´ Assembly language : Mnemonic code and some extensions
like macros which are intended to make a programmer’s life
easier.
http://beginners.re/Reverse_Engineering_for_Beginners-en.pdf
7. SIMD instructions … outside
Oracle 12c
´ SIMD stands for Single Instruction Multiple Data
´ Process multiple data
´ In one CPU instruction
´ Based on
´ Specific registers
´ Specific CPU instructions and sets of instructions
´ Not Oracle specific
´ CPU Architecture specific
´ Intel
´ IBM
´ Sparc
´ This presentation is mainly about Intel architecture
8. SIMD instructions … outside
Oracle 12c
´ What is a SIMD register ?
´ It’s a CPU register
´ Wider than traditional registers (RDI, RSI, R8, R9 etc.)
´ From 128 up to 512 bits wide
´ Holds several data elements at once
9. SIMD instructions … outside
Oracle 12c
´ Scalar operation
´ an array of 4 integers {1,2,3,4}
´ add 1 to each value
[Figure: scalar processing, step by step. Each value is loaded from RAM into Reg1, the constant 1 sits in Reg2, the sum is placed in Reg3 and saved back to RAM. In: 1 2 3 4, Out: 2 3 4 5]
´ LOAD, ADD, SAVE repeated for each value: 4 LOAD, 4 ADD, 4 SAVE
10. SIMD instructions … outside
Oracle 12c
´ SIMD operation
´ an array of 4 integers {1,2,3,4}
´ add 1 to each value
[Figure: SIMD processing. The four values 1 2 3 4 are loaded at once into SIMD Reg1, SIMD Reg2 holds 1 1 1 1, the four sums 2 3 4 5 are placed in SIMD Reg3 and saved back to RAM. In: 1 2 3 4, Out: 2 3 4 5]
´ A single LOAD, ADD, SAVE sequence for the four values
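The two approaches above can be sketched in C. This is a minimal sketch and not code from the presentation: the function names add_one_scalar and add_one_simd are made up for illustration, and the SIMD path assumes an x86 compiler that provides the SSE2 intrinsics header emmintrin.h (without SSE2 it falls back to the scalar loop).

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics (128-bit XMM registers) */
#endif

/* Scalar version: one load, one add and one store per element. */
void add_one_scalar(int *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] += 1;
}

/* SIMD version: four 32-bit integers per load, add and store. */
void add_one_simd(int *v, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi32(1);                    /* {1,1,1,1} */
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)&v[i]);   /* LOAD */
        x = _mm_add_epi32(x, ones);                            /* ADD  */
        _mm_storeu_si128((__m128i *)&v[i], x);                 /* SAVE */
    }
#endif
    for (; i < n; i++)   /* leftover elements, or everything without SSE2 */
        v[i] += 1;
}
```

Called on the array {1,2,3,4}, both functions leave {2,3,4,5} in place. The unaligned load and store variants are used here so that the sketch does not depend on how the array was allocated.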
11. SIMD instructions … outside
Oracle 12c
´ MMX: MultiMedia eXtensions (Pentium MMX, Pentium II)
´ 64-bit registers
´ 8 registers (MM0 to MM7)
´ SSE: Streaming SIMD Extensions (Pentium III)
´ 128-bit registers
´ 8 registers (XMM0 to XMM7)
´ Only four 32-bit single-precision floating point numbers per register
´ SSE2 (Pentium 4), SSE3 (Pentium 4 Prescott, Xeon Nocona), SSSE3
(Xeon 5100, Core 2), SSE4.1 (Penryn), SSE4.2 (Nehalem)
´ 128-bit registers
´ 16 registers (XMM0 to XMM15) in 64-bit mode
´ Wider usage (two 64-bit double-precision numbers, four 32-bit
integers, down to sixteen 8-bit bytes)
´ New instructions
12. SIMD instructions … outside
Oracle 12c
´ AVX: Advanced Vector eXtensions (Sandy Bridge processors)
´ XMM registers are extended to 256 bits
´ 16 AVX registers named YMM0 to YMM15
´ Three-operand instructions (non-destructive): C = A + B rather than
A = A + B
´ Some alignment requirements are relaxed
´ AVX2 (introduced with Haswell processors)
´ 256-bit registers
´ New instructions (shifting, value broadcasting etc.)
´ AVX-512 or AVX3 (Skylake processors)
´ 512-bit registers
´ 32 registers named ZMM0 to ZMM31
´ AVX-1024 … the future
´ 1024-bit registers
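The register widths listed above translate directly into a number of elements processed per instruction. A small sketch of that arithmetic (the helper name lanes is hypothetical, it is not part of any library):

```c
/* Number of elements of element_bits each that fit in one SIMD register. */
unsigned lanes(unsigned register_bits, unsigned element_bits)
{
    return element_bits ? register_bits / element_bits : 0;
}
```

For example, lanes(128, 32) gives 4 (four 32-bit integers in an XMM register), lanes(128, 8) gives 16, lanes(256, 32) gives 8 for a YMM register and lanes(512, 32) gives 16 for a ZMM register.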
13. SIMD instructions … outside
Oracle 12c
´ SIMD instructions
´ Reduce the number of CPU cycles and memory pressure
´ Process data in parallel without any contention
´ Need a programming method (vector programming) with some
constraints (data alignment etc.)
´ Size matters
´ Wider registers, more data loaded (but wider register files
increase CPU power consumption → challenge)
´ All the elements of a register are processed by a single CPU instruction
´ More registers
´ Use cases
´ Data Filtering
´ Graphics
´ Bioinformatics …
17. Will my application use SIMD registers
and instructions ?
´ It depends on :
´ Hardware
´ Consult the processor datasheet to see which instruction set
extensions are supported (if any)
´ http://ark.intel.com/#@Processors
´ Hypervisor
´ Some (old) hypervisors do not support modern extensions
´ VirtualBox versions <5.0 don’t support SSE4, AVX and AVX2
´ Hyper-V on W2008R2-SP1 needs a patch for specific processors
to support AVX
18. Will my application use SIMD registers
and instructions ?
´ It depends on the Operating System
´ AVX (256 bits) is supported from
´ Linux Kernel >= 2.6.30
´ Redhat EL5 : 2.6.18
´ Oracle EL5 w/UEK : 2.6.32
AVX needs xsave kernel parameter
´ Solaris 10 upd 10 and Solaris 11
´ Windows 2008 R2 SP1
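Hardware, hypervisor and operating system all have to cooperate, so a quick check from inside a process can be useful. The sketch below relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports, which only exist on x86 targets; the function name widest_simd is made up for this example, and on other compilers or architectures it simply reports "none".

```c
/* Returns the widest SIMD extension reported as usable at run time:
   "avx2", "avx", "sse4.2" or "none". */
const char *widest_simd(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                          /* safe to call explicitly */
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("avx"))    return "avx";
    if (__builtin_cpu_supports("sse4.2")) return "sse4.2";
#endif
    return "none";
}
```

Running it inside a virtual machine and on the host is one way to see whether a hypervisor hides some extensions from the guest.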
19. Will my application use SIMD registers
and instructions ?
´ It depends on the compiler
´ GCC
´ > 4.6 for AVX support
´ Use of specific switches (-msse2, -msse4.1, -msse4.2,
-mavx, -mavx2 …)
´ Intel C/C++ Compiler (ICC)
´ > 11.1 for AVX support and > 13.0 for AVX2 support
´ Use of specific switches (-xsse4.2, -xavx, -xcore-avx2
…)
´ Beware of optimization switches (-O1, -O2, -O3)
´ More … disassemble (if you are allowed to :) )
´ Registers
´ Assembler instructions
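The effect of the compiler switches can be observed without disassembling, because GCC, Clang and ICC predefine macros such as __SSE2__, __SSE4_2__, __AVX__ and __AVX2__ when the matching extension is enabled. A minimal sketch (the function name compiled_simd is hypothetical):

```c
/* Reports the widest SIMD extension the compiler was allowed to use
   for this translation unit, based on predefined macros. */
const char *compiled_simd(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}
```

Compiling the same file with -msse4.2 and then with -mavx2 should change the answer. This only tells what the compiler may emit, not whether the CPU running the binary supports it.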
21. Raw performance
´ Based on a C program
´ CPU used: Haswell microarchitecture (Core
i7-4960HQ), AVX/AVX2 enabled
´ 3 tests: no SIMD, SSE4, AVX
´ Input: one array containing 1 million values
´ Goal: add 1 to each value, the pass over the million
values being repeated 4k, 8k, 16k and 32k times
´ CPU time (s) = f(#rows)
“Quick and Dirty” sample code available here:
https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
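The linked sample code is not reproduced here. As a rough idea of what the non-SIMD variant of such a test can look like, here is a minimal sketch under stated assumptions: the function name bench_add_one and its parameters are made up for illustration, and CPU time is taken with the standard clock() function.

```c
#include <stddef.h>
#include <time.h>

/* Adds 1 to each of the n values, `repeats` times over,
   and returns the CPU time consumed in seconds. */
double bench_add_one(int *v, size_t n, unsigned repeats)
{
    clock_t start = clock();
    for (unsigned r = 0; r < repeats; r++)
        for (size_t i = 0; i < n; i++)
            v[i] += 1;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With n set to one million and repeats set to 4096, 8192, 16384 or 32768, this matches the shape of the test described above. Depending on the compiler version and optimization switches, the inner loop may be vectorized automatically, so the generated code should be checked before calling the result a "no SIMD" figure.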
22. Raw performance
RAW Performance (CPU) for SIMD Instructions, CPU time in seconds

                        4096 M. ROWS   8192 M. ROWS   16384 M. ROWS   32768 M. ROWS
NO SIMD                 10.35          20.46          42.35           85.64
SSE4 (XMM Registers)     3.3            6.81          13.73           25.58
AVX (YMM Registers)      1.96           3.51           7.23           15.15
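The speedups implied by these measurements can be worked out with a few lines of C. The figures are the ones from the chart (CPU seconds for 4096, 8192, 16384 and 32768 million rows); the array and function names are hypothetical.

```c
/* CPU seconds from the chart, for 4096, 8192, 16384 and 32768 M. rows. */
static const double no_simd[4] = {10.35, 20.46, 42.35, 85.64};
static const double sse4[4]    = { 3.30,  6.81, 13.73, 25.58};
static const double avx[4]     = { 1.96,  3.51,  7.23, 15.15};

/* How many times faster `measured` is compared with `baseline`. */
double speedup(double baseline, double measured)
{
    return baseline / measured;
}
```

For each row count, speedup(no_simd[i], sse4[i]) comes out at roughly 3, and speedup(no_simd[i], avx[i]) between about 5.3 and 5.9, so doubling the register width did not quite double the throughput in this test.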
24. SIMD instructions … inside
Oracle 12c
´ In Memory Data Structure
´ In Memory Compression Unit :
IMCU
´ IMCU is the unit of column store
allocation
´ Target size is 1M rows
(controlled by _inmemory_imcu_target_rows)
´ One IMCU can contain more
than one column
´ Each column in one IMCU is a
column unit (CU)
25. SIMD instructions … inside
Oracle 12c
´ In memory column store storage indexes
´ For each column unit, min and max values are
maintained in a storage index
´ Storage Indexes provide CU pruning
´ Information about CU available in GV$IM_COL_CU
(Undocumented. See BugID 19361690)
[Figure: IMCU pruning]
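The pruning idea can be illustrated with a generic min/max check. This is a sketch of the general technique only, not Oracle's implementation: the structure cu_minmax and both functions are hypothetical, and integer values are assumed for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column-unit summary: smallest and largest stored value. */
struct cu_minmax { int min; int max; };

/* A CU can be skipped for "col = value" when value lies outside [min, max]. */
bool can_prune_eq(const struct cu_minmax *cu, int value)
{
    return value < cu->min || value > cu->max;
}

/* Counts the CUs that still have to be scanned for "col = value". */
size_t cus_to_scan(const struct cu_minmax *cus, size_t n, int value)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!can_prune_eq(&cus[i], value))
            kept++;
    return kept;
}
```

With three CUs covering [1,10], [11,20] and [15,30], a search for 5 only has to scan one CU, while a search for 18 has to scan two because the last two ranges overlap. Narrow, non-overlapping ranges are what make pruning effective.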
26. SIMD instructions … inside
Oracle 12c
´ The way your data is sorted matters for best IMCU pruning
27. SIMD instructions … inside
Oracle 12c
´ SIMD extensions are used with In Memory storage
indexes for efficient filtering
1. IM Storage Indexes do IMCU pruning
2. SIMD instructions apply filter predicates efficiently
[Figure: after IMCU pruning, the remaining Prod-id values (10, 10, 14, 14, 10) are filtered with SIMD]
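Step 2 can be sketched with SSE2 intrinsics: several values are compared with the search key by one instruction. This is a generic illustration, not Oracle's code; the function name count_eq is made up, 32-bit integer values are assumed, and the code falls back to a scalar loop when SSE2 is not available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Counts the values equal to key, four 32-bit values per comparison. */
size_t count_eq(const int *v, size_t n, int key)
{
    size_t i = 0, hits = 0;
#if defined(__SSE2__)
    const __m128i k = _mm_set1_epi32(key);            /* key in all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128i x  = _mm_loadu_si128((const __m128i *)&v[i]);
        __m128i eq = _mm_cmpeq_epi32(x, k);           /* all-ones lane on match */
        int mask   = _mm_movemask_ps(_mm_castsi128_ps(eq)); /* 1 bit per lane */
        hits += (size_t)((mask & 1) + ((mask >> 1) & 1) +
                         ((mask >> 2) & 1) + ((mask >> 3) & 1));
    }
#endif
    for (; i < n; i++)                                /* remaining values */
        hits += (v[i] == key);
    return hits;
}
```

On the Prod-id values 10, 10, 14, 14, 10 from the figure, a search for 14 finds two matches: the first four values are compared in one step and the fifth one by the scalar tail loop.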
28. SIMD instructions … inside
Oracle 12c
´ Oracle 12c uses specific libraries for SIMD (and
compression)
´ Located in $ORACLE_HOME/lib
´ libshpksse4212.so for SSE4.2 extensions
Compiled with ICC v12 with the specific -xsse4.2 switch
´ libshpkavx12.so for AVX extensions
Compiled with ICC v12 with the specific -xavx switch
´ libshpkavx212.so for AVX2 extensions
Not fully implemented yet (only 8 functions implemented)
No ICC AVX2 switch used because ICC v12 doesn’t support AVX2
´ Thanks Tanel Pöder
29. SIMD instructions … inside
Oracle 12c
´ Oracle SIMD related functions
´ Located in kdzk kernel module (HPK)
´ Part of Advanced Compression library (ADVCMP)
´ Easily tracked with systemtap
30. SIMD instructions … inside
Oracle 12c
´ How does Oracle use SIMD extensions ?
It depends on many parameters
´ OS Level : /proc/cpuinfo
´ AVX and AVX2 support
´ SSE4 Support only
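On Linux, the "flags" line of /proc/cpuinfo lists the extensions as space-separated words such as sse4_2, avx and avx2. A minimal sketch of a whole-word check on such a line follows; the function name has_cpu_flag is hypothetical, and this is not a description of how Oracle itself makes the decision.

```c
#include <stdbool.h>
#include <string.h>

/* True when flag appears as a whole word in a space-separated flags line,
   so that looking for "avx" does not match "avx2". */
bool has_cpu_flag(const char *flags_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags_line;

    if (len == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends   = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return true;
        p += len;                     /* keep searching after this hit */
    }
    return false;
}
```

The line itself can be read with fgets from /proc/cpuinfo, keeping the first line that starts with "flags".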
31. SIMD instructions … inside
Oracle 12c
´ Which library am I using ?
´ pmap
´ AVX support
´ SSE4 support
32. SIMD instructions … inside
Oracle 12c
´ Which compiler options have been used ?
´ Read “comment” section in ELF
´ Read the corresponding compiler documentation
[oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so |
> egrep -i 'intel|gcc' | egrep 'xavx|mavx'
[ 2c] -?comment:Intel(R) C Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.0 Build 20120731
…/…
-DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
33. SIMD instructions … inside
Oracle 12c
´ How are SIMD registers used by Oracle ?
´ GDB
´ To get the call stack (backtrace)
´ To set breakpoints on interesting functions
´ To view register contents (traditional and SIMD)
´ “info registers” for traditional registers
´ “info all-registers” for all registers (SIMD registers included)
´ (gdb) print $ymmX.<format>
Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32,
v4_int64, or v2_int128
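These format names describe different views of the same 256 bits. A C union gives a rough picture of what gdb is showing; the type name ymm_view is made up for this sketch, the member names simply mirror the gdb field names, and v2_int128 is left out to stay within standard C types.

```c
#include <stdint.h>

/* The same 256 bits seen as different packed element types. */
union ymm_view {
    float    v8_float[8];     /* eight 32-bit floats      */
    double   v4_double[4];    /* four 64-bit doubles      */
    int8_t   v32_int8[32];    /* thirty-two 8-bit integers */
    int16_t  v16_int16[16];   /* sixteen 16-bit integers  */
    int32_t  v8_int32[8];     /* eight 32-bit integers    */
    int64_t  v4_int64[4];     /* four 64-bit integers     */
};
```

In gdb, a command such as print $ymm0.v8_int32 picks one of these views, in the same way as reading the v8_int32 member of the union.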
34. SIMD instructions … inside
Oracle 12c
[Figure: gdb register dump. In red, register content that has been modified. In blue, the second part of the SIMD registers (128 bits), which is empty]
35. SIMD instructions … inside
Oracle 12c
´ Oracle IM can use AVX or SSE4 extensions for SIMD
operations
´ When AVX is used
It uses only 128 bits of the 256-bit wide registers
• AVX adds new register-state through the 256-bit wide
YMM register file
• Explicit operating system support is required to properly
save and restore AVX's expanded registers
between context switches
• Without this, only AVX 128-bit is supported
36. SIMD instructions … inside
Oracle 12c
´The culprit
´ Oracle 12.1.0.2 is supported from EL5 onwards
´ EL5 Redhat Kernel is 2.6.18 and this flag
(xsave) is supported from 2.6.30 kernels
´ For compatibility reasons, Oracle has to
compile its code on 2.6.18 kernels
37. SIMD instructions … inside
Oracle 12c
´Or maybe …
´ Oracle needs to work on packed values that are less
than 32 bits wide
39. Tracing SIMD in Oracle 12c
´ Oradebug has 2 components related to IM
40. Tracing SIMD in Oracle 12c
´ Interesting components to trace for SIMD
and/or IMCU Pruning are :
´ IM_optimizer
´Gives information about CBO calculation
related to IM
´ ADVCMP_DECOMP.*
´ADVCMP_DECOMP_HPK : SIMD functions
´ADVCMP_DECOMP_PCODE : Portable Code
Machine (usually comparison functions and
results)
41. Tracing SIMD in Oracle 12c
´ IM_optimizer
´ Information available in trace file
´ IMCU Pruning ratio
´ CU decompression costing (per IMCU)
´ Predicate evaluation costing (per row)
´ Statement has to be parsed to get results
42. Tracing SIMD in Oracle 12c
select prod_id,cust_id,time_id from laurent.s_capa_high where amount_sold=20;
43. Tracing SIMD in Oracle 12c
´ This information is available in CBO trace file (10053 or
SQL_costing event)
44. Tracing SIMD in Oracle 12c
´ ADVCMP_DECOMP
´ ADVCMP_DECOMP_HPK
´ Information is available in the trace file (for each IMCU
processed)
´ Used library and function
´ Number of rows and counting algorithm
´ Processing rate (comparison and decompression if relevant)
´ But nothing on the results of the processing :(
45. Tracing SIMD in Oracle 12c
´ ADVCMP_DECOMP
´ ADVCMP_DECOMP_HPK
´ Gives information about SIMD function usage and filtering
(after IMCU pruning)
´ Example: inmemory table with NO MEMCOMPRESS or DML
compression
46. Tracing SIMD in Oracle 12c
´ ADVCMP_DECOMP
´ ADVCMP_DECOMP_HPK
´ Example: inmemory compressed table
´ SIMD instructions are used only in the kdzk_eq_dict functions
47. Tracing SIMD in Oracle 12c
´ My thoughts about compression/decompression
´ NO MEMCOMPRESS / COMPRESS FOR DML
´ kdzk*dynp* functions (ex: kdzk_eq_dynp_16bit,
kdzk_le_dynp_32bit etc.)
´ FOR QUERY LOW / QUERY HIGH
´ Dictionary Encoding (LZW ?) : kdzk_*dict* functions (ex:
kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
´ Run Length Encoding: kdzk_burst_rle* functions (ex:
kdzk_burst_rle_8bit, kdzk_burst_rle_16bit …)
´ Bit packing compression: kdzk*fixed* functions (ex:
kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit …)
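Dictionary encoding, the first method listed for FOR QUERY LOW / QUERY HIGH, lends itself to predicate evaluation directly on the encoded data. The sketch below shows the generic technique only and makes no claim about Oracle's internals: the structure dict_column and the function dict_count_eq are hypothetical, and 8-bit codes are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary-encoded column: codes[i] indexes into dict[]. */
struct dict_column {
    const int     *dict;       /* distinct values            */
    size_t         dict_len;
    const uint8_t *codes;      /* one small code per row     */
    size_t         rows;
};

/* Counts rows where the column equals value, comparing codes only. */
size_t dict_count_eq(const struct dict_column *c, int value)
{
    size_t code = c->dict_len;             /* "not found" marker */
    for (size_t d = 0; d < c->dict_len; d++)
        if (c->dict[d] == value) { code = d; break; }
    if (code == c->dict_len)
        return 0;                          /* value absent: no row matches */

    size_t hits = 0;
    for (size_t r = 0; r < c->rows; r++)   /* rows are never decoded */
        hits += (c->codes[r] == code);
    return hits;
}
```

The value is translated to its code once, then only the small codes are compared. Because the codes are narrower than the original values, more of them fit in one SIMD register, which may be why function names such as kdzk_eq_dict_7bit carry a bit width, although that reading is an assumption.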
48. Tracing SIMD in Oracle 12c
´ My thoughts about compression/decompression
´ FOR CAPACITY LOW
´ FOR QUERY LOW + additional proprietary compression (OZIP)
´ Functions: ozip_decode_dict*, kdzk_ozip_decode* (Ex:
kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
´ FOR CAPACITY HIGH
´ FOR QUERY HIGH + heavyweight compression algorithm
´ Compression/decompression method depends on:
´ Datatype
´ Column Compression Unit size
´ Column contents