1. Exadata and OLTP – Enkitec Extreme Exadata Expo, August 13-14, Dallas – Frits Hoogland
2. Who am I?
   Frits Hoogland
   – Working with Oracle products since 1996
   – Working with VX Company since 2009
   Interests
   – Databases, Operating Systems, Application Servers
   – Web techniques, TCP/IP, network security
   – Technical security, performance
   Twitter: @fritshoogland  Blog: http://fritshoogland.wordpress.com  Email: fhoogland@vxcompany.com
   Oracle ACE Director, OakTable member
3. What is exadata
   – Engineered system specifically for the Oracle database.
   – Ability to reach a high number of read IOPS and huge bandwidth.
   – Has its own patch bundles.
   – Validated versions and patch versions across database, clusterware, O/S, storage and firmware.
   – Dedicated, private storage for databases.
   – ASM.
   – Recent hardware & recent CPUs.
   – No virtualisation.
4. Exadata versions
   – Oracle database 64 bit version >= 11
   – ASM 64 bit version >= 11
     - Exadata communication is a layer in the skgxp code
   – Linux OL5 x64
     - No UEK kernel used (except X2-8)
5. Exadata hardware
   – Intel Xeon server hardware
   – Infiniband 40Gb/s
   – Oracle cell (storage) server
     - Flash to mimic SAN cache
     - High performance disks or high capacity disks
       - 600GB 15k RPM / ~5ms latency
       - 2/3TB 7.2k RPM / ~8ms latency
6. Flash
   – Flash cards are in every storage server
   – Total of 384GB per storage server
   – Do not confuse Exadata STORAGE server flash cache with Oracle database flash cache
   – Flash can be configured either as cache (flash cache and flash log), as diskgroup, or both
   – When flash is used as diskgroup, latency is ~1 ms
     - Much faster than disk
     - My guess was < 400µs
       - 1µs infiniband
       - 200µs flash IO time
       - some time for the storage server
7. Flash
   – Flash is restricted to 4x96GB = 384GB per storage server.
     - Totals: Q: 1152GB, H: 2688GB, F: 5376GB
     - Net (ASM normal redundancy): Q: 576GB, H: 1344GB, F: 2688GB
   – That is a very limited amount of storage.
   – But with flash as diskgroup there's no cache for PIOs!
8. Exadata specific features
   – The secret sauce of Exadata: the storage server
     - smartscan
     - storage indexes
     - EHCC *
     - IO Resource manager
9. OLTP
   – What does OLTP look like (in general | simplistic)
   – Fetch small amounts of data
     - Invoice numbers, client id, product id
     - select single values or small ranges via an index
   – Create or update rows
     - Sold items on invoice, payments, order status
     - insert or update values
10. SLOB
   – A great way to mimic or measure OLTP performance is SLOB
   – Silly Little Oracle Benchmark
   – Author: Kevin Closson
   – http://oaktable.net/articles/slob-silly-little-oracle-benchmark
11. SLOB – It can do reading:
   -- Reader loop: repeated small random index range scans on cf1.
   FOR i IN 1..5000 LOOP
     v_r := dbms_random.value(257, 10000);
     SELECT COUNT(c2) INTO x FROM cf1 WHERE custid > v_r - 256 AND custid < v_r;
   END LOOP;
12. SLOB – And writing:
   -- Writer loop: updates a small random range of rows and commits every iteration.
   FOR i IN 1..500 LOOP
     v_r := dbms_random.value(257, 10000);
     UPDATE cf1 SET c2 = 'AAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBB',
         ....up to column 20 (c20)....
     WHERE custid > v_r - 256 AND custid < v_r;
     COMMIT;
   END LOOP;
13. – Let's run SLOB with 0 writers and 1 reader at:
   – Single instance database, 10G SGA, 9G cache.  >> Cold cache <<
     - Exadata V2 / Oracle 11.2.0.2 / HP
       - Half rack / 7 storage servers / 84 disks (15k rpm)
     - Exadata X2 / Oracle 11.2.0.2 / HC
       - Quarter rack / 3 storage servers / 36 disks (7.2k rpm)
14-17. [Chart slides]
18. 1 reader results
   - V2 time: 5 sec - CPU: 84.3%
     - PIO: 10'768 (0.8%)  -- IO time 0.8 sec
     - LIO: 1'299'493
   - X2 time: 4 sec - CPU: 75.7%
     - PIO: 10'922 (0.8%)  -- IO time 0.9 sec
     - LIO: 1'300'726
   - ODA time: 4 sec - CPU: 55.2% (20 disks 15k rpm)
     - PIO: 10'542 (0.8%)  -- IO time 2.2 sec
     - LIO: 1'297'502
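   The LIO and PIO figures above are taken from AWR reports of the runs. A minimal sketch of how similar counters can be eyeballed interactively (standard v$sysstat statistic names; the delta between a snapshot taken before and after the run is what counts):

   -- Sketch: instance-wide logical and physical read counters.
   -- Snapshot before and after the SLOB run and take the difference.
   SELECT name, value
   FROM   v$sysstat
   WHERE  name IN ('session logical reads',
                   'physical reads',
                   'physical read total IO requests');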
19. 1 reader conclusion
   - The time spent on PIO is 15% - 45%
   - The majority of time is spent on LIO/CPU time
   - Because the main portion is CPU, the fastest CPU "wins"
     - Actually: fastest CPU, memory bus and memory.
20. LIO benchmark
   – Let's do a pre-warmed cache run
     - Pre-warmed means: no PIO, data already in the buffer cache
     - This means ONLY LIO speed is measured
21. LIO benchmark
   - ODA: 1 sec
   - X2:  2 sec
   - V2:  3 sec
22. Use 'dmidecode' to look at the system's components!
   – Reason: LIO essentially means
     - Reading memory
     - CPU processing
   – ODA
     - Intel Xeon X5675 @ 3.07GHz (2s12c24t)
     - L1: 384kB, L2: 1.5MB, L3: 12MB
     - Memory: Type DDR3, speed 1333 MHz
   – X2
     - Intel Xeon X5670 @ 2.93GHz (2s12c24t)
     - L1: 384kB, L2: 1.5MB, L3: 12MB
     - Memory: Type DDR3, speed 1333 MHz
   – V2
     - Intel Xeon E5540 @ 2.53GHz (2s8c16t)
     - L1: 128kB, L2: 1MB, L3: 8MB
     - Memory: Type DDR3, speed 800 MHz
23-25. LIO benchmark [chart slides]
26. LIO benchmark
   The core count difference and slower memory show when the number of readers exceeds the core count.
   At the same memory speed: CPU speed matters less with more concurrency.
27. LIO benchmark [chart slide]
28. LIO benchmark
   Fewer cores and slower memory make LIO processing increasingly slower with more concurrency.
   For LIO processing, ODA (non-Exadata) versus Exadata does not matter.
29. – Conclusion:
   - LIO performance is impacted by:
     - CPU speed
     - Number of sockets and cores
     - L1/2/3 cache sizes
     - Memory speed
     - Exadata does not matter here!
   - When comparing entirely different systems also consider:
     - Oracle version
     - O/S and version (scheduling)
     - Hyper-threading / CPU architecture
     - NUMA (Exadata/ODA: no NUMA!)
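   Socket and core counts can also be cross-checked from inside the database; a minimal sketch:

   -- Sketch: CPU topology as seen by the instance.
   SELECT stat_name, value
   FROM   v$osstat
   WHERE  stat_name IN ('NUM_CPUS', 'NUM_CPU_CORES', 'NUM_CPU_SOCKETS');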
30. – But how about physical IO?
   - Lower the buffer cache to 4M (a parameter sketch follows below)
     - sga_max_size to 1g
     - cpu_count to 1
     - db_cache_size to 1M (results in 4M)
   - SLOB run with 1 reader
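   A minimal sketch of how this forced-PIO setup could be configured, assuming an spfile is in use (the values are the test settings from the list above, not recommendations; sga_max_size is static, so the instance needs a restart):

   ALTER SYSTEM SET sga_max_size  = 1G SCOPE = SPFILE;
   ALTER SYSTEM SET cpu_count     = 1  SCOPE = SPFILE;
   ALTER SYSTEM SET db_cache_size = 1M SCOPE = SPFILE;  -- rounded up to the minimum granule size, ~4M here
   -- Restart the instance for sga_max_size to take effect.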
31. The V2 is the slowest with 106 seconds.
   The X2 is only a little slower than the ODA, at 76 seconds.
   Surprise! The ODA is the fastest here with 73 seconds.
32.
            Total time (s)   CPU time (s)   IO time (s)
   ODA      73               17             60
   X2       76               33             55
   V2       106              52             52
   – IO ODA: 60 / 1'264'355 = 0.047 ms
   – IO X2:  55 / 1'265'602 = 0.043 ms
   – IO V2:  52 / 1'240'941 = 0.042 ms
33. – This is not random disk IO!
   - Average latency of a random IO on a 15k rpm disk: ~5ms
   - Average latency of a random IO on a 7.2k rpm disk: ~8ms
   – So this must come from a cache, or it is not random disk IO
     - Exadata has flashcache.
     - On ODA, the data probably sits very nearby on disk.
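   Whether those single-block reads are really being served by the cell flash cache can be checked from the database side; a minimal sketch using the Exadata-specific statistic (compare its delta over the run with the total read requests):

   SELECT name, value
   FROM   v$sysstat
   WHERE  name IN ('cell flash cache read hits',
                   'physical read total IO requests');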
34.
            Total time (s)   CPU time (s)   IO time (s)
   ODA      73               17             60
   X2       76               33             55
   V2       106              52             52
   - Exadata IO takes (way) more CPU.
   - Roughly the same time is spent on doing IOs.
35. SLOB / 10 readers [chart slide]
36. Now the IO response time on the ODA is way higher than on Exadata (3008s).
   Both Exadatas perform alike: X2 581s, V2 588s.
37.
            Total time (s)   CPU time (s)   IO time (s)
   ODA      3008             600            29428
   X2       581              848            5213
   V2       588              1388           4866
   – IO ODA: 29428 / 13'879'603 = 2.120 ms
   – IO X2:  5213 / 14'045'812 = 0.371 ms
   – IO V2:  4866 / 14'170'303 = 0.343 ms
38-39. SLOB / 20 readers [chart slides]
40.
            Total time (s)   CPU time (s)   IO time (s)
   ODA      4503             1377           88756
   X2       721              2069           13010
   V2       747              3373           12405
   – IO ODA: 88756 / 28'246'604 = 3.142 ms
   – IO X2:  13010 / 28'789'330 = 0.452 ms
   – IO V2:  12405 / 28'766'804 = 0.431 ms
41. SLOB / 30 readers [chart slide]
42. The ODA (20 x 15k rpm HDD) disk capacity is saturated, so response time increases with more readers.
   Flashcache is not saturated, so the IO response time at 10, 20 and 30 readers increases very little.
43. SLOB / up to 80 readers [chart slide]
44. The ODA response time increases more or less linearly.
   The V2 response time (with more flashcache!) starts increasing at 70 readers. A bottleneck is showing up! (7x384GB!!)
   The X2 flashcache (3x384GB) is not saturated, so there is little increase in response time.
45. IOPS view instead of response time
   3x384GB flashcache and IB can serve > 115'851 read IOPS!
   This V2 has more flashcache, so the decline in read IOPS is probably due to something else!
   The ODA maxed out at ~11'200 read IOPS.
46. - V2 top 5 timed events with 80 readers:
   Event                           Waits        Time(s)  Avg(ms)  % DB time  Wait Class
   ------------------------------  -----------  -------  -------  ---------  ----------
   cell single block physical rea  102,345,354   56,614        1       47.1  User I/O
   latch: cache buffers lru chain   27,187,317   33,471        1       27.8  Other
   latch: cache buffers chains      14,736,819   19,594        1       16.3  Concurrenc
   DB CPU                                         13,427               11.2
   wait list latch free                932,930      553        1         .5  Other
   (the two cache buffers latch events together: 44.1% of DB time)

   - X2 top 5 timed events with 80 readers:
   Event                           Waits        Time(s)  Avg(ms)  % DB time  Wait Class
   ------------------------------  -----------  -------  -------  ---------  ----------
   cell single block physical rea  102,899,953   68,209        1       87.9  User I/O
   DB CPU                                          9,297               12.0
   latch: cache buffers lru chain   10,917,303    1,585        0        2.0  Other
   latch: cache buffers chains       2,048,395      698        0         .9  Concurrenc
   cell list of blocks physical r      368,795      522        1         .7  User I/O
   (the two cache buffers latch events together: 2.9% of DB time)
47. – On the V2, cache concurrency control throttles throughput
   – On the X2, this happens only to a very minimal degree
     - V2 CPU: Intel Xeon E5540 @ 2.53GHz (2s8c16t)
     - X2 CPU: Intel Xeon X5670 @ 2.93GHz (2s12c24t)
   – V2 (% of waits per bucket)
   Event                       Waits  <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
   --------------------------  -----  ----  ----  ----  ----  -----  -----  ----  ---
   latch: cache buffers chain  14.7M  44.1  37.0  16.3   2.6     .0     .0
   latch: cache buffers lru c  27.2M  37.2  42.6  20.0    .2     .0     .0
   – X2 (% of waits per bucket)
   latch: cache buffers chain  2048.  91.8   7.5    .6    .0     .0
   latch: cache buffers lru c  10.9M  97.4   2.3    .2    .0
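   The histograms above come from the AWR wait event histogram section; roughly the same picture can be pulled live from v$event_histogram, for example (cumulative since instance startup):

   SELECT event, wait_time_milli, wait_count
   FROM   v$event_histogram
   WHERE  event IN ('latch: cache buffers chains',
                    'latch: cache buffers lru chain')
   ORDER  BY event, wait_time_milli;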
48. SLOB run time summary (seconds)
                    1 LIO   80 LIO   1 PIO   80 PIO
   ODA                  1       11      73     9795
   X2 (HC disks)        2       11      76      976
   V2 (HP disks)        3       22     106     1518

49. SLOB run time summary (seconds)
                    1 LIO   80 LIO   1 PIO   80 PIO
   ODA                  1       11      73     9795
   X2                   2       11      76      976
   V2                   3       22     106     1518

                    1 PIO w/o flashcache   80 PIO w/o flashcache
   ODA                                73                    9795
   X2                                167                       ?
   V2                                118                    5098
50. - For scalability, OLTP needs buffered IO (LIO)
   - Flashcache is EXTREMELY important for physical IO scalability
     - Never, ever, let flash be used for something else
     - Unless you can always keep all your small reads in cache
   - Flash mimics a SAN/NAS cache
     - So nothing groundbreaking here, it does what current, normal infra should do too...
   - The bandwidth needed to deliver the data to the database is provided by Infiniband
     - 1 Gb ethernet = 120MB/s, 4 Gb fiber = 400MB/s
     - Infiniband is generally available.
51. – How many IOPS can a single cell do?
   - According to https://blogs.oracle.com/mrbenchmark/entry/inside_the_sun_oracle_database
     - A single cell can do 75'000 IOPS from flash (8kB)
   - Personal calculation: 60'000 IOPS with 8kB
   – Flashcache
     - Mostly caches small reads & writes (8kB and less)
     - Large multiblock reads are not cached, unless the segment property 'cell_flash_cache' is set to 'keep' (see the sketch below).
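   The segment property mentioned above is set through the storage clause; a minimal sketch, using the SLOB table cf1 as an example:

   -- Ask the cells to also cache large multiblock reads of this segment in flash cache.
   ALTER TABLE cf1 STORAGE (CELL_FLASH_CACHE KEEP);
   -- The default is CELL_FLASH_CACHE DEFAULT; NONE disables flash caching for the segment.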
52. – Is Exadata a good idea for OLTP?
   - From a strictly technical point of view, there is no benefit.
   – But...
   – Exadata gives you IORM
   – Exadata gives you reasonably up to date hardware
   – Exadata gives a system engineered for performance
   – Exadata gives you dedicated disks
   – Exadata gives a validated combination of database, clusterware, operating system, hardware, firmware.
53. – Exadata storage servers provide NO redundancy for data
     - That's a function of ASM
   – Exadata is configured with either
     - Normal redundancy (mirroring) or
     - High redundancy (triple mirroring)
     to provide data redundancy.
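   The redundancy level actually in use can be verified from the ASM instance; a minimal sketch:

   -- TYPE shows EXTERN, NORMAL or HIGH per diskgroup.
   SELECT name, type, total_mb, free_mb
   FROM   v$asm_diskgroup;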
54. – Reading has no problem with normal/high redundancy.
   – During writes, all two or three AU copies need to be written.
   – This means that when you calculate write throughput, you need to double all physical writes if using normal redundancy.
55. – But we got flash! Right?
   – Yes, you got flash. But it probably doesn't do what you think it does:
56. – This is on the half rack V2 HP:
   [oracle@dm01db01 [] stuff]$ dcli -l celladmin -g cell_group cellcli -e "list metriccurrent where name like FL_.*_FIRST"
   dm01cel01: FL_DISK_FIRST   FLASHLOG   316,563 IO requests
   dm01cel01: FL_FLASH_FIRST  FLASHLOG     9,143 IO requests
   dm01cel02: FL_DISK_FIRST   FLASHLOG   305,891 IO requests
   dm01cel02: FL_FLASH_FIRST  FLASHLOG     7,435 IO requests
   dm01cel03: FL_DISK_FIRST   FLASHLOG   307,634 IO requests
   dm01cel03: FL_FLASH_FIRST  FLASHLOG    10,577 IO requests
   dm01cel04: FL_DISK_FIRST   FLASHLOG   299,547 IO requests
   dm01cel04: FL_FLASH_FIRST  FLASHLOG    10,381 IO requests
   dm01cel05: FL_DISK_FIRST   FLASHLOG   311,978 IO requests
   dm01cel05: FL_FLASH_FIRST  FLASHLOG    10,888 IO requests
   dm01cel06: FL_DISK_FIRST   FLASHLOG   315,084 IO requests
   dm01cel06: FL_FLASH_FIRST  FLASHLOG    10,022 IO requests
   dm01cel07: FL_DISK_FIRST   FLASHLOG   323,454 IO requests
   dm01cel07: FL_FLASH_FIRST  FLASHLOG     8,807 IO requests
57. – This is on the quarter rack X2 HC:
   [root@xxxxdb01 ~]# dcli -l root -g cell_group cellcli -e "list metriccurrent where name like FL_.*_FIRST"
   xxxxcel01: FL_DISK_FIRST   FLASHLOG   68,475,141 IO requests
   xxxxcel01: FL_FLASH_FIRST  FLASHLOG    9,109,142 IO requests
   xxxxcel02: FL_DISK_FIRST   FLASHLOG   68,640,951 IO requests
   xxxxcel02: FL_FLASH_FIRST  FLASHLOG    9,229,226 IO requests
   xxxxcel03: FL_DISK_FIRST   FLASHLOG   68,388,238 IO requests
   xxxxcel03: FL_FLASH_FIRST  FLASHLOG    9,072,660 IO requests
58. – Please mind these are cumulative numbers!
   – The half-rack is a POC machine, no heavy usage between POCs.
   – The quarter-rack has had some load, but definitely not heavy OLTP.
   – I can imagine flashlog can prevent long write times if disk IOs queue.
     - A normally configured database on Exadata has online redo in both the DATA and the RECO diskgroup
     - Normal redundancy means every log write must be done 4 times (2 log member copies x 2 ASM mirror copies)
59. – Log writer wait times:
     - V2 min: 16ms (1 writer), max: 41ms (20 writers)
     - X2 min: 39ms (10 writers), max: 110ms (40 writers)
   – Database writer wait time is significantly lower
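   The numbers above come from AWR; the instance-wide average of the underlying redo write can also be checked roughly like this (a sketch; an average since startup, so less precise than interval-based AWR figures):

   SELECT event,
          total_waits,
          ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_ms
   FROM   v$system_event
   WHERE  event = 'log file parallel write';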
60. – Log file write response time on Exadata is not in the same range as reads.
   – There's the flashlog feature, but it does not work as the whitepaper explains.
   – Be careful with heavy writing on Exadata.
     - There's no Exadata-specific improvement for writes.
61. Thank you for attending!
   Questions and answers.
62. Thanks to
   • Klaas-Jan Jongsma
   • VX Company
   • Martin Bach
   • Kevin Closson
