Exadata and OLTP
Transcript

  • 1. Exadata and OLTP – Enkitec Extreme Exadata Expo, August 13-14, Dallas – Frits Hoogland
  • 2. Who am I? Frits Hoogland – Working with Oracle products since 1996 – Working with VX Company since 2009. Interests – Databases, Operating Systems, Application Servers – Web techniques, TCP/IP, network security – Technical security, performance. Twitter: @fritshoogland Blog: http://fritshoogland.wordpress.com Email: fhoogland@vxcompany.com Oracle ACE Director, OakTable member
  • 3. What is Exadata? – Engineered system specifically for the Oracle database. – Ability to reach a high number of read IOPS and huge bandwidth. – Has its own patch bundles. – Validated versions and patch versions across database, clusterware, O/S, storage and firmware. – Dedicated, private storage for databases. – ASM. – Recent hardware & recent CPUs. – No virtualisation.
  • 4. Exadata versions – Oracle database 64-bit, version >= 11 – ASM 64-bit, version >= 11 - Exadata communication is a layer in the skgxp code – Linux OL5 x64 - No UEK kernel used (except X2-8)
  • 5. Exadata hardware – Intel Xeon server hardware – InfiniBand 40Gb/s – Oracle cell (storage) servers - Flash to mimic a SAN cache - High performance or high capacity disks - 600GB 15k RPM / ~5ms latency - 2/3TB 7.2k RPM / ~8ms latency
  • 6. Flash – Flash cards are in every storage server – Total of 384GB per storage server – Do not confuse the Exadata STORAGE server flash cache with the Oracle database flash cache – Flash can be configured as cache (flash cache and flash log), as diskgroup, or both – When flash is used as a diskgroup, latency is ~1 ms - Much faster than disk - My guess was < 400µs - 1µs InfiniBand - 200µs flash IO time - some time for the storage server
  • 7. Flash – Flash is restricted to 4x96GB = 384GB per storage server. - Totals: - Quarter: 1152GB, Half: 2688GB, Full: 5376GB - Net (ASM normal redundancy): - Quarter: 576GB, Half: 1344GB, Full: 2688GB – That is a very limited amount of storage. – But with flash as diskgroup there's no cache for PIOs!
  • 8. Exadata specific features – The secret sauce of Exadata: the storage server - smart scan - storage indexes - EHCC * - IO Resource Manager
  • 9. OLTP – What does OLTP look like (in general / simplistic)? – Fetch small amounts of data - Invoice numbers, client id, product id - select single values or small ranges via an index – Create or update rows - Sold items on an invoice, payments, order status - insert or update values
  • 10. SLOB – A great way to mimic or measure OLTP performance is SLOB – Silly Little Oracle Benchmark – Author: Kevin Closson – http://oaktable.net/articles/slob-silly-little-oracle-benchmark
  • 11. SLOB – It can do reading:
    FOR i IN 1..5000 LOOP
      v_r := dbms_random.value(257, 10000);
      SELECT COUNT(c2) INTO x FROM cf1 WHERE custid > v_r - 256 AND custid < v_r;
    END LOOP;
  • 12. SLOB – And writing:
    FOR i IN 1..500 LOOP
      v_r := dbms_random.value(257, 10000);
      UPDATE cf1 SET c2 = 'AAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBBAAAAAAAABBBBBBBB',
             ....up to column 20 (c20)....
       WHERE custid > v_r - 256 AND custid < v_r;
      COMMIT;
    END LOOP;
  • 13. – Let's run SLOB with 0 writers and 1 reader on: – Single instance database, 10G SGA, 9G cache. >> Cold cache << (see the setup sketch below) - Exadata V2 / Oracle 11.2.0.2 / HP (high performance) disks - Half rack / 7 storage servers / 84 disks (15k rpm) - Exadata X2 / Oracle 11.2.0.2 / HC (high capacity) disks - Quarter rack / 3 storage servers / 36 disks (7.2k rpm)
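    (One way to approximate that starting point is to size the SGA and buffer cache as listed and flush the cache just before the run; a hypothetical sketch, assuming an SPFILE, not necessarily how these tests were set up:)
    ALTER SYSTEM SET sga_target = 10G SCOPE = SPFILE;    -- 10G SGA as on this slide
    ALTER SYSTEM SET db_cache_size = 9G SCOPE = SPFILE;  -- 9G buffer cache
    -- after a restart, flush the buffer cache so the first run starts cold
    ALTER SYSTEM FLUSH BUFFER_CACHE;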
  • 14.-17. (result charts)
  • 18. 1 reader results
         time   CPU                       PIO            IO time  LIO
    V2   5 sec  84.3%                     10,768 (0.8%)  0.8 sec  1,299,493
    X2   4 sec  75.7%                     10,922 (0.8%)  0.9 sec  1,300,726
    ODA  4 sec  55.2% (20 disks 15k rpm)  10,542 (0.8%)  2.2 sec  1,297,502
  • 19. 1 reader conclusion - The time spent on PIO is 15% - 45% - The majority of time is spent on LIO/CPU - Because the main portion is CPU, the fastest CPU "wins" - Actually: fastest CPU, memory bus and memory.
  • 20. LIO benchmark – Let's do a pre-warmed cache run (a warm-up sketch follows below) - Pre-warmed means: no PIO, the data is already in the buffer cache - This means ONLY LIO speed is measured
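    (One way to pre-warm the cache is to read the whole custid range once with the same index-range pattern SLOB uses, so the touched blocks end up in the buffer cache; a minimal sketch, assuming the cf1 table and columns from the SLOB snippets above. A plain full scan is avoided here because on 11.2 it may use direct path reads and bypass the buffer cache.)
    DECLARE
      x NUMBER;
    BEGIN
      -- walk the custid range once so all touched blocks are cached
      FOR r IN 257..10000 LOOP
        SELECT COUNT(c2) INTO x FROM cf1 WHERE custid > r - 256 AND custid < r;
      END LOOP;
    END;
    /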
  • 21. LIO benchmark - ODA: 1 sec - X2: 2 sec - V2: 3 sec
  • 22. Use 'dmidecode' to look at the system's components! – Reason: LIO essentially means - Reading memory - CPU processing – ODA - Intel Xeon X5675 @ 3.07GHz (2s12c24t) - L1: 384kB, L2: 1.5MB, L3: 12MB - Memory: type DDR3, speed 1333 MHz – X2 - Intel Xeon X5670 @ 2.93GHz (2s12c24t) - L1: 384kB, L2: 1.5MB, L3: 12MB - Memory: type DDR3, speed 1333 MHz – V2 - Intel Xeon E5540 @ 2.53GHz (2s8c16t) - L1: 128kB, L2: 1MB, L3: 8MB - Memory: type DDR3, speed 800 MHz
  • 23.-25. LIO benchmark (charts)
  • 26. LIO benchmark – The core count difference and slower memory show when the number of readers exceeds the core count. With the same memory speed, CPU speed matters less with more concurrency.
  • 27. LIO benchmark (chart)
  • 28. LIO benchmark – Fewer cores and slower memory make LIO processing increasingly slower with more concurrency. For LIO processing, ODA (non-Exadata) versus Exadata does not matter.
  • 29. – Conclusion: - LIO performance is impacted by: - CPU speed - Number of sockets and cores - L1/2/3 cache sizes - Memory speed - Exadata does not matter here! - When comparing entirely different systems, also consider: - Oracle version - O/S and version (scheduling) - Hyper-threading / CPU architecture - NUMA (Exadata/ODA: no NUMA!)
  • 30. – But how about physical IO? - Lower the buffer cache to 4M (see the parameter sketch below) - sga_max_size to 1G - cpu_count to 1 - db_cache_size to 1M (results in 4M) - SLOB run with 1 reader
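    (A minimal sketch of those parameter changes, assuming an SPFILE; a restart is needed because sga_max_size is a static parameter:)
    ALTER SYSTEM SET sga_max_size = 1G SCOPE = SPFILE;
    ALTER SYSTEM SET cpu_count = 1 SCOPE = SPFILE;
    ALTER SYSTEM SET db_cache_size = 1M SCOPE = SPFILE;  -- rounds up to 4M, as noted above
    -- restart the instance so the new sga_max_size takes effect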
  • 31. The V2 is the slowest with 106 seconds. The X2 is only a little slower than the ODA, with 76 seconds. Surprise! The ODA is the fastest here with 73 seconds.
  • 32.
         Total time (s)  CPU time (s)  IO time (s)
    ODA  73              17            60
    X2   76              33            55
    V2   106             52            52
    – IO latency ODA: 60/1264355 = 0.047 ms
    – IO latency X2: 55/1265602 = 0.043 ms
    – IO latency V2: 52/1240941 = 0.042 ms
  • 33. – This is not random disk IO! - Average latency of random IO on a 15k rpm disk ~ 5ms - Average latency of random IO on a 7.2k rpm disk ~ 8ms – So this must come from a cache, or it is not random disk IO - Exadata has flash cache. - On the ODA, the data is probably very nearby on disk.
  • 34.
         Total time (s)  CPU time (s)  IO time (s)
    ODA  73              17            60
    X2   76              33            55
    V2   106             52            52
    - Exadata IO takes (way) more CPU.
    - Roughly the same time is spent on doing IOs.
  • 35. SLOB / 10 readers (chart)
  • 36. Now the IO response time on the ODA is way higher than on Exadata (3008s). Both Exadatas perform alike: X2 581s, V2 588s.
  • 37.
         Total time (s)  CPU time (s)  IO time (s)
    ODA  3008            600           29428
    X2   581             848           5213
    V2   588             1388          4866
    – IO latency ODA: 29428/13879603 = 2.120 ms
    – IO latency X2: 5213/14045812 = 0.371 ms
    – IO latency V2: 4866/14170303 = 0.343 ms
  • 38.-39. SLOB / 20 readers (charts)
  • 40.
         Total time (s)  CPU time (s)  IO time (s)
    ODA  4503            1377          88756
    X2   721             2069          13010
    V2   747             3373          12405
    – IO latency ODA: 88756/28246604 = 0.003142183039 s = 3.142 ms
    – IO latency X2: 13010/28789330 = 0.0004519035351 s = 0.452 ms
    – IO latency V2: 12405/28766804 = 0.0004312262148 s = 0.431 ms
  • 41. SLOB / 30 readers (chart)
  • 42. The ODA's (20 x 15k rpm HDD) disk capacity is saturated, so response time increases with more readers. The flash cache is not saturated, so the IO response time with 10, 20 and 30 readers increases very little.
  • 43. SLOB / up to 80 readers (chart)
  • 44. The ODA response time increases more or less linearly. The V2 response time (with more flash cache!) starts increasing at 70 readers: a bottleneck is showing up! (7x384GB!!) The X2 flash cache (3x384GB) is not saturated, so there is little increase in response time.
  • 45. IOPS view instead of response time – 3x384GB flash cache and InfiniBand can serve > 115,851 read IOPS! This V2 has more flash cache, so its decline in read IOPS is probably due to something else! The ODA maxed out at ~11,200 read IOPS.
  • 46. - V2 top 5 timed events with 80 readers:
    Event                            Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
    cell single block physical read  102,345,354  56,614   1              47.1       User I/O
    latch: cache buffers lru chain   27,187,317   33,471   1              27.8       Other
    latch: cache buffers chains      14,736,819   19,594   1              16.3       Concurrency
    DB CPU                                        13,427                  11.2
    wait list latch free             932,930      553      1              .5         Other
    (the two latch events combined: 44.1%)
    - X2 top 5 timed events with 80 readers:
    Event                            Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
    cell single block physical read  102,899,953  68,209   1              87.9       User I/O
    DB CPU                                        9,297                   12.0
    latch: cache buffers lru chain   10,917,303   1,585    0              2.0        Other
    latch: cache buffers chains      2,048,395    698      0              .9         Concurrency
    cell list of blocks physical read  368,795    522      1              .7         User I/O
    (the two latch events combined: 2.9%)
  • 47. – On the V2, cache concurrency control throttles throughput – On the X2, this happens only very minimally - V2 CPU: Intel Xeon E5540 @ 2.53GHz (2s8c16t) - X2 CPU: Intel Xeon X5670 @ 2.93GHz (2s12c24t)
    – V2 wait event histogram (% of waits per bucket):
    Event                           Waits  <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
    latch: cache buffers chains     14.7M  44.1  37.0  16.3  2.6   .0     .0
    latch: cache buffers lru chain  27.2M  37.2  42.6  20.0  .2    .0     .0
    – X2 wait event histogram (% of waits per bucket):
    latch: cache buffers chains     2048.  91.8  7.5   .6    .0    .0
    latch: cache buffers lru chain  10.9M  97.4  2.3   .2    .0
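    (These latches can also be watched directly on a live system, outside an AWR report; a minimal sketch querying v$latch, with only the latch names taken from the slide above:)
    SELECT name, gets, misses, sleeps
      FROM v$latch
     WHERE name IN ('cache buffers chains', 'cache buffers lru chain');
    -- a high and growing sleeps count points at buffer cache concurrency contention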
  • 48. Run time in seconds:
                   1 LIO  80 LIO  1 PIO  80 PIO
    ODA            1      11      73     9795
    X2 (HC disks)  2      11      76     976
    V2 (HP disks)  3      22      106    1518
  • 49. Run time in seconds:
         1 LIO  80 LIO  1 PIO  80 PIO
    ODA  1      11      73     9795
    X2   2      11      76     976
    V2   3      22      106    1518

         1 PIO w/o flashcache  80 PIO w/o flashcache
    ODA  73                    9795
    X2   167                   ?
    V2   118                   5098
  • 50. - For scalability, OLTP needs buffered IO (LIO) - Flash cache is EXTREMELY important for physical IO scalability - Never, ever, let flash be used for something else - Unless you can always keep all your small reads in cache - Flash mimics a SAN/NAS cache - So nothing groundbreaking here, it does what current, normal infrastructure should do too... - The bandwidth needed to deliver the data to the database is provided by InfiniBand - 1 Gb Ethernet = 120MB/s, 4 Gb fiber = 400MB/s - InfiniBand is generally available.
  • 51. – How many IOPS can a single cell do? - According to https://blogs.oracle.com/mrbenchmark/entry/inside_the_sun_oracle_database - A single cell can do 75,000 IOPS from flash (8kB) - Personal calculation: 60,000 IOPS with 8kB – Flash cache - Mostly caches small reads & writes (8kB and less) - Large multiblock reads are not cached, unless the segment property 'cell_flash_cache' is set to 'keep'.
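    (A minimal sketch of setting that segment property, for a hypothetical table named big_table; CELL_FLASH_CACHE is a segment storage attribute with the values KEEP, DEFAULT and NONE:)
    ALTER TABLE big_table STORAGE (CELL_FLASH_CACHE KEEP);
    -- with KEEP, the storage servers also cache large multiblock (smart scan) reads of this segment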
  • 52. – Is Exadata a good idea for OLTP? - From a strictly technical point of view, there is no benefit. – But... – Exadata gives you IORM – Exadata gives you reasonably up to date hardware – Exadata gives you a system engineered for performance – Exadata gives you dedicated disks – Exadata gives you a validated combination of database, clusterware, operating system, hardware and firmware.
  • 53. – Exadata storage servers provide NO redundancy for data - That's a function of ASM – Exadata is configured with either - normal redundancy (mirroring) or - high redundancy (triple mirroring) – to provide data redundancy (see the query sketch below).
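    (A quick way to check which redundancy the diskgroups actually use is v$asm_diskgroup; a minimal sketch:)
    SELECT name, type FROM v$asm_diskgroup;
    -- TYPE is NORMAL for two-way mirroring, HIGH for three-way mirroring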
  • 54. – Reading has no problem with normal/high redundancy. – During writes, all two or three AUs need to be written. – This means that when you calculate write throughput, you need to double all physical writes when using normal redundancy.
  • 55. – But we got flash! Right? – Yes, you got flash. But it probably doesn't do what you think it does:
  • 56. – This is on the half rack V2 HP:
    [oracle@dm01db01 [] stuff]$ dcli -l celladmin -g cell_group cellcli -e "list metriccurrent where name like FL_.*_FIRST"
    dm01cel01: FL_DISK_FIRST   FLASHLOG   316,563 IO requests
    dm01cel01: FL_FLASH_FIRST  FLASHLOG     9,143 IO requests
    dm01cel02: FL_DISK_FIRST   FLASHLOG   305,891 IO requests
    dm01cel02: FL_FLASH_FIRST  FLASHLOG     7,435 IO requests
    dm01cel03: FL_DISK_FIRST   FLASHLOG   307,634 IO requests
    dm01cel03: FL_FLASH_FIRST  FLASHLOG    10,577 IO requests
    dm01cel04: FL_DISK_FIRST   FLASHLOG   299,547 IO requests
    dm01cel04: FL_FLASH_FIRST  FLASHLOG    10,381 IO requests
    dm01cel05: FL_DISK_FIRST   FLASHLOG   311,978 IO requests
    dm01cel05: FL_FLASH_FIRST  FLASHLOG    10,888 IO requests
    dm01cel06: FL_DISK_FIRST   FLASHLOG   315,084 IO requests
    dm01cel06: FL_FLASH_FIRST  FLASHLOG    10,022 IO requests
    dm01cel07: FL_DISK_FIRST   FLASHLOG   323,454 IO requests
    dm01cel07: FL_FLASH_FIRST  FLASHLOG     8,807 IO requests
  • 57. – This is on the quarter rack X2 HC:
    [root@xxxxdb01 ~]# dcli -l root -g cell_group cellcli -e "list metriccurrent where name like FL_.*_FIRST"
    xxxxcel01: FL_DISK_FIRST   FLASHLOG  68,475,141 IO requests
    xxxxcel01: FL_FLASH_FIRST  FLASHLOG   9,109,142 IO requests
    xxxxcel02: FL_DISK_FIRST   FLASHLOG  68,640,951 IO requests
    xxxxcel02: FL_FLASH_FIRST  FLASHLOG   9,229,226 IO requests
    xxxxcel03: FL_DISK_FIRST   FLASHLOG  68,388,238 IO requests
    xxxxcel03: FL_FLASH_FIRST  FLASHLOG   9,072,660 IO requests
  • 58. – Please mind these are cumulative numbers! – The half-rack is a POC machine, with no heavy usage between POCs. – The quarter-rack has had some load, but definitely not heavy OLTP. – I can imagine flash log can prevent long write times if disk IOs queue. - A normally configured database on Exadata has online redo in both the DATA and the RECO diskgroup - Normal redundancy then means every log write must be done 4 times (2 redo members x 2 ASM mirror copies)
  • 59. – Log writer wait times: - V2 min: 16ms (1 writer), max: 41ms (20 writers) - X2 min: 39ms (10 writers), max: 110ms (40 writers) – Database writer wait time is significantly lower
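    (One way to look at log writer write latency on a running instance is the wait event histogram for 'log file parallel write'; a minimal sketch:)
    SELECT event, wait_time_milli, wait_count
      FROM v$event_histogram
     WHERE event = 'log file parallel write';
    -- wait_time_milli is the upper bound of each latency bucket, in milliseconds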
  • 60. – Log file write response time on Exadata is not in the same range as reads. – There's the flash log feature, but it does not work as the whitepaper explains. – Be careful with heavy writing on Exadata. - There's no Exadata specific improvement for writes.
  • 61. Thank you for attending! Questions and answers.
  • 62. Thanks to • Klaas-Jan Jongsma • VX Company • Martin Bach • Kevin Closson