RAC+ASM: Stories to Share

RAC+ASM: Lessons learned after 2 years in production

Managing over 70 databases for 4 major customers, I have some good stories to share, running almost all possible combinations of ASM, RAC, NETAPP and NFS.

Success, failure and gotchas. This presentation is the equivalent of years of experience, condensed into 45 minutes of highlights; a few of the stories are listed in the agenda below.

Transcript of "RAC+ASM: Stories to Share"

Slide 1: RAC+ASM: 3 years in production, stories to share
Presented by: Christo Kutrovsky
Slide 2: Who Am I
- Oracle ACE
- 10 years in the Oracle field
- Joined Pythian in 2003
- Part of the Pythian consulting team
- Special projects
- Performance tuning
- Critical services
- "oracle pinup"
Slide 3: Pythian Facts
- Founded in 1997
- 90 employees
- 120 customers worldwide
- 20 customers with more than $1 billion in revenue
- 5 offices in 5 countries
- 10 years as a profitable private company
Slide 4: What Pythian does
Pythian provides database and application infrastructure services.
Slide 5: Agenda
- 2-node RAC
- ASMLIB with multipathing
- Migrating to new servers with ASM
- Thin provisioning
- ASM + restores = danger
- Device naming conventions
- spfile location
- JBOD configuration

Slide 6: 2 Node RAC for High Availability
Slide 7: 2 Node RAC for HA
- Two RAC nodes
- 13 databases
- Dev databases
- Shut down the databases (and ASM) on node 1
- Perform maintenance
- Unplug the interconnect cable
- What happens?

Slide 8: Diagram: Node 1 and Node 2, each with a VIP and instances (SID_A1/SID_B1 and SID_A2/SID_B2), joined by the interconnect and attached over Fibre Channel to the shared ASM diskgroup and the OCR/voting disks.
Slide 9: Same diagram as slide 8.

Slide 10: The same diagram after the interconnect is cut: each node reports that it cannot see the other ("I can't see Node 1" / "I can't see Node 2").
Slide 11: One is not Quorum
- 50% chance your working node gets restarted
- Depends on the clusterware version
- Who will shoot the other guy first?

Slide 12: One is not Quorum
- Conclusion?
- Turn off the clusterware when you have only 2 nodes and are performing maintenance (see the sketch below)
- Upgrade to a more predictable clusterware
- Lowest "leader" always survives
- Add a 3rd tie-breaker node
- It doesn't have to run a database, just the clusterware (observer)
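A minimal sketch of the maintenance step from slide 12, assuming 10g/11g Clusterware, root access and illustrative CRS-home paths: stop and disable the clusterware stack on the node being serviced, so the surviving node cannot lose the shoot-out when the interconnect drops.

# On the node about to undergo maintenance (run as root):
/u01/app/crs/bin/crsctl stop crs      # stop the Clusterware stack
/u01/app/crs/bin/crsctl disable crs   # keep it from auto-starting mid-maintenance
# Re-enable and restart it once the maintenance is finished:
/u01/app/crs/bin/crsctl enable crs
/u01/app/crs/bin/crsctl start crs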
Slide 13: One is not Quorum
Production cases: what happens if
- All network dies on one node?
- All disk dies on one node?

Slide 14: ASMLIB with Multipathing
Slide 15: Building ASMLIB devices when multipathing is present
- Devices used for creating the ASMLIB disks:
  - /dev/emcpowerc1
  - /dev/mapper/raid10_data_disk
- Device names used to create the ASM diskgroup: the ASMLIB names
- The reboot changes everything
- ASMLIB re-discovers the devices without multipath
- Difficult to diagnose

Slide 16: Visual: diagram of two LUNs (LUN_1, LUN_2) reached over two HBAs (HBA1, HBA2), so each LUN appears as two SCSI paths (/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde) behind the multipath devices /dev/mapper/data1 and /dev/mapper/data2.
Slide 17: Building ASMLIB devices when multipathing is present
- Do not use ASMLIB
- If you have to (why?)
- Must set up "ORACLEASM_SCANORDER" (see the sketch below)
- asm_diskstring parameter
- Permissions
- udev files
- Boot/startup script
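A sketch of the pieces slide 17 lists, for a Linux host using device-mapper multipath; the scan-order keywords and name patterns are illustrative and depend on how your multipath devices are named.

# /etc/sysconfig/oracleasm: make ASMLIB scan the multipath devices first
# and ignore the underlying single-path sd devices
ORACLEASM_SCANORDER="dm emcpower"
ORACLEASM_SCANEXCLUDE="sd"

# If you skip ASMLIB entirely, a udev rule can set the permissions instead
# (illustrative match pattern), e.g. in /etc/udev/rules.d/99-asm.rules:
KERNEL=="dm-*", ENV{DM_NAME}=="asm-*-part1", OWNER="oracle", GROUP="dba", MODE="0660"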
Slide 18: Removing ASMLIB
- Why?
- Extra layer
- Requires a new driver for every new kernel
- Can cause downtime if you are not careful
- The ASMLIB header is the same as the ASM disk header
- It just has an extra field for the ASMLIB name
- Disks can be accessed directly, without ASMLIB, without having to drop/recreate them
Slide 19: Removing ASMLIB
- Unmount all affected diskgroups
- Change or set asm_diskstring
- Remount the diskgroups via the new paths (the three steps are sketched below)
- Can be done in a rolling fashion in RAC
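A minimal sketch of slide 19's steps on one node's ASM instance; the diskgroup name and discovery path are illustrative.

# Run against the local ASM instance (sysdba here; sysasm on 11g):
sqlplus -s / as sysdba <<'EOF'
alter diskgroup DATA dismount;
alter system set asm_diskstring='/dev/mapper/asm-*-part1' scope=both;
alter diskgroup DATA mount;
EOF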
Slide 20: SAN Migration

Slide 21: Migrating from EMC to 3PAR
- New SAN
- New concept
- Thin provisioning
- A big project
- Or not
Slide 22: Add/drop/go home
- No-brainer
- Thin provisioning rocks
- The SA adds the disks
- Add the new disks to the diskgroup
- Drop all the old disks (see the sketch below)
- Wait
- Never be paged on space
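A sketch of the add/drop from slide 22 done in a single statement, so ASM rebalances only once while the databases stay up; diskgroup, device and disk names are illustrative.

sqlplus -s / as sysdba <<'EOF'
alter diskgroup DATA
  add disk '/dev/mapper/asm-3par-dev01-part1',
           '/dev/mapper/asm-3par-dev02-part1'
  drop disk DATA_0000, DATA_0001
  rebalance power 4;
EOF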
Slide 23: Server Migration

Slide 24: Server migration
- Current setup: 2-node RAC with ASM
- New servers: better, faster, stronger
- Fastest (effort-wise) way to migrate, with minimal downtime
- Possible with zero downtime
Slide 25: Server migration options
- Create a standby on the new servers
  - Requires an extra copy of the data
- Add the new nodes, drop the existing ones
  - Possible clusterware issues
- Move the LUNs
  - Easy
  - The new servers get tested
Slide 26: LUN Migration
- Install clusterware and create a RAC database with the same name on the new servers
- Test the hardware / wiring / configuration
- Migrate:
  - Stop production
  - Re-assign the LUNs
  - Start production (see the sketch below)
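A rough sketch of the cutover on slide 26, assuming srvctl-managed databases; database and diskgroup names are illustrative.

# Old cluster: stop production cleanly
srvctl stop database -d PROD

# (SAN team re-zones the LUNs to the new servers)

# New cluster: mount the migrated diskgroups and start production
sqlplus -s / as sysdba <<'EOF'
alter diskgroup DATA mount;
alter diskgroup FRA mount;
EOF
srvctl start database -d PROD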
Slide 27: ASM Restore creates a database black hole

Slide 28: ASM + same-host restore = DANGER
- Production database: diskgroup +PROD
- Snapshot database: diskgroup +SNAP
- The snapshot is rebuilt monthly via duplicate database
- Except this one time…
Slide 29: The concept
- No backups are taken of "SNAP"
- If "SNAP" ever needs to be restored, it is simply re-created from the corresponding "PROD" backup
- Independent from production
Slide 30: Restore with ASM
- Restore the FRA (backup) files into a separate directory
- Start up the SNAP instance
- Catalog the backup files
- Restore into the SNAP diskgroup
- The missing piece? "restore" writes into the original file locations
- Must use "SET NEWNAME FOR DATAFILE" in the run block (see the sketch below)
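A hedged sketch of the run block slide 30 calls for; the backup location, file numbers and diskgroup name are illustrative. The point is that every datafile is redirected into +SNAP, so the restore never touches the files the live production database has open.

rman target / <<'EOF'
catalog start with '/restore_area/prod_backup/' noprompt;
run {
  set newname for datafile 1 to '+SNAP';
  set newname for datafile 2 to '+SNAP';
  set newname for datafile 3 to '+SNAP';
  restore database;
  switch datafile all;    # point the controlfile at the +SNAP copies
  recover database;
}
EOF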
Slide 31: Restore with ASM – the result
- Unrecoverable corruption on the production database
- Lost about 3-4 hours of changes
- If this had been a filesystem and not ASM, no corruption would have occurred

Slide 32: Corruption – what happened (diagram of blocks in the SGA, the redo stream, a partially overwritten datafile and the original datafile on disk).

Slide 33: Corruption – what should've happened (same diagram elements).

Slide 34: Corruption – what happened (diagram only).
Slide 35: Corruption
- Why wouldn't this have happened on a filesystem?
- File names are just pointers to a data stream
- If a file is re-created, a new data stream is associated with it
- Processes that currently have the file open still use the old data stream (see the sketch below)
- This is why "undelete" is possible
- My blog post about undeleting files
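A small shell demonstration of the point on slide 35 (Linux; the file name is illustrative): a process that already holds the file open keeps reading the old data stream even after the name is re-created.

echo "original stream" > /tmp/file1
exec 3</tmp/file1                # "Process 1" opens file 1 and keeps fd 3
rm /tmp/file1                    # the name goes away...
echo "new stream" > /tmp/file1   # ...and is re-created with a new data stream
cat <&3                          # fd 3 still prints "original stream"
ls -l /proc/$$/fd/3              # the descriptor still points at the old (deleted) stream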
Slide 36: Corruption (diagram): Process 1 opens "file 1", which points to stream X1.

Slide 37: Corruption – recreate File 1 (diagram): Process 1 still holds stream X1 open while the re-created "file 1" now points to a new stream X2.

Slide 38: Device naming convention causes user error
Slide 39: Device naming conventions
- Using /dev/mapper/<name>
- ASM uses <name>p1, the first partition
- The permissions script uses the wildcard "*p1"
- Then came /dev/mapper/backup1
- Its first partition is /dev/mapper/backup1p1

Slide 40: Device naming conventions
- V$ASM_DISK:

    PATH                         HEADER_STATUS
    ---------------------------  -------------
    /dev/mapper/backup1          CANDIDATE
    /dev/mapper/redop1           MEMBER
    /dev/mapper/backup1p1        MEMBER
    /dev/mapper/data2p1          MEMBER
    /dev/mapper/data1p1          MEMBER
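The listing on slide 40 comes from V$ASM_DISK; a sketch of the query, runnable on the ASM instance. Note that the unpartitioned /dev/mapper/backup1 only shows up as a CANDIDATE because its name happens to end in "p1" and so matches the wildcard.

sqlplus -s / as sysdba <<'EOF'
select path, header_status from v$asm_disk order by path;
EOF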
Slide 41: Naming conventions (diagram): the whole disk gets ADDED while its partition 1 is already IN USE.

Slide 42: New convention
- Now we use generic names, since we do re-assign disks
- We also use a prefix and a suffix with a clear delimiter, for example:
  /dev/mapper/asm-raid5-dev01-part1
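One way such names come about (a sketch, not from the deck): a multipath alias in /etc/multipath.conf, with the WWID as a placeholder; the aliased device's first partition then appears under a matching /dev/mapper name such as the asm-raid5-dev01-part1 shown above, depending on your kpartx/udev settings.

multipaths {
    multipath {
        wwid  360060160xxxxxxxxxxxxxxxxxxxxxxxx
        alias asm-raid5-dev01
    }
}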
Slide 43: spfile location in RAC

Slide 44: spfile location
- Intended configuration:
  - init.ora containing only: spfile='+ASM_DSKGRP/dbname.spfile'
  - no local spfile under $ORACLE_HOME/dbs

Slide 45: Changing parameters en masse
- create pfile='your_initials.ora' from spfile;
- Edit the file
- create spfile='+ASM_DSK/spfile' from pfile='ck.ora';

Slide 46: What not to do
- create pfile from spfile;
- Edit the file
- create spfile from pfile;

Slide 47: Result
- One node uses a local spfile
- The other(s) use the global spfile
- Parameter changes made on the "BAD" node are sent to the other nodes:
  - not persistent on the GOOD nodes
  - persistent on the BAD node
- Parameter changes on the GOOD nodes have the reversed behaviour (see the sketch below)
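A quick way to spot the slide 47 situation (a sketch): check which spfile every instance actually started with; anything under $ORACLE_HOME/dbs instead of the ASM diskgroup is the odd one out.

sqlplus -s / as sysdba <<'EOF'
select i.instance_name, p.value as spfile
  from gv$instance i, gv$parameter p
 where p.inst_id = i.inst_id
   and p.name = 'spfile';
EOF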
Slide 48: Adding ASM disks crashes databases

Slide 49: Adding disks
- The disk must be visible on all servers (check first; see the sketch below)
- Otherwise your diskgroup gets dismounted on the nodes that don't see the disk
- All databases using this diskgroup crash

Slide 50: ASM add-disk process
1. Is the disk visible locally?
2. Initialize the disk header and add it to the diskgroup
3. Notify all nodes to rescan disks and add the new disk
4. If one or more nodes cannot see the disk, raise an error
5. Dismount the diskgroup on all nodes not seeing the new disk
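A defensive sketch for slide 49, assuming passwordless ssh between the nodes; node names and the device path are illustrative: confirm every node sees the candidate disk with the right permissions before running the ALTER DISKGROUP.

# Every node must show the device, owned by oracle, before the add
for node in rac1 rac2; do
    echo "== $node =="
    ssh $node "ls -l /dev/mapper/asm-raid5-dev09-part1"
done

sqlplus -s / as sysdba <<'EOF'
alter diskgroup DATA add disk '/dev/mapper/asm-raid5-dev09-part1';
EOF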
Slide 51: ASM with JBOD welcomes simplicity

Slide 52: JBOD configuration
- Linux data warehouse
- 10 TB of space
- 28 disks of 430/285 GB
- All redundancy/striping provided by ASM
Slide 53: JBOD configuration
- Simplicity
- No ASMLIB
- Straight devices
- Naming convention: use only 1 partition, and make it partition 4
- /dev/sd*4
  - is the ASM partition
  - is the permissions wildcard
  - is the asm_diskstring (see the sketch below)
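A sketch of how the single /dev/sd*4 wildcard is reused, with illustrative values.

# Ownership at boot (for example from an init script):
chown oracle:dba /dev/sd*4

# Discovery string on the ASM instance:
sqlplus -s / as sysdba <<'EOF'
alter system set asm_diskstring='/dev/sd*4' scope=both;
EOF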
Slide 54: Testing your speed
- Verify the read speed of each device
  - Verifies that each device is performing as expected
- Verify the read speed from all devices at once
  - Verifies your total bandwidth
- Verify the read speed from all devices towards the end of the device
  - Disk read speed is not linear

Slide 55: Read speed of a single disk (chart; courtesy of a Google image search)
Slide 56: Testing your speed
- One device at a time:
  for dsk in /dev/sd[c-q]; do echo $dsk; dd if=$dsk of=/dev/null iflag=direct bs=2M count=100; done
- All devices at once (total bandwidth):
  for dsk in /dev/sd[c-q]; do echo $dsk; dd if=$dsk of=/dev/null iflag=direct bs=2M count=100 & done
- To test end-of-disk speed, add skip=x (see the sketch below)

Slide 57: Sample output

  /dev/sdc
  100+0 records in
  100+0 records out
  209715200 bytes (210 MB) copied, 1.60325 seconds, 131 MB/s
  /dev/sdd
  100+0 records in
  100+0 records out
  209715200 bytes (210 MB) copied, 1.60188 seconds, 131 MB/s
  /dev/sde
  100+0 records in
  100+0 records out
  209715200 bytes (210 MB) copied, 1.60067 seconds, 131 MB/s
  /dev/sdf
  100+0 records in
  100+0 records out
  209715200 bytes (210 MB) copied, 1.59928 seconds, 131 MB/s
  /dev/sdg
  100+0 records in
  100+0 records out
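A sketch of the end-of-disk variant from slide 56: skip is counted in bs-sized blocks, so skip=200000 with bs=2M starts the read roughly 400 GB in, near the end of the larger (~430 GB) disks from slide 52; the value is a placeholder, so size it to your own disks.

for dsk in /dev/sd[c-q]; do
    echo $dsk
    dd if=$dsk of=/dev/null iflag=direct bs=2M count=100 skip=200000
done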
Slide 58: JBOD configuration
- Disk adding/removal is very easy
- Add disks in bulk: alter diskgroup XXX add disk '/dev/sd[c-q]4';
- Performance rocks
  - Runs at controller speed
- Diagnostics are easy
  - iostat -x 5 /dev/sd*4
- Manageability is easy
  - 1 diskgroup: no filenames, no mountpoints
Slide 59: Final Thoughts
- RAC for HA requires 3 nodes
- ASM:
  - Keep it simple
  - Reduce layers
  - Runs fast
  - Still need to be careful

Slide 60: The End
Thank You
Questions?
I blog at http://www.pythian.com/news/author/kutrovsky/