RAC+ASM: Stories to Share

RAC+ASM: Lessons learned after 2 years in production

Managing over 70 databases for 4 major customers, I have some good stories to share, having run almost all possible combinations of ASM, RAC, NetApp and NFS.

Successes, failures and gotchas. This presentation condenses years of experience into its major highlights, in 45 minutes. A few of the stories are listed in the agenda below.

Transcript

RAC+ASM: 3 years in production. Stories to share.
Presented by: Christo Kutrovsky
Who Am I
  • Oracle ACE
  • 10 years in the Oracle field
  • Joined Pythian in 2003
  • Part of the Pythian consulting team
    • Special projects
    • Performance tuning
    • Critical services
"oracle pinup"
Pythian Facts
  • Founded in 1997
  • 90 employees
  • 120 customers worldwide
  • 20 customers with more than $1 billion in revenue
  • 5 offices in 5 countries
  • 10 years as a profitable private company

What Pythian does
Pythian provides database and application infrastructure services.
Agenda
  • 2-node RAC
  • ASMLIB with multipathing
  • Migrating to new servers with ASM
  • Thin provisioning
  • ASM + restores = danger
  • Device naming conventions
  • spfile location
  • JBOD configuration

2 Node RAC for High Availability
2 Node RACs for HA
  • Two RAC nodes
  • 13 databases
  • Dev databases
  • Shut down the databases (and ASM) on node 1
  • Perform maintenance
  • Unplug the interconnect cable
  • What happens?

2 nodes RAC
[Diagram: Node 1 and Node 2, each with a VIP and database instances (SID_A1, SID_B1 / SID_A2, SID_B2), joined by the interconnect and sharing the ASM diskgroup and OCR/voting disk over Fibre Channel. The diagram is shown twice, then repeated with the interconnect broken: each node reports "I can't see Node 1" / "I can't see Node 2".]
One is not Quorum
  • 50% chance your working node gets restarted
  • Depends on the clusterware version
  • Who will shoot the other guy first?

One is not Quorum
  • Conclusion?
    • Turn off the clusterware when you have only 2 nodes and are performing maintenance (a minimal sketch follows this list)
    • Upgrade to a more predictable clusterware version, where the lowest 'leader' always survives
    • Add a 3rd tie-breaker node; it doesn't have to run a database, just the clusterware (observer)
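A minimal sketch of the maintenance step above, assuming you stop the clusterware stack on the node going into maintenance; the CRS home path is illustrative and the exact commands vary by clusterware version (10g/11gR1 syntax shown, run as root):

# Stop the full clusterware stack on the node being maintained, so the
# surviving node never sees a split brain when cables are unplugged
/u01/app/crs/bin/crsctl stop crs

# ...perform the maintenance...

# Bring the stack back once the node is healthy again
/u01/app/crs/bin/crsctl start crs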
One is not Quorum
Production cases: what happens if
  • All network dies on one node?
  • All disks die on one node?

ASMLIB with Multipathing
Building ASMLIB devices when multipathing is present
  • Devices used for creating the ASMLIB disks:
    • /dev/emcpowerc1
    • /dev/mapper/raid10_data_disk
  • Devices used to create the ASM diskgroup:
    • ASMLIB
  • The reboot changes everything
    • ASMLIB re-discovers the devices without multipath
    • Difficult to diagnose

Visual
[Diagram: multipath devices /dev/mapper/data1 and /dev/mapper/data2, each built from two single-path devices (/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde) reaching LUN_1 and LUN_2 through HBA1 and HBA2.]
Building ASMLIB devices when multipathing is present
  • Do not use ASMLIB
  • If you have to (why?):
    • You must set up "ORACLEASM_SCANORDER"
  • Without ASMLIB, take care of:
    • The asm_diskstring parameter
    • Permissions
    • udev files
    • A boot/startup script
(a minimal configuration sketch follows this list)
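A minimal sketch of both approaches, assuming device-mapper multipath names under /dev/mapper; file names, device names and the udev rule syntax are illustrative and vary by distribution:

# With ASMLIB: scan the multipath (dm-*) devices first and skip the
# underlying single-path sd* devices (/etc/sysconfig/oracleasm)
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

# Without ASMLIB: give the multipath partitions oracle:dba ownership via a
# udev rule (e.g. /etc/udev/rules.d/99-oracle-asm.rules)
ENV{DM_NAME}=="raid10_data_disk*", OWNER:="oracle", GROUP:="dba", MODE:="0660"

# ...and point ASM only at the multipath paths:
# (in the ASM instance) alter system set asm_diskstring='/dev/mapper/*p1';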
Removing ASMLIB
  • Why?
    • Extra layer
    • Requires a new driver for every new kernel
    • Can cause downtime if not careful
  • The ASMLIB header is the same as the ASM disk header
    • It just has an extra field for the ASMLIB disk name
    • Disks can be accessed directly, without ASMLIB, without having to drop/re-create them
Removing ASMLIB
  • Unmount all affected diskgroups
  • Change or set asm_diskstring
  • Remount the diskgroups via the new paths
  • Can be done in a rolling fashion in RAC (see the sketch below)
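A minimal sketch of the rolling steps on one node, assuming a diskgroup named DATA, multipath partitions under /dev/mapper, and an ASM instance using an spfile; adapt the names and the diskstring to your environment:

# Rolling ASMLIB removal, one node at a time
sqlplus / as sysdba <<'EOF'
alter diskgroup DATA dismount;
alter system set asm_diskstring='/dev/mapper/*p1';
alter diskgroup DATA mount;
EOF
# Repeat on the next node; the remaining nodes keep DATA mounted throughout.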
SAN Migration

Migrating from EMC to 3PAR
  • New SAN
  • New concept
    • Thin provisioning
  • A big project
    • Or not
Add/drop/go home
  • No brainer
  • Thin provisioning rocks
  • The SA adds the disks
  • Add the new disks to the diskgroup
  • Drop all the old disks
  • Wait (for the rebalance to finish)
  • Never be paged on space again
(a minimal sketch follows)
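A minimal sketch of the add/drop cycle, assuming a diskgroup named DATA, new 3PAR LUNs presented as /dev/mapper/3par_data0Np1 and old ASM disk names EMC_DISK01/02; all names are illustrative:

# Add the new LUNs and drop the old EMC disks in one statement, so a single
# rebalance moves all the data
sqlplus / as sysdba <<'EOF'
alter diskgroup DATA
  add  disk '/dev/mapper/3par_data01p1', '/dev/mapper/3par_data02p1'
  drop disk EMC_DISK01, EMC_DISK02
  rebalance power 4;
-- Watch the rebalance; the old LUNs can be unpresented once it finishes
select group_number, operation, state, est_minutes from v$asm_operation;
EOF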
Server Migration

Server migration
  • Current setup
    • 2-node RAC with ASM
  • New servers
    • Better, faster, stronger
  • Looking for the fastest (effort-wise) way to migrate, with minimal downtime
    • Possible with zero downtime
Server migration options
  • Create a standby on the new servers
    • Requires an extra copy of the data
  • Add the new nodes, drop the existing ones
    • Possible clusterware issues
  • Move the LUNs
    • Easy
    • New servers can be tested beforehand
Lun Migration
  • Install the clusterware and create a RAC database with the same name on the new servers
  • Test the hardware / wiring / configuration
  • Migrate:
    • Stop production
    • Re-assign the LUNs
    • Start production
(a minimal cutover sketch follows)
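A minimal sketch of the cutover on the new servers, assuming a database named PROD managed by srvctl, a DATA diskgroup, and multipath-presented LUNs; the SAN re-zoning itself is done by the storage admin, and the rescan commands vary by platform and HBA:

# On the old cluster: stop everything cleanly
srvctl stop database -d PROD

# Storage admin re-assigns the LUNs, then on each new node rescan the bus
echo "- - -" > /sys/class/scsi_host/host0/scan   # repeat per HBA host
multipath -r                                     # refresh multipath maps

# Mount the diskgroups and start the database on the new cluster
sqlplus / as sysdba <<'EOF'
alter diskgroup DATA mount;
EOF
srvctl start database -d PROD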
ASM Restore creates a database black hole

ASM + Same host restore = DANGER
  • Production database
    • Diskgroup +PROD
  • Snapshot database
    • Diskgroup +SNAP
    • Rebuilt monthly via duplicate database
  • Except this one time…

The concept
  • "SNAP" backups are not taken
  • If "SNAP" ever needs to be restored, simply re-create it from the corresponding "PROD" backup
  • Independent from production
Restore with ASM
  • Restore the FRA files into a separate directory
  • Start the SNAP instance
  • Catalog the backup files
  • Restore into the SNAP diskgroup
  • The missing piece? "restore" writes to the files' original locations as recorded in the backup
  • Must use "SET NEWNAME FOR DATAFILE" in the run block (a sketch follows)
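A minimal sketch of a run block that keeps the restore inside +SNAP instead of the original +PROD locations; the datafile numbers and diskgroup names are illustrative, and one SET NEWNAME is needed per datafile:

rman target / <<'EOF'
run {
  set newname for datafile 1 to '+SNAP';
  set newname for datafile 2 to '+SNAP';
  # ...one SET NEWNAME per datafile...
  restore database;
  switch datafile all;
  recover database;
}
EOF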
Restore with ASM – the result
  • Unrecoverable corruption on the production database
  • Lost about 3-4 hours of changes
  • If this had been a filesystem and not ASM, no corruption would have occurred

[Diagram: "Corruption – what happened". SGA block images, redo entries (BLK1 add row 6, BLK3 add row 3), and on disk both the partially overwritten datafile and the original datafile, with 5-row blocks reduced to 2 rows.]

[Diagram: "Corruption – what should've happened". The same elements, shown as they should have ended up.]

[Slide: "Corruption – what happened" (image only).]
Corruption
  • Why wouldn't this have happened with a filesystem?
    • File names are just pointers to a data stream
    • If a file is re-created, a new data stream is associated with the name
    • Processes that currently have the file open still use the old data stream
    • This is why "undelete" is possible
  • See my blog post about undeleting files
(a small demonstration follows)
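A small demonstration of the open-file behaviour described above, on any Linux filesystem; the file name and contents are illustrative:

# A process that has a file open keeps reading the old data stream even
# after the name is re-created - this is what would have protected the
# datafile on a filesystem
echo "original contents" > file1
tail -f file1 &                 # "process 1" holds file1 open
TAILPID=$!
rm file1                        # remove the name...
echo "new contents" > file1     # ...and re-create it as a new stream
ls -l /proc/$TAILPID/fd         # the old stream still shows as "(deleted)"
kill $TAILPID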
[Diagram: "Corruption". Process 1 opens "File 1", which points to data stream X1.]

[Diagram: "Corruption – recreate File 1". "File 1" now points to a new stream X2, while Process 1 still uses stream X1.]

Device naming convention causes user error
Device naming conventions
  • Using /dev/mapper/<name>
  • ASM uses <name>p1, the first partition
  • The permissions script uses the wildcard "*p1"
  • Then came /dev/mapper/backup1
    • Its first partition is /dev/mapper/backup1p1

Device naming conventions: V$ASM_DISK

PATH                        HEADER_STATUS
--------------------------- -------------
/dev/mapper/backup1         CANDIDATE
/dev/mapper/redop1          MEMBER
/dev/mapper/backup1p1       MEMBER
/dev/mapper/data2p1         MEMBER
/dev/mapper/data1p1         MEMBER
Naming conventions
[Diagram: a disk marked ADDED whose partition 1 is marked IN USE.]

New convention
  • Now we use generic names, since we do re-assign disks
  • We also use a prefix and a suffix with a clear delimiter:
/dev/mapper/asm-raid5-dev01-part1
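A sketch of how such names can be produced with device-mapper multipath aliases; the WWID and alias are illustrative, and the partition suffix (-part1 vs p1) depends on the kpartx/udev setup:

# /etc/multipath.conf (excerpt): give each LUN an explicit, self-describing
# alias instead of a generic mpathN name
multipaths {
    multipath {
        wwid  360000970000192601234533030334633
        alias asm-raid5-dev01
    }
}
# kpartx/udev then expose the partition as /dev/mapper/asm-raid5-dev01-part1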
spfile location in RAC

spfile location
  • Intended configuration:
    • Each node's init.ora contains only: spfile='+ASM_DSKGRP/dbname.spfile'
    • No local spfile
Changing parameters en masse
  • create pfile='your_initials.ora' from spfile;
  • Edit the pfile
  • create spfile='+ASM_DSK/spfile' from pfile='ck.ora';
(a minimal sketch of the full sequence follows)
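A minimal sketch of the safe sequence, assuming the shared spfile lives in the +ASM_DSK diskgroup as on the slide; the working-file path is illustrative:

# Always name the pfile explicitly and write the spfile back to the shared
# ASM location that every instance's init.ora points at
sqlplus / as sysdba <<'EOF'
create pfile='/tmp/ck.ora' from spfile;
-- edit /tmp/ck.ora, then:
create spfile='+ASM_DSK/spfile' from pfile='/tmp/ck.ora';
EOF
# Every instance's $ORACLE_HOME/dbs/init<SID>.ora still contains only
# spfile='+ASM_DSK/spfile', so all nodes keep reading the same file.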
What not to do
  • create pfile from spfile;
  • Edit the pfile
  • create spfile from pfile;
  • Both default to $ORACLE_HOME/dbs on the local node only
Result
  • One node now uses a local spfile
  • The other node(s) still use the global spfile
  • Parameter changes made on the "BAD" node are sent to the other nodes
    • Not persistent on the GOOD nodes
    • Persistent on the BAD node
  • Parameter changes made on the GOOD nodes show the reverse behaviour
(a quick check follows)
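A quick way to spot the mismatch while the instances are up; it simply shows which spfile each instance is actually using:

# Each instance should report the same shared +ASM_DSK spfile; a local
# $ORACLE_HOME/dbs path on one node reveals the problem
sqlplus / as sysdba <<'EOF'
select inst_id, value from gv$parameter where name = 'spfile';
EOF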
Adding ASM disks crashes databases

Adding disks
  • The disk must be visible on all servers
  • Otherwise the diskgroup gets dismounted on the nodes that don't see the disk
  • All databases using that diskgroup crash
(a pre-check sketch follows the process below)

ASM add disk process
  1. Is the disk visible locally?
  2. Initialize the disk header, add it to the diskgroup
  3. Notify all nodes to rescan disks and add the new disk
  4. If one or more nodes cannot see the disk, raise an error
  5. Dismount the diskgroup on all nodes not seeing the new disk
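A minimal pre-check before adding a disk, run against the ASM instances; the candidate path is illustrative:

# Every ASM instance should report the candidate path (HEADER_STATUS
# CANDIDATE or PROVISIONED) before you run ALTER DISKGROUP ... ADD DISK
sqlplus / as sysdba <<'EOF'
select inst_id, path, header_status
from   gv$asm_disk
where  path like '/dev/mapper/newdisk%';
EOF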
ASM with JBOD welcomes simplicity

JBOD Configuration
  • Linux data warehouse
  • 10 TB of space
  • 28 disks of 430/285 GB
  • All redundancy/striping provided by ASM
JBOD Configuration
  • Simplicity
    • No ASMLIB
    • Straight devices
  • Naming convention: use only one partition, and make it partition 4
  • A single pattern, /dev/sd*4, then serves as:
    • the ASM partition
    • the permissions wildcard
    • the asm_diskstring
(a minimal sketch follows)
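A minimal sketch of how that one pattern is used everywhere, assuming the partition-4 convention; the udev rule file name and syntax are illustrative and vary with the distribution:

# Permissions: one udev rule covers every ASM partition
# (e.g. /etc/udev/rules.d/99-asm.rules)
KERNEL=="sd*4", OWNER:="oracle", GROUP:="dba", MODE:="0660"

# Discovery: the same pattern is the ASM disk string
# (in the ASM instance) alter system set asm_diskstring='/dev/sd*4';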
Testing your speed
  • Verify the read speed of each device
    • Verifies each device is performing as expected
  • Verify the read speed from all devices at once
    • Verifies your total bandwidth
  • Verify the read speed from all devices towards the end of each device
    • Disk read speed is not linear

Read Speed of a single disk
[Image: read-speed profile across a single disk, courtesy Google image search.]

Testing your speed
  • One device at a time:
for dsk in /dev/sd[c-q]; do echo $dsk; dd if=$dsk of=/dev/null iflag=direct bs=2M count=100; done
  • All devices (total bandwidth):
for dsk in /dev/sd[c-q]; do echo $dsk; dd if=$dsk of=/dev/null iflag=direct bs=2M count=100 & done
  • Test the end-of-disk speed:
    • Add skip=<blocks> to the dd command (see the sketch below)

Sample output
/dev/sdc
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.60325 seconds, 131 MB/s
/dev/sdd
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.60188 seconds, 131 MB/s
/dev/sde
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.60067 seconds, 131 MB/s
/dev/sdf
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.59928 seconds, 131 MB/s
/dev/sdg
100+0 records in
100+0 records out
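A sketch of the end-of-disk test mentioned above, assuming roughly 285 GB devices; the skip value is illustrative and just needs to land near the end of each disk:

# Read 100 x 2 MB blocks starting about 272 GB into each device, near the
# end of a ~285 GB disk, to see the slower end-of-disk speed
for dsk in /dev/sd[c-q]; do
  echo $dsk
  dd if=$dsk of=/dev/null iflag=direct bs=2M count=100 skip=130000
done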
JBOD configuration
  • Disk adding/removal is very easy
    • Add disks in bulk: alter diskgroup XXX add disk '/dev/sd[c-q]4';
  • Performance rocks
    • Controller speed
  • Diagnostics are easy
    • iostat -x 5 /dev/sd*4
  • Manageability is easy
    • 1 diskgroup – no filenames, no mountpoints
Final Thoughts
  • RAC for HA requires 3 nodes
  • ASM
    • Keep it simple
    • Reduce layers
    • Runs fast
    • Still need to be careful

The End
Thank You
Questions?
I blog at http://www.pythian.com/news/author/kutrovsky/