Adventures in Dataguard
Published in: Technology
  • Transcript

    • 1. Adventures in Dataguard, Dr. Jason Arneil
    • 2. Why Dataguard (Motivation)
    • 3. Agenda
        • Introduction
        • The Motivation
        • Dataguard Architecture & Features
        • Creating a Physical Standby
        • Maintaining your standby
        • Using your Standby
        • Performing a Switchover
    • 4. Health Warning (Introduction)
    • 5. About Me (Introduction)
        • Jason Arneil
        • System Administrator/DBA
        • Using Oracle since 1998
        • At Nominet since 2001
    • 6. About Nominet (Introduction)
        • Nominet is the internet registry for .uk domain names
        • Nominet has been in existence for over 11 years
        • Nominet is run as a not-for-profit company
        • Nominet is owned by its members
        • There are over 6 million .uk domain names
    • 7. Why Dataguard (Motivation)
        • Big push on a Nominet Business Continuity Plan
        • Dataguard is the Oracle solution for disaster recovery
        • Physical Standby was the obvious option
        • Maximum Availability Architecture (MAA)
    • 8. Business Continuity Site (Motivation)
    • 9. Dataguard Processes (Architecture & Features)
        • [Diagram. Labels: Primary Database, Transactions, LGWR, LNS, Online Redo Logs, ARCH, Archived Redo Logs, Oracle Net, RFS, Standby Redo Logs, FAL, MRP/LSP, Transform Redo to SQL for SQL Apply, Physical/Logical Standby Database, Backup / Reports]
    • 10. Dataguard Features (Architecture & Features)
        • Several Protection Modes
            • Maximum Protection
            • Maximum Availability
            • Maximum Performance
        • Several Transport Modes
            • LGWR SYNC
            • LGWR ASYNC
            • ARCH
    • 11. Prepare Primary & Standby (Creating a Standby)
        • Prepare Primary Database
            • Enable Force Logging
            • SQL> alter database force logging;
            • Modify initialization parameters
        • Prepare Standby Database
            • Set up directory structure
            • Create spfile with correct parameters
            • Start database in nomount
    • 12. Log Transport Parameters (Creating a Standby)
        • LOG_ARCHIVE_CONFIG='DG_CONFIG=(PRIMARY, STANDBY)'
        • LOG_ARCHIVE_DEST_1='LOCATION=/var/oracle/PRIMARY/arch'
        • LOG_ARCHIVE_DEST_2='SERVICE=PRIMARC DB_UNIQUE_NAME=PRIMARY'
        • LOG_ARCHIVE_DEST_3='SERVICE=STANDBY LGWR ASYNC REOPEN=15 MAX_FAILURE=10 OPTIONAL VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STANDBY'
    • 13. ssh tunnels (Creating a Standby)
        • You may not want your redo data sent unencrypted across the internet to your standby. You can use ssh tunnels to avoid this
            • ssh -N -L 3333:standby:1521 oracle@standby
        • Now the tnsnames entry points to the localhost
            STANDBYARC =
              (DESCRIPTION =
                (SDU = 32767)
                (ADDRESS_LIST =
                  (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT=3333)))
                (CONNECT_DATA =
                  (SERVICE_NAME = STANDBY)))
    • 14. Some Other Parameters (Creating a Standby)
        • FAL_SERVER
        • FAL_CLIENT
        • ARCHIVE_LAG_TARGET
        • STANDBY_FILE_MANAGEMENT
        • DB_FILE_NAME_CONVERT
        • LOG_FILE_NAME_CONVERT
    • 15. Back up your primary (Creating a Standby)
        • Back up the primary - rman is good
            • rman> backup format '/backup/%U' database plus archivelog;
            • rman> backup format '/backup/%U' current controlfile for standby;
        • Recover the backup on the standby node
            • I like using rman duplicate to create the standby:
            • (oracle$) rman target sys/password@PRIMARY auxiliary /
            • rman> duplicate target database for standby;
    • 16. Start applying redo (Creating a Standby)
        • Create standby redo log files on both primary and standby:
            • sql> alter database add standby logfile thread 2 group 42 ('PATH_TO_DATA/standbyredo01.log') size 512M;
        • Now you can start the physical standby recovering logs:
            • sql> alter database recover managed standby database disconnect from session;
        • Or if you prefer real time apply:
            • sql> alter database recover managed standby database using current logfile disconnect from session;
    • 17. Monitoring the Standby (Maintaining your standby)
        • You have to ensure your standby is keeping up with your primary
        • You can check which log was the last to be applied to your standby:
            sql> SELECT MAX(SEQUENCE#), THREAD#
                   FROM V$ARCHIVED_LOG
                  WHERE APPLIED='YES'
                  GROUP BY THREAD#;

            MAX(SEQUENCE#)    THREAD#
            -------------- ----------
                      2976          1
                      1888          2
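The applied sequence numbers only tell you something once they are compared with what the primary has archived. A minimal Python sketch of that comparison follows. The standby figures are the ones from the slide; the primary figures and the function name `apply_gap` are made up for illustration, and in practice both dictionaries would be filled from queries on each side.

```python
def apply_gap(primary_archived, standby_applied):
    """Per thread, count the archived sequences the standby has not yet applied.

    A thread missing from standby_applied is treated as "nothing applied",
    so its gap equals the primary's sequence number.
    """
    return {
        thread: seq - standby_applied.get(thread, 0)
        for thread, seq in primary_archived.items()
    }


# Values from the slide's query output on the standby (thread -> max applied sequence).
standby_applied = {1: 2976, 2: 1888}

# Hypothetical values for the primary, not taken from the talk.
primary_archived = {1: 2978, 2: 1888}

gap = apply_gap(primary_archived, standby_applied)
for thread, behind in sorted(gap.items()):
    print(f"thread {thread}: {behind} log(s) behind")
```

A gap that keeps growing between checks is the signal worth alerting on; a gap of one or two logs is often just a log in transit.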
    • 18. Monitoring Standby Progress (Maintaining your standby)
        • A good way of checking what the background processes of your standby are up to is using v$managed_standby
            SQL> select process, sequence#, status
                   from V$managed_standby;

            PROCESS   SEQUENCE# STATUS
            -------- ---------- ------------
            ARCH           2967 CLOSING
            ARCH           2974 CLOSING
            RFS            2977 IDLE
            MRP0           1889 APPLYING_LOG
            RFS            1889 IDLE
            RFS            2977 IDLE
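Since redo stops being applied whenever the managed recovery process dies, a simple check on this output is whether an MRP row is present at all. The sketch below, in Python, runs that check over the rows shown on the slide. The function name `recovery_running` is hypothetical, and the rows are assumed to have been fetched already as (process, sequence#, status) tuples.

```python
def recovery_running(rows):
    """Return True if any row belongs to a managed recovery process (MRP*)."""
    return any(process.startswith("MRP") for process, _sequence, _status in rows)


# Rows copied from the slide's v$managed_standby output.
rows = [
    ("ARCH", 2967, "CLOSING"),
    ("ARCH", 2974, "CLOSING"),
    ("RFS", 2977, "IDLE"),
    ("MRP0", 1889, "APPLYING_LOG"),
    ("RFS", 1889, "IDLE"),
    ("RFS", 2977, "IDLE"),
]

if recovery_running(rows):
    print("managed recovery is running")
else:
    print("no MRP process found: redo is being shipped but not applied")
```

The check only says the process exists. It does not prove the process is making progress, so it is best paired with a lag or sequence comparison.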
    • 19. Monitoring Your Standby (Maintaining your standby)
        • You have to ensure your standby is keeping up with your primary
        • V$DATAGUARD_STATS provides useful information
            SQL> select name, value from v$dataguard_stats;

            NAME                             VALUE
            -------------------------------- ------------------------------------
            apply finish time                +00 00:00:00
            apply lag                        +00 00:00:11
            estimated startup time           41
            standby has been open            N
            transport lag                    +00 00:00:03
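The lag values come back as interval text, so they need parsing before they can be compared with a threshold. Here is a rough Python sketch, assuming the values always have the "+DD HH:MM:SS" shape shown on the slide. The helper names `parse_lag` and `lagging` and the five-second limit are illustrative choices, not part of Dataguard.

```python
import re
from datetime import timedelta

# Matches interval text such as "+00 00:00:11" (sign, days, hours, minutes, seconds).
_INTERVAL = re.compile(r"^([+-])(\d+) (\d{2}):(\d{2}):(\d{2})$")


def parse_lag(value):
    """Convert interval text like '+00 00:00:11' into a timedelta."""
    match = _INTERVAL.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {value!r}")
    sign, days, hours, minutes, seconds = match.groups()
    lag = timedelta(
        days=int(days), hours=int(hours), minutes=int(minutes), seconds=int(seconds)
    )
    return -lag if sign == "-" else lag


def lagging(stats, limit):
    """Return the names of the lag statistics that exceed the limit."""
    return [
        name
        for name in ("transport lag", "apply lag")
        if name in stats and parse_lag(stats[name]) > limit
    ]


# Values copied from the slide's v$dataguard_stats output.
stats = {
    "apply finish time": "+00 00:00:00",
    "apply lag": "+00 00:00:11",
    "estimated startup time": "41",
    "standby has been open": "N",
    "transport lag": "+00 00:00:03",
}

print(lagging(stats, timedelta(seconds=5)))
```

With the slide's figures and a five-second limit, only the apply lag is reported, because the transport lag of three seconds is under the limit.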
    • 20. Monitoring Your Standby (Maintaining your standby)
        • A way of finding out what has been happening to your standby over a period of time is to look at the v$dataguard_status view
            • Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 1 sequence 2977 (in transit)
            • Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 1 sequence 2977 (in transit)
            • Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 2 sequence 1889 (in transit)
            • Remote File Server 01-AUG-07 Primary database is in MAXIMUM PERFORMANCE mode
            • Remote File Server 01-AUG-07 RFS[53]: Successfully opened standby log 14: '+DATA2/standby/standbyredo02.log'
    • 21. Oracle can’t divide by 0 (Maintaining your standby)
        • Standby was happily working away
            • ORA-07445: exception encountered: core dump [kcrarmb()+152] [SIGFPE] [Integer divide by zero] [0x00085C300
        • MRP process crashes
            • No redo gets applied from this point
        • Logs after the one that caused the ORA-07445 are still being shipped
        • A simple restart of the managed recovery process does a FAL and the standby is back up-to-date
    • 22. kcrfr_resize2 (Maintaining your standby)
        • Lots of problems after upgrade to 10.2.0.3
            • Recovery of Online Redo Log: Thread 2 Group 23 Seq 999 Reading mem 0
            • Mem# 0: +DATA3/standby/standbyredo11.log
            • ORA-00600: internal error code, arguments: [kcrfr_resize2], [652614828032], [268423168], [], [], [], [], []
        • Perhaps caused by the following:
            • Bug 3306010 OERI[kcrfr_resize2] possible in MEDIA recovery
            • Media recovery may fail with ORA-600 [kcrfr_resize2] when the number of redo strands is set to a high value using log_parallelism.
    • 23. kcrfr_resize2 (Maintaining your standby)
        • This issue has recently been published as Note:453259.1
            • Triggered by having a large log_buffer
        • This bug affects 10.2.0.3 and potentially 9.2.0.8
        • It is related to the size of the log_buffer parameter
        • Fix is included in 10.2.0.4
    • 24. kcrrupirfs (Maintaining your standby)
        • ARC processes died on primary:
            • ORA-00600: [kcrrupirfs.20] [4] [368]
        • Trace file showed the following:
            Corrupt redo block 479421 detected: bad block number
            Flag: 0x0 Format: 0x0 Block: 0x00000000 Seq: 0x00000000 Beg: 0x0 Cks:0x0 <<<<<<<--
            ----- Dump of Corrupt Redo Buffer -----
            000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    • 25. kcrrupirfs (Maintaining your standby)
        • Oracle initially thought this ORA-600 error was hardware related
            • There are NO indications of any hardware fault - the primary keeps running
        • After a couple of weeks it was decided this was a “bug situation”
            • This was bug 4767278, which talked about FAL not being able to read from multiple mirror sides when encountering invalid/stale redo in a file. Apparently this is required for ASM configurations because ASM does not guarantee all mirror sides contain the same data after writing.
            • We were using ASM, but with external redundancy
            • Oracle then said “The ASM group is not 100% sure if the patch 4767278 will fix the problem”
    • 26. log corruption (Maintaining your standby)
        • The Managed Recovery process crashed complaining about log corruption
            MRP0: Background Media Recovery terminated with error 355
            ORA-00355: change numbers out of order
            ORA-00353: log corruption near block 2 change 1273622545 time 03/06/2007 08:32:46
            ORA-00312: online log 13 thread 1: '+DATA2/standby/standbyredo01.log'
        • Oracle blame the upgrade process at first. They suggest rebuilding the standby
        • Then I notice that trying managed recovery rather than real time apply seems to allow the standby to progress
    • 27. log corruption (Maintaining your standby)
        • At this point Oracle say “it looks like a bug”
        • Lots of time spent diagnosing the issue
            • ALTER SYSTEM DUMP LOGFILE '+DATA2/nom/standby33.log' scn min 865465290 scn max 865465300;
        • Eventually Oracle produced a patch 5746174
            • MRP HANGS WITH ASYNC LNS AND PARALLEL ARCHIVAL
    • 28. Utilize those cpu cycles (Using Your Standby)
        • A Standby can be considered an insurance policy
        • Several ways to utilize your standby
            • Run your backups from your standby
            • Open your standby read only for reporting
            • Flashback standby to look at old data
            • Open your standby read write for testing purposes
    • 29. Open for Reports (Using Your Standby)
        • You need to cancel managed recovery
            • sql> alter database recover managed standby database cancel;
        • Then simply open the standby
            • sql> alter database open;
        • Redo is still transported to your standby
        • To transition back to applying redo, shut down the open standby, startup mount and restart the recovery process
    • 30. Open for read write (Using Your Standby)
        • You must have flashback database enabled for this
        • Stop redo apply on standby
        • Create a restore point
        • Activate the Standby & perform read/write testing
        • Flashback to restore point
        • Start the redo apply on the Standby again
    • 31. Open for read write (Using Your Standby)
        • [Diagram. Labels: Physical Standby, Restore Point, Activate standby, Physical Standby read write, Flashback Database]
    • 32. Flashback Database in a Nutshell (Using Your Standby)
        • Set up Flashback Database
            • alter system set db_recovery_file_dest_size = 8G;
            • alter system set db_recovery_file_dest = 'your flashback destination';
            • alter system set db_flashback_retention_target = 1440;
            • alter database flashback on;
        • Once you have cancelled the standby recovery, create a guaranteed restore point
            • create guaranteed restore point before_activate;
    • 33. Open for read write (Using Your Standby)
        • Activate your Standby
            • SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;
        • You can open the Standby for business
            • SQL> ALTER DATABASE OPEN;
        • To become a Standby again, shut down and start up in mount
            • SQL> FLASHBACK DATABASE TO RESTORE POINT BEFORE_ACTIVATE;
            • SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
    • 34. Open for read write (Using Your Standby)
        • However things never go according to plan
            • ORA-00600: internal error code, arguments: [3705], [1], [8], [3], [8], [], []
        • This was bug 4479323 which is a bug with recovery (not standby specific) and only occurs in a RAC environment
        • This is fixed in 10.2.0.3
    • 35. It’s good to test (Doing a Switchover)
        • A business continuity plan is no good unless it’s been tested
        • It’s not all about the database
        • Good to think in terms of services
    • 36. Database Switchover (Doing a Switchover)
        • Make sure your standby is up-to-date
        • Check your primary database switchover status:
            • primary> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
        • Switchover primary database
            • primary> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY with session shutdown;
        • Switchover the standby
            • standby> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY with session shutdown;
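The status check in the second step is what decides whether the switchover statement on the primary is safe to issue. A small Python sketch of that decision follows. It assumes the primary reports either `TO STANDBY` or `SESSIONS ACTIVE` when a switchover is possible; those two values are my assumption about what the view returns and should be checked against the documentation for your release. The function name `primary_switchover_sql` is hypothetical.

```python
def primary_switchover_sql(status):
    """Pick the switchover statement for the primary from its SWITCHOVER_STATUS.

    Assumed values: 'TO STANDBY' (ready) and 'SESSIONS ACTIVE' (ready once
    user sessions are closed). Anything else is treated as "do not proceed".
    """
    base = "ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY"
    normalised = status.strip().upper()
    if normalised == "TO STANDBY":
        return base + ";"
    if normalised == "SESSIONS ACTIVE":
        # Sessions are still connected, so ask Oracle to close them first.
        return base + " WITH SESSION SHUTDOWN;"
    raise ValueError(f"switchover not possible from status {status!r}")


print(primary_switchover_sql("SESSIONS ACTIVE"))
```

Always adding `with session shutdown`, as the slide does, covers both cases at once; the sketch separates them only to make the meaning of the status visible.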
    • 37. DNS Primer (Doing a Switchover)
        • DNS allows translation from hostname to IP address
            • example.co.uk IN A 162.0.0.1
        • Our principle is all services are accessed through a CNAME
            • anexample.co.uk 5M IN CNAME example.co.uk
        • Relocation of the service is just a case of changing where the CNAME points
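To make the CNAME principle concrete, here is a toy Python resolver over an in-memory record table. It is not a real DNS client. The first two records are the ones from the slide; the host `dr.example.co.uk` and its address `192.0.2.10` (from the documentation address range) are invented for the example.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record is reached; return its address."""
    for _ in range(max_hops):
        rtype, target = records[name]
        if rtype == "A":
            return target
        name = target  # CNAME: continue with the canonical name
    raise RuntimeError(f"CNAME chain longer than {max_hops} hops")


records = {
    "example.co.uk": ("A", "162.0.0.1"),
    "anexample.co.uk": ("CNAME", "example.co.uk"),
    # Hypothetical host at the business continuity site.
    "dr.example.co.uk": ("A", "192.0.2.10"),
}

before = resolve("anexample.co.uk", records)

# Relocating the service means changing only where the CNAME points.
records["anexample.co.uk"] = ("CNAME", "dr.example.co.uk")
after = resolve("anexample.co.uk", records)

print(before, "->", after)
```

Clients keep using the service name throughout. The short TTL on the CNAME (5M on the slide) limits how long cached answers keep pointing at the old site.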
    • 38. Conclusion
        • Dataguard is an efficient DR solution for your primary database
        • Dataguard is mostly reliable but is not without its blips
        • There are opportunities for gaining added value from your standby
        • You can’t test your business continuity plan enough
    • 39. Questions? (Adventures in Dataguard)
        • Contact:
        • [email_address]
        • http://blog.nominet.org.uk
