Szymon Skorupinski




                     Oracle OpenWorld, San Francisco, 3rd October 2012
Outline
•   Who am I
•   CERN and Oracle
•   Backup system
•   Recovery system
•   Backups from standby databases
•   Our strategy and recommendations
•   Conclusions
Who am I
•   Work with Oracle technologies for 7+ years
    •   DBA in CERN’s IT Department Database Group since April 2011
•   Honoured to be responsible for backup and recovery of CERN’s Oracle
    databases
•   The first international conference appearance...
    •   ...now, so please be tolerant ;)
•   Certificates
    •   Oracle Certified Professional – 10g
    •   IBM Certified System Administrator – AIX 6.1
•   Contact
    •   sskorupi@cern.ch


CERN
•   European Organization for Nuclear Research
    •   World’s largest centre for scientific research, founded in 1954
    •   Research: Seeking and finding answers to questions about the Universe
    •   Technology, International collaboration, Education
                                 Twenty Member States
                                 Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany,
                                 Greece, Italy, Hungary, Netherlands, Norway, Poland, Portugal, Slovakia, Spain,
                                 Sweden, Switzerland, United Kingdom

                                   Seven Observer States
                                   European Commission, USA, Russian Federation, India, Japan, Turkey, UNESCO
                                   Associate Member States
                                   Israel, Serbia
                                   Candidate State
                                   Romania
                                          People
                                          ~2400 Staff, ~900 Students, post-docs and undergraduates, ~9000 Users,
                                          ~2000 Contractors

LHC
•   The largest particle accelerator & detectors
                                            17 miles (27km) long underground
                                            tunnel
                                            Thousands of superconducting
                                            magnets
                                            Coldest place in the Universe
                                             -271.3 °C (1.9 K)
                                            but also...
                                            ...hottest place in the Universe :)
                                            100k times hotter than the heart of
                                            the Sun
                                            600 million collisions per second
                                            detected by about 150 million
                                            sensors


WLCG
•   World’s largest computing grid
                                     More than 20 Petabytes
                                     of data stored and analysed
                                     every year

                                     Over 68 000 physical CPUs
                                     Over 305 000 logical CPUs

                                     157 computer centres in 36
                                     countries

                                     More than 8000 physicists with
                                     real-time access to LHC data

CERN’s databases
•   CERN IT-DB Group manages Oracle databases used by LHC and
    its experiments, as well as financial and administrative ones
    •   Also MySQL On Demand service now in production
•   >100 Oracle databases, most of them RAC
    •   Mostly NAS storage plus some SAN with ASM
•   >70 Oracle databases backed up to tapes
    •   On average ~5.1 TB of redo daily, ~302 TB of datafiles in total
•   The biggest Oracle databases at CERN
    •   LHC logging database ~145 TB, expected growth up to 70 TB / year
    •   13 production experiments’ databases ~122 TB in total

Backups source
•   10 (~95 TB) out of the biggest production experiments’
    databases are already backed up to tapes from standby
    with Active Data Guard option
    •   Still ongoing process of changing backups source for the rest

        [Chart: backup source, standby vs. primary; horizontal axis from 0 to 250]
Our backup system
•   Centrally managed and clustered
    •   Scheduling on all nodes, but real execution only on active one




Our backup system
•   Highly configurable
    •       Target database, e.g.
        •    Backup retention
        •    Channel definitions
        •    Archivelog deletion policy
    •       Central LDAP repository, e.g.
        •    TSM information – server and TDPO node
        •    Hostname where RMAN is executed
        •    Recovery catalog connection details
        •    Delay for recovery of copy or archivelog deletions

Our backup system
•   Simply extendable
    •       Plugins
        •    NAS snapshots
        •    Read-only tablespaces backup
        •    Full or selected schemas export
    •       Templates with RMAN commands definition
        •    Dynamically modified by backup system
$ cat backup_level_arch0.tpl
backup tag 'BR_TAG' archivelog all
format '%d_%T_%U_arch' delete all input;
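
The slide says the templates are dynamically modified by the backup system but does not show how. A minimal sketch of one possible substitution step, assuming `BR_TAG` is a literal placeholder that gets replaced with a per-run tag; the function name `render_template` and the tag value `ARCH_20120920` are hypothetical and not taken from the CERN tooling:

```python
def render_template(template: str, tag: str) -> str:
    """Replace the literal BR_TAG placeholder with the tag for this run."""
    if "BR_TAG" not in template:
        raise ValueError("template has no BR_TAG placeholder")
    return template.replace("BR_TAG", tag)


# Template text copied from the slide (backup_level_arch0.tpl).
TEMPLATE = (
    "backup tag 'BR_TAG' archivelog all\n"
    "format '%d_%T_%U_arch' delete all input;\n"
)

# Hypothetical tag value, chosen only for illustration.
rendered = render_template(TEMPLATE, "ARCH_20120920")
print(rendered)
```

Failing loudly when the placeholder is missing keeps a mistyped template from silently running with an untagged backup.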

Our backup system
•   Easily manageable
    •       Suspend backups
        •    Specified or all databases
        •    All databases using specified TSM server
    •       Stop all ongoing backups or just for specific database
    •       Suspend backup pre-checks
        •    Target database up and running
        •    Recovery catalog up and running
    •       Flag -simulate to safely check effects of your command


Our recovery system
•   Available as open source on Sourceforge
    •       Developed by Ruben Gaspar Aparicio
•   Used for
    •       Validation of backup strategy
    •       Sanity tests of backups sent to tape
•   (Almost) full isolation from production database
    •       Recovery catalog not touched
        •    Restored controlfile is sufficient in almost all cases
    •       User jobs dropped after recovery
    •       Trimmed tnsnames.ora file

Our recovery system
•   Easy to install on any machine
    •       Recoveries scheduled using Cron
    •       Three machines used at CERN
•   Many features
    •       Restore to filesystem or ASM
    •       Configurable cleanup after recovery
    •       Time interval for recovery
        •    Hours since current time or specific date
    •       Partial restore
        •    For PITR – e.g. single schema
        •    For VLDB – e.g. only RW part with some percentage of RO tablespaces

Our recovery system
•   Even more features
    •       Export after recovery
        •    Full or list of schemas
        •    Could be stored on tape or offsite file server
    •       Dry run with script generation only
        •    Target database connection used to gather needed data
    •       On-the-fly generation of TDPO and recovery scripts based on
        •    Central LDAP repository
        •    Database metadata
    •       Copy of successful recovery scripts kept


Backups from standby databases
•   Only backups taken on a physical standby are interchangeable with primary backups
    •       No performance impact on primary
        •    BCT overhead could also be removed
•   Possible even in 10g, but now
    •       One recovery catalog could (must) be used
        •    DB_UNIQUE_NAME different for each database
    •       Fast incremental backups possible with Active Data Guard
    •       Controlfile backups are also interchangeable
        •    Only SPFILE backups are not allowed to be used on another
             database
New terms
•   Association of backups
    •   SITE_KEY set to target DB_UNIQUE_NAME
    •   Unassociated files (null SITE_KEY) are treated like they are
        associated with current target database
•   Changing association

RMAN> change ... reset db_unique_name;
RMAN> change ... reset db_unique_name to ...;
RMAN> catalog ...;


New terms
•   Accessibility of backups
    •   On disk – accessible only for associated database
    •   On tape – accessible for all databases
•   BACKUP, RESTORE, CROSSCHECK work on any
    accessible backups
•   Default behaviour could be changed for specific session

RMAN> set backup files for device type disk to accessible;

    •   Undocumented!
Database registration
•   Only primary must be explicitly registered
RMAN> register database;

    •   Standby will be automatically registered during first connection
    •   It could also be done even before standby creation
RMAN> configure db_unique_name dgtest connect identifier
      'dgtest';
RMAN> configure db_unique_name dgtest_adg connect
      identifier 'dgtest_adg';
RMAN> list db_unique_name of database;

Database unregistration
•   Unregistering all or specific standby database
RMAN> unregister database;
RMAN> unregister db_unique_name ...;

    •   Backups still usable by other databases
    •   According to docs, INCLUDING BACKUPS could be added to
        remove backups metadata, but...

RMAN-06014: command not implemented yet: unregister db &
bck


RMAN configuration
•   Parameters to be set only on primary (inherited by standbys)
    •   Retention policy
    •   Database unique names

RMAN-05021: this configuration cannot be changed for a
BACKUP or STANDBY control file

•   The others could (should) be set differently on each database, e.g.
    •   Archivelog deletion policy
    •   Snapshot controlfile name
    •   Controlfile autobackup format

RMAN configuration
•   The same controlfile autobackup format on primary and standby
    could lead to

RMAN-03009: failure of Control File and SPFILE Autobackup
command on ORA_SBT_TAPE_1 channel at 09/20/2012 11:47:17
ORA-19506: failed to create sequential file,
name="DGTEST_c-3733054949-20120920-ff", parms=""
ORA-27027: sbtremove2 returned error
ORA-19511: Error received from media manager layer, error
text:
ANU2614E Invalid sequence of function calls to Data
Protection for Oracle

RMAN configuration
•   Moreover, the sequence is set to its maximum (-ff) – subsequent backups from
    the same day overwrite the previous one (even if the format is changed)
Handle: DGTEST_ADG_RAC7_c-3733054949-20120920-ff
Completion Time: 20-SEP-2012 11:50:06
Ckp SCN: 6303359618630, Media: 1877

    •   After new backup

Handle: DGTEST_ADG_RAC7_c-3733054949-20120920-ff
Completion Time: 20-SEP-2012 11:50:49
Ckp SCN: 6303359619993, Media: 1877
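
The handles above end in the default `%F` pattern, `c-<DBID>-<YYYYMMDD>-<QQ>`, where `QQ` is a two-digit hexadecimal sequence. A minimal sketch of how a monitoring script might detect that the sequence has reached `ff`, assuming any text before `c-` is a site-specific prefix; the names `AutobackupHandle`, `parse_handle` and `sequence_exhausted` are hypothetical:

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class AutobackupHandle(NamedTuple):
    prefix: str     # whatever precedes the %F part of the format
    dbid: int
    day: date
    sequence: int   # 0..255, decoded from the two hex digits


# <prefix>c-<DBID>-<YYYYMMDD>-<QQ>
_HANDLE_RE = re.compile(
    r"^(?P<prefix>.*?)c-(?P<dbid>\d+)-(?P<day>\d{8})-(?P<seq>[0-9a-fA-F]{2})$"
)


def parse_handle(handle: str) -> AutobackupHandle:
    """Split a controlfile autobackup handle into its components."""
    m = _HANDLE_RE.match(handle)
    if m is None:
        raise ValueError(f"not an autobackup handle: {handle!r}")
    return AutobackupHandle(
        prefix=m.group("prefix"),
        dbid=int(m.group("dbid")),
        day=datetime.strptime(m.group("day"), "%Y%m%d").date(),
        sequence=int(m.group("seq"), 16),
    )


def sequence_exhausted(handle: str) -> bool:
    """True when the hex sequence has reached its maximum (ff)."""
    return parse_handle(handle).sequence == 0xFF


parsed = parse_handle("DGTEST_ADG_RAC7_c-3733054949-20120920-ff")
print(parsed, sequence_exhausted("DGTEST_ADG_RAC7_c-3733054949-20120920-ff"))
```

A check like this only flags the condition; it does not explain why the sequence jumped to its maximum.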


RMAN configuration
•   Possible to set configuration for all databases while
    connected only to one

RMAN> configure ... for db_unique_name ...;


•   But look out, at least with archivelog deletion policy
    •   After setting for standby on primary, visible on standby, but not
        applied
    •   Resetting on standby makes it effective
    •   Workaround – configure all target databases one by one

Resynchronization
•   Needed to ensure metadata consistency between
    recovery catalog and controlfile
    •   Partial or full, triggered automatically by most RMAN
        commands
•   Normal resynchronization
    •   From controlfile to recovery catalog
•   Reverse resynchronization
    •   From recovery catalog to primary or standby controlfile


Resynchronization
•   Resynchronizing the whole environment from one place
RMAN> resync catalog from db_unique_name all;

    •   Even changes done without catalog will be refreshed
    •   The same SYSDBA password in password files required
    •   Target database connection must use TNS
RMAN-03002: failure of resync from db_unique_name command
at 08/30/2012 15:09:59
ORA-17629: Cannot connect to the remote database server


Resynchronization
•   Sensitive to any misconfiguration

RMAN-01005: ORA-20079: full resync from primary database
is not done

•   RMAN debug showed that error appeared while resynchronizing
    tempfiles information
    •       Different number of tempfiles on both databases
        •    Tempfiles not created on standby after adding on primary
             (standby_file_management set to auto) – no redo is generated
        •    Manual creation on standby solved the problem
        •    But cannot be reproduced on test

Our backup strategy
•   Datafile backups on standby
•   Archivelog backups on primary
    •       To avoid delays if communication with standby is lost
    •       To have SPFILE and controlfile backups
        •    Controlfile backups interchangeable, but additional work needed
             after restore (e.g. drop of standby logs)
•   We could easily switch backups back to primary in case
    of problems


Our backup strategy
•   Things to be kept in mind
    •       Checking apply lag on standby where backups are taken
        •    Views: V$ARCHIVED_LOG, V$DATAGUARD_STATS,
             V$RECOVERY_PROGRESS
        •    STANDBY_MAX_DATA_DELAY session parameter
    •       Resolving gaps on standby using incremental backups from
            primary
        •    No backups ready to use
        •    BACKUP ... FROM SCN takes longer without BCT
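
Checking the apply lag before a standby backup can be scripted. A minimal sketch, assuming the `apply lag` value read from V$DATAGUARD_STATS arrives as a string of the form `+DD HH:MI:SS`; the function names and the one-hour threshold are hypothetical, and the database query itself is left out so the example stays self-contained:

```python
import re

# Day-to-second interval as text, e.g. "+00 00:00:05".
_LAG_RE = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_seconds(value: str) -> int:
    """Convert a '+DD HH:MI:SS' lag string into seconds."""
    m = _LAG_RE.match(value.strip())
    if m is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(g) for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def standby_fresh_enough(value: str, max_lag_seconds: int = 3600) -> bool:
    """Decide whether the standby is close enough to primary to back up.

    The default threshold is an illustrative assumption, not a CERN setting.
    """
    return lag_seconds(value) <= max_lag_seconds


print(lag_seconds("+00 00:00:05"), standby_fresh_enough("+00 00:00:05"))
```

When the check fails, the backup job could be skipped or pointed back at the primary, as the strategy slides suggest.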



Archivelog deletion policies
•   Primary
RMAN> configure archivelog deletion policy to applied on
      all standby backed up 1 times to device type sbt;

•   Standby
RMAN> configure archivelog deletion policy to applied on
      all standby;

•   For databases with downstream capture
    •   SHIPPED instead of APPLIED
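
Because the policy differs by role, a script that reconfigures databases one by one needs to pick the right command for each. A minimal sketch that returns the RMAN command text from this slide for a given role, assuming the downstream capture variant uses the `SHIPPED TO ALL STANDBY` form; the function name and the `downstream_capture` flag are hypothetical:

```python
# Command text taken from the slide, keyed by database role.
POLICIES = {
    "primary": (
        "configure archivelog deletion policy to applied on "
        "all standby backed up 1 times to device type sbt;"
    ),
    "standby": "configure archivelog deletion policy to applied on all standby;",
}


def deletion_policy_command(role: str, downstream_capture: bool = False) -> str:
    """Return the RMAN CONFIGURE command for the given database role."""
    try:
        command = POLICIES[role.lower()]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    if downstream_capture:
        # Assumed syntax: SHIPPED TO replaces APPLIED ON.
        command = command.replace("applied on", "shipped to")
    return command


for role in ("primary", "standby"):
    print(role, "->", deletion_policy_command(role))
```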
Automatic recoveries
•   Recovery catalog connection useful
    •       Restored controlfile not aware of all backups
•   Without recovery catalog connection
    •       Catalog backups after controlfile restore
        •    Possible even for backup pieces on tape

RMAN> catalog device type 'sbt_tape'
      backuppiece '<handle>';

    •       Too many changes required and error-prone
    •       Better to test recovery as close as possible to real exercise
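
Cataloguing tape backup pieces by hand after a controlfile restore means one command per handle. A minimal sketch that generates those commands from a list of handles, using the command form shown above; the function name `catalog_commands` is hypothetical, and where the handles come from (for example a media manager listing) is outside the sketch:

```python
from typing import Iterable, List


def catalog_commands(handles: Iterable[str]) -> List[str]:
    """Build one RMAN CATALOG command per backup piece handle."""
    commands = []
    for handle in handles:
        # A quote inside the handle would break the RMAN string literal.
        if "'" in handle:
            raise ValueError(f"handle contains a quote: {handle!r}")
        commands.append(
            f"catalog device type 'sbt_tape' backuppiece '{handle}';"
        )
    return commands


# Handle reused from the autobackup slides, for illustration only.
script = catalog_commands(["DGTEST_ADG_RAC7_c-3733054949-20120920-ff"])
print("\n".join(script))
```

Even with generation automated, this path still changes the restored environment more than a recovery that uses the catalog, which is the slide's reason for preferring the catalog connection.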
Other recommendations
•   Fast incremental backups
    •       BCT not used without database restart after enabling
        •    Even if enabled in mount state
        •    Check USED_CHANGE_TRACKING column in
             V$BACKUP_DATAFILE
    •       Apply fix for bug 12312133 – standby crashes during RMAN
            incremental backup with BCT enabled
        •    ORA-600 [krcccb_busy]/[krccckp_scn]
    •       Reminder
        •    Active Data Guard option needed!

Other recommendations
•   Minor problem during backup from standby
    •       TOTALWORK value doubled in V$SESSION_LONGOPS
•   Good to be aware of
    •       Current log not archived on standby during backup
        •    ARCHIVE_LAG_TARGET could be used to force log switch on
             primary for low redo generation periods
•   Be careful with plugged in tablespaces
    •       Bug 13000553 – null datafile name on standby after adding
            new datafile to plugged in tablespace on primary
        •    Causes resync to crash
Other recommendations
•   Backups on standby in RAC
    •   All instances used for channel allocation in the same state
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1
channel at 06/27/2012 09:00:32
ORA-01138: database must either be open in this instance
or not at all
continuing other job steps, job failed will not be re-run

•   Consistent naming convention is important, e.g.
    •   TNS names used in channels definition on each database

Conclusions
•   Backups from standby work well
    •       If properly configured
•   Impact on whole environment and procedures
    •       Management a little bit more complicated
    •       DB_UNIQUE_NAME change could entail
        •    CRS reconfiguration
        •    Data and other files location change when using OMF
•   Reconfiguration needed after switchover or failover
    •       Archivelog deletion policy for example

Questions?
CERN Oracle DBA Szymon Skorupinski Details Backup and Recovery Strategies

Sanger OpenStack presentation March 2017
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

CERN Oracle DBA Szymon Skorupinski Details Backup and Recovery Strategies

  • 2. Szymon Skorupinski Oracle OpenWorld, San Francisco, 3rd October 2012
  • 3. Outline • Who am I • CERN and Oracle • Backup system • Recovery system • Backups from standby databases • Our strategy and recommendations • Conclusions
  • 5. Who am I
      • Work with Oracle technologies for 7+ years
          • DBA in CERN’s IT Department Database Group since April 2011
      • Honoured to be responsible for backup and recovery of CERN’s Oracle databases
      • The first international conference appearance...
          • ...now, so please be tolerant ;)
      • Certificates
          • Oracle Certified Professional – 10g
          • IBM Certified System Administrator – AIX 6.1
      • Contact
          • sskorupi@cern.ch
  • 7. CERN
      • European Organization for Nuclear Research
          • World’s largest centre for scientific research, founded in 1954
          • Research: Seeking and finding answers to questions about the Universe
          • Technology, International collaboration, Education
      • Twenty Member States: Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany, Greece, Italy, Hungary, Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland, United Kingdom
      • Seven Observer States: European Commission, USA, Russian Federation, India, Japan, Turkey, UNESCO
      • Associate Member States: Israel, Serbia
      • Candidate State: Romania
      • People: ~2400 Staff, ~900 Students, post-docs and undergraduates, ~9000 Users, ~2000 Contractors
  • 8. LHC
      • The largest particle accelerator & detectors
          • 17 miles (27 km) long underground tunnel
          • Thousands of superconducting magnets
          • Coldest place in the Universe: -271.3 °C (1.9 K)
          • but also... ...hottest place in the Universe :) 100k times hotter than the heart of the Sun
          • 600 million collisions per second detected by about 150 million sensors
  • 9. WLCG
      • World’s largest computing grid
          • More than 20 Petabytes of data stored and analysed every year
          • Over 68 000 physical CPUs
          • Over 305 000 logical CPUs
          • 157 computer centres in 36 countries
          • More than 8000 physicists with real-time access to LHC data
  • 10. CERN’s databases
      • CERN IT-DB Group manages Oracle databases used by LHC and its experiments, as well as financial and administrative ones
          • Also MySQL On Demand service now in production
      • >100 Oracle databases, most of them RAC
          • Mostly NAS storage plus some SAN with ASM
      • >70 Oracle databases backed up to tapes
          • On average ~5.1 TB of redo daily, ~302 TB of datafiles in total
      • The biggest Oracle databases at CERN
          • LHC logging database ~145 TB, expected growth up to 70 TB / year
          • 13 production experiments’ databases ~122 TB in total
  • 11. Backups source
      • 10 (~95 TB) out of the biggest production experiments’ databases are already backed up to tapes from standby with Active Data Guard option
      • Changing the backup source for the rest is still in progress
      • [Bar chart: standby vs. primary; axis from 0 to 250, units not given]
  • 13. Our backup system
      • Centrally managed and clustered
          • Scheduling on all nodes, but real execution only on the active one
  • 14. Our backup system
      • Highly configurable
          • Target database, e.g.
              • Backup retention
              • Channel definitions
              • Archivelog deletion policy
          • Central LDAP repository, e.g.
              • TSM information – server and TDPO node
              • Hostname where RMAN is executed
              • Recovery catalog connection details
              • Delay for recovery of copy or archivelog deletions
  • 15. Our backup system
      • Easily extensible
          • Plugins
              • NAS snapshots
              • Read-only tablespaces backup
              • Full or selected schemas export
          • Templates with RMAN commands definition
              • Dynamically modified by backup system
                $ cat backup_level_arch0.tpl
                backup tag 'BR_TAG' archivelog all format '%d_%T_%U_arch' delete all input;
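The slide does not show how the backup system modifies a template, so the following is only a minimal Python sketch of one plausible approach: replacing the BR_TAG placeholder with a generated tag. The helper names and the tag naming scheme (database, level, date) are hypothetical; only the template text comes from the slide.

```python
from datetime import datetime

# Template text as shown on the slide (backup_level_arch0.tpl).
TEMPLATE = "backup tag 'BR_TAG' archivelog all format '%d_%T_%U_arch' delete all input;"


def make_tag(db_name: str, level: str, when: datetime) -> str:
    """Build a tag such as DGTEST_ARCH0_20120920 (hypothetical naming scheme)."""
    return f"{db_name}_{level}_{when:%Y%m%d}".upper()


def render_template(template: str, tag: str) -> str:
    """Replace the BR_TAG placeholder; RMAN's own %d/%T/%U stay untouched."""
    if "BR_TAG" not in template:
        raise ValueError("template has no BR_TAG placeholder")
    if "'" in tag:
        raise ValueError("tag must not contain a single quote")
    return template.replace("BR_TAG", tag)


if __name__ == "__main__":
    print(render_template(TEMPLATE, make_tag("dgtest", "arch0", datetime(2012, 9, 20))))
```

Plain string replacement is used rather than Python's format machinery because the template already contains RMAN substitution variables starting with %.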
  • 16. Our backup system
      • Easily manageable
          • Suspend backups
              • Specified or all databases
              • All databases using specified TSM server
          • Stop all ongoing backups or just for specific database
          • Suspend backup pre-checks
              • Target database up and running
              • Recovery catalog up and running
          • Flag -simulate to safely check effects of your command
  • 18. Our recovery system
      • Available as open source on Sourceforge
          • Developed by Ruben Gaspar Aparicio
      • Used for
          • Validation of backup strategy
          • Sanity tests of backups sent to tape
      • (Almost) full isolation from production database
          • Recovery catalog not touched
              • Restored controlfile is sufficient in almost all cases
          • User jobs dropped after recovery
          • Trimmed tnsnames.ora file
  • 19. Our recovery system
      • Easy to install on any machine
          • Recoveries scheduled using Cron
          • Three machines used at CERN
      • Many features
          • Restore to filesystem or ASM
          • Configurable cleanup after recovery
          • Time interval for recovery
              • Hours since current time or specific date
          • Partial restore
              • For PITR – e.g. single schema
              • For VLDB – e.g. only RW part with some percentage of RO tablespaces
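The "time interval for recovery" option (hours back from now, or a specific date) could be turned into an RMAN SET UNTIL TIME clause roughly as follows. This is a sketch only: the function name and parameters are hypothetical and the recovery system's real interface is not shown on the slide.

```python
from datetime import datetime, timedelta
from typing import Optional


def until_time_clause(now: datetime,
                      hours_back: Optional[float] = None,
                      at: Optional[datetime] = None) -> str:
    """Return a SET UNTIL TIME clause for exactly one of the two options."""
    if (hours_back is None) == (at is None):
        raise ValueError("give exactly one of hours_back or at")
    target = at if at is not None else now - timedelta(hours=hours_back)
    if target > now:
        raise ValueError("recovery target lies in the future")
    stamp = target.strftime("%Y-%m-%d %H:%M:%S")
    # RMAN accepts a quoted SQL date expression after UNTIL TIME.
    return 'set until time "to_date(' + f"'{stamp}', 'YYYY-MM-DD HH24:MI:SS'" + ')";'


if __name__ == "__main__":
    print(until_time_clause(datetime.now(), hours_back=3))
```

The clause would normally sit inside a run block ahead of the restore and recover commands.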
  • 20. Our recovery system
      • Even more features
          • Export after recovery
              • Full or list of schemas
              • Could be stored on tape or offsite file server
          • Dry run with script generation only
              • Target database connection used to gather needed data
          • On-the-fly generation of TDPO and recovery scripts based on
              • Central LDAP repository
              • Database metadata
          • Copy of successful recovery scripts kept
  • 22. Backups from standby databases
      • Only backups taken on a physical standby are interchangeable with those of the primary
      • No performance impact on primary
          • BCT overhead could also be removed
      • Possible even in 10g, but now
          • One recovery catalog could (must) be used
              • DB_UNIQUE_NAME different for each database
          • Fast incremental backups possible with Active Data Guard
          • Controlfile backups are also interchangeable
      • Only SPFILE backups are not allowed to be used on another database
  • 23. New terms
      • Association of backups
          • SITE_KEY set to target DB_UNIQUE_NAME
          • Unassociated files (null SITE_KEY) are treated as if they were associated with the current target database
      • Changing association
            RMAN> change ... reset db_unique_name;
            RMAN> change ... reset db_unique_name to ...;
            RMAN> catalog ...;
  • 24. New terms
      • Accessibility of backups
          • On disk – accessible only for associated database
          • On tape – accessible for all databases
      • BACKUP, RESTORE, CROSSCHECK work on any accessible backups
      • Default behaviour could be changed for specific session
            RMAN> set backup files for device type disk to accessible;
          • Undocumented!
  • 25. Database registration
      • Only primary must be explicitly registered
            RMAN> register database;
      • Standby will be automatically registered during first connection
          • It could also be done even before standby creation
            RMAN> configure db_unique_name dgtest connect identifier 'dgtest';
            RMAN> configure db_unique_name dgtest_adg connect identifier 'dgtest_adg';
            RMAN> list db_unique_name of database;
  • 26. Database unregistration
      • Unregistering all or specific standby database
            RMAN> unregister database;
            RMAN> unregister db_unique_name ...;
          • Backups still usable by other databases
          • According to docs, INCLUDING BACKUPS could be added to remove backups metadata, but...
            RMAN-06014: command not implemented yet: unregister db & bck
  • 27. RMAN configuration
      • Parameters to be set only on primary (inherited by standbys)
          • Retention policy
          • Database unique names
            RMAN-05021: this configuration cannot be changed for a BACKUP or STANDBY control file
      • The others could (should) be set differently on each database, e.g.
          • Archivelog deletion policy
          • Snapshot controlfile name
          • Controlfile autobackup format
  • 28. RMAN configuration
      • The same controlfile autobackup format on primary and standby could lead to
            RMAN-03009: failure of Control File and SPFILE Autobackup command on ORA_SBT_TAPE_1 channel at 09/20/2012 11:47:17
            ORA-19506: failed to create sequential file, name="DGTEST_c-3733054949-20120920-ff", parms=""
            ORA-27027: sbtremove2 returned error
            ORA-19511: Error received from media manager layer, error text:
               ANU2614E Invalid sequence of function calls to Data Protection for Oracle
  • 29. RMAN configuration
      • Moreover, sequence is set to max (-ff) – subsequent backups from the same day overwrite the previous one (even if format is changed)
            Handle: DGTEST_ADG_RAC7_c-3733054949-20120920-ff
            Completion Time: 20-SEP-2012 11:50:06
            Ckp SCN: 6303359618630, Media: 1877
      • After new backup
            Handle: DGTEST_ADG_RAC7_c-3733054949-20120920-ff
            Completion Time: 20-SEP-2012 11:50:49
            Ckp SCN: 6303359619993, Media: 1877
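To spot this condition in listings, the trailing part of an autobackup handle can be decoded. The Python sketch below assumes the handle ends with RMAN's %F expansion, c-DBID-YYYYMMDD-QQ, where QQ is a two-digit hexadecimal sequence; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class Autobackup(NamedTuple):
    dbid: int
    day: str        # YYYYMMDD as it appears in the handle
    sequence: int   # decoded from the two hex digits, 0..255


# Matches the %F part at the end of a handle, whatever prefix precedes it.
_PATTERN = re.compile(r"c-(\d+)-(\d{8})-([0-9a-fA-F]{2})$")


def parse_handle(handle: str) -> Autobackup:
    """Decode DBID, date and sequence from a controlfile autobackup handle."""
    match = _PATTERN.search(handle)
    if match is None:
        raise ValueError(f"not an autobackup handle: {handle!r}")
    return Autobackup(int(match.group(1)), match.group(2), int(match.group(3), 16))


def is_exhausted(handle: str) -> bool:
    """True when the sequence is ff, i.e. later backups that day reuse the name."""
    return parse_handle(handle).sequence == 0xFF


if __name__ == "__main__":
    print(parse_handle("DGTEST_ADG_RAC7_c-3733054949-20120920-ff"))
```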
  • 30. RMAN configuration
      • Possible to set configuration for all databases while connected only to one
            RMAN> configure ... for db_unique_name ...;
      • But look out, at least with archivelog deletion policy
          • After setting it for the standby while connected to the primary, it is visible on the standby but not applied
          • Resetting on standby makes it effective
      • Workaround – configure all target databases one by one
  • 31. Resynchronization
      • Needed to ensure metadata consistency between recovery catalog and controlfile
          • Partial or full, triggered automatically by most RMAN commands
      • Normal resynchronization
          • From controlfile to recovery catalog
      • Reverse resynchronization
          • From recovery catalog to primary or standby controlfile
  • 32. Resynchronization
      • Resynchronizing the whole environment from one place
            RMAN> resync catalog from db_unique_name all;
          • Even changes done without catalog will be refreshed
          • The same SYSDBA password in password files required
          • Target database connection must use TNS
            RMAN-03002: failure of resync from db_unique_name command at 08/30/2012 15:09:59
            ORA-17629: Cannot connect to the remote database server
  • 33. Resynchronization
      • Sensitive to any misconfiguration
            RMAN-01005: ORA-20079: full resync from primary database is not done
          • RMAN debug showed that the error appeared while resynchronizing tempfiles information
          • Different number of tempfiles on both databases
              • Tempfiles not created on standby after adding on primary (standby_file_management set to auto) – no redo is generated
              • Manual creation on standby solved the problem
              • But cannot be reproduced on test
  • 35. Our backup strategy
      • Datafile backups on standby
      • Archivelog backups on primary
          • To avoid delays if communication with standby is lost
          • To have SPFILE and controlfile backups
              • Controlfile backups interchangeable, but additional work needed after restore (e.g. drop of standby logs)
      • We could easily switch backups back to primary in case of problems
  • 36. Our backup strategy
      • Things to be kept in mind
          • Checking apply lag on standby where backups are taken
              • Views: V$ARCHIVED_LOG, V$DATAGUARD_STATS, V$RECOVERY_PROGRESS
              • STANDBY_MAX_DATA_DELAY session parameter
          • Resolving gaps on standby using incremental backups from primary
              • No backups ready to use
              • BACKUP ... FROM SCN takes longer without BCT
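One way to act on the apply lag check is to read the 'apply lag' row of V$DATAGUARD_STATS on the standby and compare it with a threshold before starting a backup. The Python sketch below only parses the value, which that view reports as a day-to-second interval string such as +00 00:01:30; the function names and the threshold are hypothetical, and fetching the row is left out.

```python
import re
from typing import Optional

# The value would come from a query such as:
#   select value from v$dataguard_stats where name = 'apply lag';
_LAG = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_seconds(value: str) -> int:
    """Convert an interval string like '+00 00:01:30' to seconds."""
    match = _LAG.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected apply lag format: {value!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def backup_allowed(value: Optional[str], max_lag_seconds: int) -> bool:
    """Refuse the backup when the lag is unknown (NULL) or above the threshold."""
    if value is None:
        return False
    return lag_seconds(value) <= max_lag_seconds


if __name__ == "__main__":
    print(backup_allowed("+00 00:01:30", max_lag_seconds=600))
```

Treating an unknown lag as a reason to skip the backup is a conservative choice made for this sketch, not something stated in the talk.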
  • 37. Archivelog deletion policies
      • Primary
            RMAN> configure archivelog deletion policy to applied on all standby backed up 1 times to device type sbt;
      • Standby
            RMAN> configure archivelog deletion policy to applied on all standby;
      • For databases with downstream capture
          • SHIPPED instead of APPLIED
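Because these policies have to be set again after a switchover or failover, it can help to derive the command from the database role. The Python sketch below reproduces the two commands from the slide; the function name is hypothetical, and the SHIPPED variant assumes the RMAN wording "shipped to all standby" in place of "applied on all standby".

```python
def deletion_policy(role: str, downstream_capture: bool = False) -> str:
    """Return the CONFIGURE command for a 'primary' or 'standby' database."""
    if role not in ("primary", "standby"):
        raise ValueError(f"unknown role: {role!r}")
    # Assumption: with downstream capture, SHIPPED TO replaces APPLIED ON.
    condition = "shipped to" if downstream_capture else "applied on"
    command = f"configure archivelog deletion policy to {condition} all standby"
    if role == "primary":
        # Archivelogs are backed up on the primary, so require one tape copy.
        command += " backed up 1 times to device type sbt"
    return command + ";"


if __name__ == "__main__":
    for role in ("primary", "standby"):
        print(deletion_policy(role))
```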
  • 38. Automatic recoveries
      • Recovery catalog connection useful
          • Restored controlfile not aware of all backups
      • Without recovery catalog connection
          • Catalog backups after controlfile restore
          • Possible even for backup pieces on tape
            RMAN> catalog device type 'sbt_tape' backuppiece '<handle>';
          • Too many changes required and error-prone
      • Better to test recovery as close as possible to real exercise
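For the route without a recovery catalog, the CATALOG commands have to be produced for every backup piece the restored controlfile does not know about. A minimal Python sketch, assuming the list of tape handles is already available from some other source (for example the media manager); the function name is hypothetical.

```python
from typing import Iterable, List


def catalog_commands(handles: Iterable[str]) -> List[str]:
    """Build one CATALOG ... BACKUPPIECE command per tape handle."""
    commands = []
    for handle in handles:
        if not handle or "'" in handle:
            raise ValueError(f"unusable handle: {handle!r}")
        commands.append(f"catalog device type 'sbt_tape' backuppiece '{handle}';")
    return commands


if __name__ == "__main__":
    for line in catalog_commands(["DGTEST_ADG_RAC7_c-3733054949-20120920-ff"]):
        print(line)
```

Having to maintain such a list by hand is part of why the slide calls this approach error-prone.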
  • 39. Other recommendations
      • Fast incremental backups
          • BCT not used without database restart after enabling
              • Even if enabled in mount state
              • Check USED_CHANGE_TRACKING column in V$BACKUP_DATAFILE
          • Apply fix for bug 12312133 – standby crashes during RMAN incremental backup with BCT enabled
              • ORA-600 [krcccb_busy]/[krccckp_scn]
      • Reminder
          • Active Data Guard option needed!
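The V$BACKUP_DATAFILE check can be automated once the rows are fetched. The Python sketch below works on rows already read from the view (columns FILE#, INCREMENTAL_LEVEL, USED_CHANGE_TRACKING, BLOCKS_READ, DATAFILE_BLOCKS) and reports incremental backups that did not use block change tracking. The row type and function names are hypothetical, and only levels above 0 are checked, on the assumption that a level 0 backup reads every block anyway.

```python
from typing import Iterable, List, NamedTuple, Optional


class BackupDatafile(NamedTuple):
    file_no: int                       # FILE#
    incremental_level: Optional[int]   # NULL for non-incremental backups
    used_change_tracking: str          # 'YES' or 'NO'
    blocks_read: int
    datafile_blocks: int


def files_without_bct(rows: Iterable[BackupDatafile]) -> List[int]:
    """Datafile numbers whose incremental (level > 0) backup skipped BCT."""
    return [
        row.file_no
        for row in rows
        if row.incremental_level is not None
        and row.incremental_level > 0
        and row.used_change_tracking != "YES"
    ]


def percent_read(row: BackupDatafile) -> float:
    """Share of the datafile that was read; close to 100 suggests a full scan."""
    return 100.0 * row.blocks_read / row.datafile_blocks


if __name__ == "__main__":
    sample = [BackupDatafile(1, 1, "YES", 500, 100000),
              BackupDatafile(2, 1, "NO", 100000, 100000)]
    print(files_without_bct(sample))
```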
  • 40. Other recommendations
      • Minor problem during backup from standby
          • TOTALWORK value doubled in V$SESSION_LONGOPS
      • Good to be aware of
          • Current log not archived on standby during backup
          • ARCHIVE_LAG_TARGET could be used to force log switch on primary for low redo generation periods
      • Be careful with plugged in tablespaces
          • Bug 13000553 – null datafile name on standby after adding new datafile to plugged in tablespace on primary
          • Causes resync to crash
  • 41. Other recommendations
      • Backups on standby in RAC
          • All instances used for channel allocation must be in the same state
            RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 06/27/2012 09:00:32
            ORA-01138: database must either be open in this instance or not at all
            continuing other job steps, job failed will not be re-run
      • Consistent naming convention is important, e.g.
          • TNS names used in channels definition on each database
  • 43. Conclusions
      • Backups from standby work well
          • If properly configured
      • Impact on whole environment and procedures
          • Management a little bit more complicated
          • DB_UNIQUE_NAME change could entail
              • CRS reconfiguration
              • Data and other files location change when using OMF
          • Reconfiguration needed after switchover or failover
              • Archivelog deletion policy for example