Users generating redo, typically from INSERT, UPDATE and DELETE statements, have to write that redo into the redo log buffer. If the buffer fills up before LGWR can write it out, users have to wait.
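A toy Python model of this wait. It assumes a single buffer measured in bytes and ignores the circular layout and the redo latches. A user whose redo does not fit has to wait until LGWR frees space. The class and method names are hypothetical.

```python
class LogBuffer:
    """Toy redo log buffer: users copy redo in, LGWR writes it out (sizes in bytes)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.used = 0   # bytes of redo not yet written by LGWR
        self.waits = 0  # times a user found no room ("log buffer space")

    def write_redo(self, nbytes: int) -> bool:
        """Copy redo into the buffer; return False and count a wait if it doesn't fit."""
        if self.used + nbytes > self.size:
            self.waits += 1
            return False
        self.used += nbytes
        return True

    def lgwr_write(self, nbytes: int) -> None:
        """LGWR writes up to `nbytes` of buffered redo to disk, freeing that space."""
        self.used -= min(nbytes, self.used)


buf = LogBuffer(size=1000)
print(buf.write_redo(600))  # True
print(buf.write_redo(600))  # False: the user must wait for LGWR
buf.lgwr_write(600)
print(buf.write_redo(600))  # True once LGWR has freed space
print(buf.waits)            # 1
```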
Redo log 1 fills up, so LGWR switches to redo log 2. The switch requires LGWR to:
1. Get the next log file from the control file
2. Get the redo copy and redo allocation latches
3. Flush the redo
4. Close the file
5. Update the control file: set the new file to CURRENT and the old file to ACTIVE
6. If in ARCHIVELOG mode, add the old file to the archive list
7. Open all members of the new log file group
8. Write the SCN to the headers
9. Enable redo log generation
At the same time, DBWR makes a list of all blocks in the buffer cache that are dirty and have redo in log 1. This list of blocks has to be written out to disk before LGWR can reuse log 1.
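The status changes in the switch can be sketched as a round-robin state change on the log groups. This is a hypothetical Python model, not Oracle code: it keeps only the CURRENT/ACTIVE transition and the archive list, skips the latches, flush and header writes, and does not check whether the next group is actually free to reuse.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogGroup:
    number: int
    status: str = "INACTIVE"  # simplified V$LOG-style status


@dataclass
class RedoLogs:
    groups: list[LogGroup]
    archivelog: bool = True
    current: int = 0  # index of the CURRENT group in `groups`
    archive_list: list[int] = field(default_factory=list)

    def switch(self) -> int:
        """Switch to the next group (round-robin) and return its group number."""
        old = self.groups[self.current]
        next_index = (self.current + 1) % len(self.groups)
        new = self.groups[next_index]
        # Omitted: redo copy/allocation latches, flushing redo, closing the file,
        # opening all members of the new group, writing the SCN to the headers.
        new.status = "CURRENT"
        old.status = "ACTIVE"  # still needed until DBWR writes its dirty blocks
        if self.archivelog:
            self.archive_list.append(old.number)  # old group queued for archiving
        self.current = next_index
        return new.number


logs = RedoLogs([LogGroup(1, "CURRENT"), LogGroup(2), LogGroup(3)])
print(logs.switch())                    # 2
print([g.status for g in logs.groups])  # ['ACTIVE', 'CURRENT', 'INACTIVE']
print(logs.archive_list)                # [1]
```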
In this case none of the DBWR checkpoints 1, 2 or 3 finish before LGWR fills up all 3 redo logs, so users must wait until DBWR finishes the checkpoints by writing all the dirty blocks out. In older versions of Oracle the checkpoints were merged together, so all 3 checkpoints had to finish before redo log 1 could be reused. In later versions (I think starting in 9i) the checkpoints were kept separate, so once checkpoint 1 had finished, log 1 could be reused.
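The difference between merged and separate checkpoints comes down to which unfinished checkpoints block reuse of log 1. A minimal sketch, assuming each checkpoint is numbered after the log whose dirty blocks it covers (checkpoint 1 covers log 1); the function name is hypothetical.

```python
from __future__ import annotations


def can_reuse_log(log_no: int, pending: set[int], merged: bool) -> bool:
    """Return True if LGWR may overwrite redo log `log_no`.

    `pending` holds the numbers of checkpoints DBWR has not finished yet,
    where checkpoint N covers the dirty blocks with redo in log N.
    """
    if merged:
        # Older behavior: outstanding checkpoints were merged, so all must finish.
        return not pending
    # Later behavior: only the checkpoint protecting this log matters.
    return log_no not in pending


# Checkpoint 1 has finished; checkpoints 2 and 3 are still running.
print(can_reuse_log(1, {2, 3}, merged=True))   # False
print(can_reuse_log(1, {2, 3}, merged=False))  # True
```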
In this case the archiver, for some reason, hasn't been able to archive log 1, and now LGWR needs to reuse it. All transactional activity in the database comes to a halt; to any user with transactions, the database has effectively hung. This is almost always caused by the archive destination filling up. Make room on the destination disk. You can manually stop and start the archiver to make sure it restarts after room is made:
archive log stop;
-- make room in log_archive_dest
archive log start;
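The two reasons LGWR may be unable to reuse a log group can be put side by side. This is a hypothetical sketch: the function name is made up, and checking the checkpoint before the archive status is an assumption of the model, not documented Oracle behavior.

```python
from typing import Optional


def switch_wait(checkpoint_done: bool, archived: bool, archivelog_mode: bool) -> Optional[str]:
    """Return the wait LGWR would hit trying to reuse a log group, or None if it can reuse it.

    The order of the two checks is an assumption of this sketch.
    """
    if not checkpoint_done:
        return "log file switch (checkpoint incomplete)"
    if archivelog_mode and not archived:
        # The archiver has not copied the log yet, e.g. the archive destination is full.
        return "log file switch (archiving needed)"
    return None


print(switch_wait(checkpoint_done=True, archived=False, archivelog_mode=True))
# log file switch (archiving needed)
```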
select ESTIMATED_MTTR from v$instance_recovery;
From Chris Foot, 10g Automatic Checkpoint Tuning: If you do not set FAST_START_MTTR_TARGET, or set it to a very large value, Oracle10g will provide automatic checkpoint tuning. The database will write out dirty blocks from the cache as fast as possible without negatively impacting database performance. The DBA is no longer required to set any of the aforementioned checkpoint parameters.
SELECT TARGET_MTTR, ESTIMATED_MTTR, CKPT_BLOCK_WRITES FROM V$INSTANCE_RECOVERY;
CKPT_BLOCK_WRITES represents the overhead from fast_start_mttr_target.
From Metalink: The message means that we haven't completed writing all the redo information to the log when we are trying to switch. It is similar in nature to a "checkpoint not complete" except that it only involves the redo being written to the log. The log switch cannot occur until all of the redo has been written.
A "strand" is new terminology in 10g and deals with latches for redo. Strands are a mechanism that allows multiple allocation latches, so processes can write redo into the redo buffer more efficiently; it is related to the log_parallelism parameter present in 9i. The concept of a strand is to ensure that the redo generation rate for an instance is optimal and that, when there is some kind of redo contention, the number of strands is dynamically adjusted to compensate. The initial number of strands depends on the number of CPUs; it starts with 2 strands, with one strand for active redo generation. For large-scale enterprise systems the amount of redo generation is large, and hence these strands are *made active* as and when the foregrounds encounter redo contention (allocation latch related contention); this is when the concept of dynamic strands comes into play. There are always shared strands and a number of private strands.
Oracle 10g has some major changes in the mechanisms for redo (and undo), which seem to be aimed at reducing contention. Instead of redo being recorded in real time, it can be recorded 'privately' and pumped into the redo log buffer on commit. Similarly, undo can be generated as 'in memory undo' and applied in bulk. This affects the memory used for redo management and the possibility to flush it in pieces.
The message you get is related to internal Cache Redo File management. You can disregard these messages as normal messages. When you switch logs, all private strands have to be flushed to the current log before the switch is allowed to proceed.
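The last point, that every private strand must be flushed into the current log before the switch proceeds, can be sketched as follows. This is a toy model, not Oracle internals: redo records are plain strings, strands are Python lists, and the function name is hypothetical.

```python
def switch_log(current_log: list, private_strands: list) -> list:
    """Flush every private strand into the current log, then start a new, empty log.

    In this model the switch proceeds only after the loop has emptied every strand;
    while any strand still held redo, a real switch would wait on
    "log file switch (private strand flush incomplete)".
    """
    for strand in private_strands:
        current_log.extend(strand)  # flush this strand's redo into the current log
        strand.clear()
    return []  # the next log group, now CURRENT


log = ["r1"]
strands = [["a1", "a2"], [], ["b1"]]
new_log = switch_log(log, strands)
print(log)      # ['r1', 'a1', 'a2', 'b1']
print(strands)  # [[], [], []]
```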
Log File Operations
Redo is written to disk when:
- a user commits
- the log buffer is 1/3 full (_log_io_size)
- the log buffer fills 1M
- every 3 seconds
- DBWR asks LGWR to flush redo
Sessions committing wait for LGWR.
Copyright 2006 Kyle Hailey
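The write triggers listed above can be read as a single "should LGWR write now?" test. A hypothetical sketch: it applies the 1/3 rule literally (the slide notes the real threshold is governed by _log_io_size) and assumes "1M" means 1 MiB.

```python
ONE_MB = 1024 * 1024  # assumption: "1M" read as 1 MiB


def lgwr_should_write(committed: bool, buffered: int, log_buffer_size: int,
                      secs_since_last_write: float, dbwr_asked: bool) -> bool:
    """True if any of the listed triggers says LGWR should write the log buffer to disk."""
    return (
        committed                              # a user commits
        or buffered >= log_buffer_size / 3     # log buffer 1/3 full
        or buffered >= ONE_MB                  # 1M of redo buffered
        or secs_since_last_write >= 3          # 3-second timeout
        or dbwr_asked                          # DBWR needs the redo on disk first
    )


print(lgwr_should_write(False, 100, 1200, 1.0, False))  # False
print(lgwr_should_write(False, 400, 1200, 1.0, False))  # True: buffer is 1/3 full
```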
log buffer space
Wait for space in the redo log buffer in the SGA.
Solutions:
1. Increase the log_buffer parameter in init.ora. Above 3M, a larger log_buffer has little effect; if the wait is still a problem, the bottleneck is at the disk level.
2. Improve disk IO for redo: faster disk, raw files, direct IO, dedicated disk.
P1, P2, P3: no values.
Log Buffer Space
[Figure: User1, User2 and User3 run INSERT, UPDATE and DELETE statements that copy redo into the log buffer in the SGA (alongside the library cache and buffer cache); LGWR writes the log buffer to the redo log files. Causes of the wait: 1. log buffer too small; 2. LGWR too slow (slow disk).]
log file sync
Wait for redo flush upon:
- Commit
- Rollback
Arguments:
P1 = buffer# in the log buffer that needs to be flushed
P2 = not used
P3 = not used
select parameter1, parameter2, parameter3 from v$event_name where name = 'log file sync';
PARAMETER1   PARAMETER2   PARAMETER3
buffer#
Log File Sync: Solutions
- Commit less
  - Often possible in loops that commit on every iteration: commit every 50 or 100 rows instead
  - Possibly, in 10gR2: ALTER SYSTEM SET COMMIT_WRITE = BATCH, NOWAIT (a commit could be lost on a machine crash or IO error)
- Improve IO
  - Use raw devices or direct IO
  - Consider RAM disks
  - Stripe if redo writes are comparable to the stripe size; striping shouldn't hurt and can help (ex: imp can have large redo writes; striping can improve it by 10-30%)
  - Alternate disks for redo and archiving of redo (_high_priority_processes)
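Why batching commits helps: each commit is one wait on LGWR, so the number of log file sync waits scales with the number of commits. A hypothetical Python sketch that only counts commits; a real load would be a PL/SQL or client loop issuing INSERT and COMMIT.

```python
def count_commits(rows: int, commit_every: int) -> int:
    """Count commits (each one a 'log file sync' wait) for a load loop."""
    commits = 0
    for i in range(1, rows + 1):
        # ... insert row i here ...
        if i % commit_every == 0:
            commits += 1
    if rows % commit_every:
        commits += 1  # final commit for the leftover rows
    return commits


print(count_commits(10_000, 1))    # 10000
print(count_commits(10_000, 100))  # 100
```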
Log File Sync depends on log file parallel write, the time it takes for LGWR to write out changes.
- If log file sync ≈ log file parallel write and the time is slow (> 3ms), look into IO issues.
- If log file sync >> log file parallel write, look at CPU starvation issues.
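The rule of thumb above, applied to average wait times in milliseconds. A sketch only: the factor of 2 used to decide what counts as ">>" is an assumed threshold, and the function name is hypothetical.

```python
def diagnose_log_file_sync(sync_ms: float, parallel_write_ms: float,
                           slow_ms: float = 3.0, much_greater: float = 2.0) -> str:
    """Classify average 'log file sync' vs 'log file parallel write' times (ms).

    `much_greater` (how big a ratio counts as ">>") is an assumed threshold.
    """
    if sync_ms > much_greater * parallel_write_ms:
        return "look at CPU starvation"  # sessions wait far longer than LGWR's write
    if parallel_write_ms > slow_ms:
        return "look into IO"            # sync is close to the write time, and writes are slow
    return "no redo IO problem indicated"


print(diagnose_log_file_sync(8.0, 7.5))   # look into IO
print(diagnose_log_file_sync(20.0, 1.0))  # look at CPU starvation
```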
switch logfile command
Same as log file switch completion, but the switch is executed by the DBA:
ALTER SYSTEM SWITCH LOGFILE;
Concerns: Recovery Time
What happens to recovery time if I change my redo log file sizes? A larger redo log size can increase recovery time, but there are init.ora parameters to limit this.
Incremental Checkpoints (9iR2+)
FAST_START_MTTR_TARGET
- Seconds to recovery
- Easy and accurate
- Is overridden by FAST_START_IO_TARGET
- Is overridden by LOG_CHECKPOINT_INTERVAL
alter system set fast_start_mttr_target=17 scope=both;
SQL> select ESTIMATED_MTTR from V$INSTANCE_RECOVERY;
ESTIMATED_MTTR
--------------
            21
Recovery and Checkpoints
[Figure: in the SGA, DBWR writes the buffer cache to the data files and LGWR writes the log buffer to redo log files 1, 2 and 3. The redo between the incremental checkpoint and LGWR's current position is needed for recovery.]
DBWR dirty List and LGWR
DBWR usually just writes out dirty blocks at the end of the LRU until a checkpoint. Now, DBWR keeps a checkpoint list that it writes out.
[Figure: buffers in the buffer cache, LGWR's current position and the incremental checkpoint in the redo logs, and DBWR's checkpoint list (checkpoint a) of blocks to write.]
DBWR dirty List
DBWR also has to track dirty blocks at the cold end of the LRU.
[Figure: buffer headers ordered from MRU (hot) to LRU (cold); DBWR writes the dirty list of blocks taken from the cold end.]
DBWR merges Dirty and Checkpoint lists
[Figure: DBWR merges the checkpoint list (checkpoint a) and the dirty list from the cold end of the LRU into a single write list, which it writes to the data files.]
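The merge can be sketched as building one write list from the two sources. This is a hypothetical model: blocks are integer ids, a block on both lists is written once, and keeping checkpoint-list order first is an assumption, not documented DBWR behavior.

```python
def build_write_list(checkpoint_list: list, dirty_list: list) -> list:
    """Merge DBWR's checkpoint list and LRU dirty list into one write list.

    A block on both lists is written once; checkpoint order comes first
    (an assumption of this sketch).
    """
    write_list, seen = [], set()
    for block in checkpoint_list + dirty_list:
        if block not in seen:
            seen.add(block)
            write_list.append(block)
    return write_list


print(build_write_list([10, 11, 12], [12, 20, 21]))  # [10, 11, 12, 20, 21]
```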
log file switch (private strand flush incomplete)
New wait in 10g, similar to "log file switch completion".
Redo Wait Solutions
log file sync: commit less, put redo logs on faster disks
log buffer space: increase the log buffer (no more than 32M), then tune LGWR
log file switch completion: increase log file sizes
log file switch (checkpoint incomplete): add log files (or increase log file size)
switch logfile command: avoid switching log files
log file switch (private strand flush incomplete): increase log file sizes
log file switch (archiving needed): *** archive log running out of space