11g r2 flashcache_Tips
Oracle Database Smart Flash Cache
Executive Summary
The Oracle Database Smart Flash Cache is a quasi-extension of the Oracle database’s block
buffer cache onto flash memory devices. The Flash Cache is a read-only cache of clean database
blocks on tier-1 storage (such as flash memory devices on the PCI Express bus). It is intended to
improve the performance of Oracle databases by reducing the number of I/O requests that must
be serviced by tier-2 storage (e.g., conventional hard disks or older generation solid state
devices). As blocks age out of the database’s buffer cache they can be moved to the Flash
Cache, and quickly recalled into the database’s buffer cache if needed by future database
operations. Without the Flash Cache, accessing a block that has aged out of memory requires
relatively slow I/O to fetch the block from conventional storage back into memory.
Database Smart Flash Cache is loosely referred to as Flash Cache. It is literally a cache on flash.
There are several key differences between the database’s buffer cache and the Flash Cache, so it
would not be correct to say the Flash Cache extends the database buffer cache. First, the Flash
Cache can only store clean blocks. Second, the Flash Cache is read-only: to modify a block that
is in the Flash Cache the block must first be read into the buffer cache. After the block has been
modified the dirty image is flushed to storage and marked clean in the buffer cache. Eventually,
the clean block will age back to the Flash Cache. Thus we see at no point does the Flash Cache
contain dirty block images, and users cannot modify blocks within the Flash Cache. The third
primary difference is that the buffer cache must reside in RAM, while the Flash Cache resides on flash storage.
To understand the reason Oracle created the Flash Cache, consider how the database buffer
cache works. All blocks eventually age out of the database buffer cache. Subsequent requests
for those blocks require physical I/O. Physical I/O can be an expensive operation. A
recommended solution is to move the database files from conventional storage devices onto
locally installed PCIe based flash memory storage devices, such as Fusion-io devices, which
have higher bandwidth and lower latency. Not all database configurations support local storage.
A second solution for any such customer is to significantly increase the size of the Oracle SGA, by
as much as 10x, but the price of DRAM makes this cost prohibitive. Also, the amount of DRAM
that can be addressed by a user process (e.g., Oracle) is limited by many factors including the
BIOS, operating system kernel, and server hardware designs. Thus, a third solution is required.
Enter the Oracle Database Smart Flash Cache. The Flash Cache supports locally installed flash
memory devices, and provides a cost effective means of increasing Oracle’s buffer space.
The Flash Cache is a free feature of Oracle Database Server 11gR2 Enterprise Edition. It is not
available with any other edition of Oracle. It can only be used on an Oracle operating system.
For example, it can be used on OEL but not RHEL.
Editor’s Notes
Flash Cache might be the best performance enhancing feature in Oracle 11gR2, and its benefits
are proportional to the Flash Cache’s underlying storage. Using a single Fusion-io card to hold
the Flash Cache can boost overall database performance many times over. Not all customers are
able to use the Flash Cache due to licensing and technical restrictions.
What Is Flash Cache
The official product name is “Database Smart Flash Cache”. It is referred to in this paper as
Flash Cache. There is an unrelated Exadata Flash Cache product, not discussed in this paper.
The Database Smart Flash Cache is a secondary buffer cache that sits behind the Oracle SGA’s
database block buffer cache. Without Flash Cache, clean blocks that age out of memory require
expensive I/O for subsequent access. With Flash Cache, clean blocks age out of the buffer cache
to the Flash Cache from where they can be instantly retrieved back into the buffer cache upon
request. An illustration is provided later.
Once the Flash Cache is configured all tables and indexes can use it. By default all tables in
Oracle 11.2.0.2 and later are configured to use the Flash Cache, and do so automatically once the
Flash Cache is enabled. However, there are some restrictions and limitations listed later in this
document to be aware of.
Flash Cache is automated and internalized. It cannot be used directly. It is part of Oracle’s
memory management sub-system, and therefore it can only be used by the database engine. For
example: you cannot tell Oracle to load the SGA onto the Flash Cache, nor can you store tables
or indexes on a flash cache device. What you can do is tell Oracle how big to make the Flash
Cache, and which tables to cache in the Flash Cache. Oracle will automatically determine when
to write blocks into the Flash Cache and when to erase blocks from the Flash Cache.
What Flash Cache Is Not
The Flash Cache cannot be used to store database files. The Flash Cache is not a general
purpose cache. It cannot be used to buffer reads from specific storage devices or file systems. It
cannot buffer writes at all. It cannot be used to cache query results – Oracle has another feature
called the Server Results Cache that can be used to cache query results.
Why Use Flash Cache
Many customers have a sizable imbalance between the amount of data in their database and the
amount of memory on their database server. They can only cache a very small percentage of
their data in memory. When the Oracle database buffer cache is too small, blocks get evicted
from memory and must be re-fetched from disk into memory over and over again. Such a
problem could be eliminated by adding significantly more memory (DRAM) to the host, but that
solution is cost prohibitive. Servers may impose physical limits on DRAM as well.
The Oracle Database Smart Flash Cache, or simply “Flash Cache”, allows customers to expand
Oracle’s data caching capabilities onto high-speed high-capacity storage devices, such as Fusion-io.
Flash Cache allows customers to cache a seemingly unlimited amount of data on devices that
perform many times faster than conventional storage and at a price point well below DRAM.
The first indication that Flash Cache might be needed often comes from a memory advisor
within Oracle Enterprise Manager (OEM). If OEM recommends increasing the size of the block
buffer cache by at least 2x, then Flash Cache should be investigated.
How Flash Cache Works
The below picture is from the Oracle 11g R2 Concepts Guide:
The elements of the above picture can be described as follows:
• The object shown as “magnetic disk” in the above illustration is the database’s permanent
storage for database files. Permanent storage is often the slowest server component:
many Oracle customers buy storage for a certain capacity-for-price point rather than for a
performance-for-price point (i.e., they buy hard drives or SAN appliances).
• The database buffer cache sits in main memory, which is typically very fast and very
expensive DRAM, and its purpose is to buffer database blocks retrieved from permanent
storage. Oracle tries to perform all queries, inserts, updates, and deletes in the buffer
cache, but if the data is not in the cache it is fetched from storage by a “Server Process”.
• The Flash Cache sits in locally attached storage (not shared storage). The Flash Cache is
actually a single file, and should be stored on flash memory devices for maximum
performance.
After a Server Process has fetched a block into the buffer cache, the block can stay in the buffer
cache “forever”. As long as the block is touched by a Server Process often enough it remains on
the “hot” list and is kept in the buffer cache. If it is not touched for some threshold amount of
time, then it is considered a “cold” block and becomes eligible for migration to the Flash Cache
or permanent storage.
Flash Cache does not buffer all data. Flash Cache only supports the DEFAULT pool of the
database buffer cache. It does not buffer any blocks from the nK buffer caches, the KEEP pool,
or the RECYCLE pool. Also by default the Flash Cache will not buffer any blocks that were
read into memory as a result of a scan operation. This is configurable as described elsewhere in
this document.
Loss of the Flash Cache cannot lead to data loss. The Flash Cache only holds blocks that are
fully persistent on disk. When blocks in the database buffer cache are dirtied, they are written
out to disk as usual. Oracle always performs the write-out before the blocks are moved into the
Flash Cache. Thus, all blocks in the Flash Cache are “clean”.
Here is a description of the workflow that happens when a user requests a block of data:
The user’s Server Process first scans the database’s in-memory buffer cache.
• IF the requested block is not found in the database buffer cache, then the Server Process
  scans the Flash Cache.
    If the block is found in the Flash Cache, then the Server Process moves the block into the
    database buffer cache using a type of physical I/O called “optimized physical read”.
    If the block is not found in the Flash Cache, then the Server Process sends a request to the
    host file system.
      • The host will read the block from the file system into the file system buffer cache
        (physical I/O) and then signal the Server Process to take over.
      • The Server Process will read the block from the file system buffer cache into the database
        buffer cache (logical I/O).
      • Finally, the Server Process reads the block from within the database buffer cache for
        processing.
      Notice above we see three I/Os occur for each block read, although
      Oracle’s metrics will only reflect one physical and one logical I/O
      since it does not count reads by file system processes.
• ELSIF the block is found in the database buffer cache, or if a Server Process has fetched
  the block into the buffer cache from storage or Flash Cache, then the Server Process can
  operate on the block.
The Server Process performs logical I/O on the block within the in-memory buffer cache.
    If the operation is read-only, then the block is not dirtied and does not need to be written
    to permanent storage. Over time the block can be aged out of the buffer cache. It will go
    to the Flash Cache if enabled, or to permanent storage.
    If the operation dirties the block, then the Database Block Writer process (DBWn) is
    responsible for writing it to permanent storage and then marking the block clean in the
    buffer cache.
      • If using direct I/O, then DBWn writes the dirty buffers to storage. Otherwise, DBWn
        only writes to the file system buffer cache which will eventually write to storage.
      • Once the block is marked clean it is subject to aging rules. It will go to the Flash Cache
        if enabled, or to permanent storage.
The above workflow is general. The user may specify a unique operation that will behave
differently.
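The lookup order described above can be sketched as a toy two-tier cache. This is only an illustrative Python model, not Oracle's implementation: the class name TwoTierCache, the plain LRU eviction in both tiers, and the choice to remove a block from the flash tier when it is read back into memory are all assumptions made for the sketch, and the DBWn write path for dirty blocks is left out entirely.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model: a small LRU 'buffer cache' in front of a larger LRU 'flash cache'."""

    def __init__(self, buffer_slots, flash_slots):
        self.buffer_slots = buffer_slots
        self.flash_slots = flash_slots
        self.buffer = OrderedDict()  # block_id -> data, least recently used first
        self.flash = OrderedDict()   # holds only clean blocks aged out of the buffer

    def read(self, block_id, disk):
        """Return (data, source), where source names the tier that satisfied the read."""
        if block_id in self.buffer:
            self.buffer.move_to_end(block_id)  # keep the block on the "hot" end
            return self.buffer[block_id], "buffer cache"
        if block_id in self.flash:
            data = self.flash.pop(block_id)    # assumption: flash copy is dropped on re-read
            source = "flash cache"
        else:
            data = disk[block_id]              # slow path: permanent storage
            source = "disk"
        self._admit(block_id, data)
        return data, source

    def _admit(self, block_id, data):
        """Place a block in the buffer tier, aging the coldest block out to flash."""
        self.buffer[block_id] = data
        if len(self.buffer) > self.buffer_slots:
            old_id, old_data = self.buffer.popitem(last=False)
            self.flash[old_id] = old_data
            if len(self.flash) > self.flash_slots:
                self.flash.popitem(last=False)  # the flash tier also ages blocks out
```

With two buffer slots and a three-block "disk", reading blocks 1, 2, 3, 1, 1 in that order is served from disk, disk, disk, flash cache, and buffer cache respectively: block 1 ages out to flash when block 3 arrives, and is then recalled without touching the disk.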
How Does The Flash Cache Work With Full Table Scans
If the user performs a full table scan, then Oracle will count the number of blocks already in the
buffer cache for that table.
• If the number is high, then Oracle will read the blocks from disk into the buffer cache. In
this case blocks are eligible for the Flash Cache, but the blocks can only age out to the
Flash Cache if the table’s storage property is set to FLASH_CACHE KEEP.
• If the number is low, then Oracle will perform a direct-path read and will not cache the
blocks. In this case the blocks are not eligible for the Flash Cache regardless of the
table’s storage property.
Some types of scans will never read data into memory. The SELECT COUNT(*) statement will
not read blocks of data into memory, and you will not see any use of the Flash Cache.
See this document’s section on “Configuration” for more information about configuring tables to
work with the Flash Cache. The discussion includes full table scans.
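The eligibility rules stated in this paper (DEFAULT buffer pool only, no direct-path reads, scan blocks only under FLASH_CACHE KEEP) can be collected into one small decision function. This is a sketch in Python of the rules as described here, not a reproduction of Oracle's internal logic; the function name and the string labels for buffer pools and read paths are hypothetical.

```python
def eligible_for_flash_cache(buffer_pool, flash_cache_setting, read_path):
    """Return True if an aged-out clean block may be written to the Flash Cache.

    buffer_pool:         "DEFAULT", "KEEP", "RECYCLE", or "NK" (non-default block size)
    flash_cache_setting: the segment's STORAGE value: "DEFAULT", "KEEP", or "NONE"
    read_path:           "single_block", "buffered_scan", or "direct_path"
                         (labels invented for this sketch)
    """
    if buffer_pool != "DEFAULT":
        return False  # only the DEFAULT pool is backed by the Flash Cache
    if flash_cache_setting == "NONE":
        return False  # the segment has opted out
    if read_path == "direct_path":
        return False  # direct-path reads bypass the buffer cache entirely
    if read_path == "buffered_scan":
        return flash_cache_setting == "KEEP"  # scan blocks need KEEP
    return True       # single-block reads qualify under DEFAULT or KEEP
```

For example, a block read by a buffered full table scan is eligible only when the table is set to KEEP, while a block fetched by a single-block read is eligible under either DEFAULT or KEEP.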
Requirements
When the product was initially released it required an Oracle Exadata V2 machine and was only
supported on Oracle Enterprise Linux (OEL). Support for Exadata V2 with Solaris SPARC was
added later. Eventually, Oracle announced support for using Flash Cache on OEL, and for Flash
Cache on Solaris without Exadata hardware.
Database Smart Flash Cache has the following requirements (at the time of this writing,
September 2011):
It requires Oracle Database Server 11.2.0.2 or higher. If you really must run it on
11.2.0.1, you can obtain Linux patch 8974084 and PSU 11.2.0.1.1 from Oracle.
It requires Enterprise Edition. No other editions are supported; Flash Cache is not
supported on Standard or Standard One Edition.
It requires one of the following operating systems: Oracle Solaris SPARC 64-bit, Oracle
Solaris X86_64, OEL 32-bit, or OEL x86_64. See the Oracle Database Licensing
Information documentation on-line. Other operating systems like Windows, AIX, RHEL,
SuSE are not supported.
To use Flash Cache on Solaris you must have Solaris 10U6 or higher with the following
patches: 125555-03, 140796-01, 140899-01, 141016-01, 139555-08, 141414-10, 141736-
05.
For every block stored in the Flash Cache, Oracle consumes 100 bytes of storage in the
buffer cache for metadata (pointers, etc.). If you are using RAC, the number is 200 bytes
per block. A 64 GB flash cache with 8K block size can hold up to 8,388,608 blocks, so
you lose 800 MB of buffer cache in non-RAC systems, or 1.6 GB of buffer cache in RAC
systems; a 640 GB flash cache costs ten times as much, roughly 8 GB or 16 GB. The solution
is to increase parameter db_cache_size by the same amount being taken away by Flash Cache.
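The buffer cache overhead can be estimated with a few lines of arithmetic. The sketch below is in Python and uses only the per-block figures quoted above (100 bytes, or 200 bytes with RAC); the function name is hypothetical and sizes are treated as binary units (1 GB = 1024^3 bytes), which is an assumption.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def flash_cache_metadata_bytes(flash_cache_bytes, block_size_bytes, rac=False):
    """Return (blocks, overhead_bytes) for a Flash Cache of the given size.

    Uses the per-block metadata cost quoted in the text:
    100 bytes per cached block, or 200 bytes when running RAC.
    """
    blocks = flash_cache_bytes // block_size_bytes
    per_block = 200 if rac else 100
    return blocks, blocks * per_block


# 64 GiB cache with 8 KiB blocks, single instance
blocks, overhead = flash_cache_metadata_bytes(64 * GIB, 8192)
print(blocks, overhead / MIB)
```

For a 64 GiB cache with 8 KiB blocks this works out to 8,388,608 blocks and 800 MiB of metadata; a 640 GiB cache holds 83,886,080 blocks and needs ten times as much. The overhead value is the amount by which db_cache_size would need to grow to keep the same usable buffer space.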
The storage device must be at least 101 MB larger than DB_FLASH_CACHE_SIZE. If
the Flash Cache encroaches on this reserved space, then the database will not start.
If you are using RAC, then the Database Smart Flash Cache must be configured on either
all nodes or none of them. You cannot use it on “some” of the nodes. Parameter
DB_FLASH_CACHE_FILE must be set identically on all nodes. (Note: I have heard
from at least one customer who is using Flash Cache on just one node of a 3 node RAC).
The initial release of Flash Cache required Oracle’s own Sun Flash PCI cards loaded with
high speed SLC NAND flash memory. The current release of Flash Cache allows you to
use Fusion-io PCIe devices with SLC and MLC NAND flash memory.
Configuration
General Information About Configuring Flash Cache
By default an Oracle database has no Flash Cache, but all tables and indexes are automatically
configured to use it anyway. Once the DBA has created the Flash Cache, the tables and indexes
automatically start using it. If this is not desirable, the DBA can alter each table or index’s Flash
Cache properties.
Start by installing one or more flash memory cards in the database server.
Next, format the cards, or import them into an ASM diskgroup for better performance. ASM is
not required, but recommended. My recommendation is to leave the flash cards unformatted,
create a partition offset by 1 MB, and feed the partitions to ASM; all flash devices should be
managed as disks within a redundant ASM diskgroup.
RAC Tip: each instance requires its own Flash Cache device, and when using ASM each
instance requires its own ASM diskgroup.
The next step is to set the database initialization parameters DB_FLASH_CACHE_FILE and
DB_FLASH_CACHE_SIZE.
Bounce the instance so Oracle can initialize the Flash Cache.
The next step is optional: you can re-configure each table or index’s Flash Cache properties. By
default all tables and indexes are set to STORAGE(FLASH_CACHE DEFAULT). This means
all blocks fetched into the buffer cache by a “db file sequential read” operation can use the Flash
Cache. If you would also like to include blocks fetched by scan operations simply alter the
property to KEEP, or to prevent it from using the Flash Cache set the property to NONE.
The STORAGE clause looks like this:
STORAGE
({ INITIAL size_clause
 | NEXT size_clause
 | MINEXTENTS integer
 | MAXEXTENTS { integer | UNLIMITED }
 | maxsize_clause
 | PCTINCREASE integer
 | FREELISTS integer
 | FREELIST GROUPS integer
 | OPTIMAL [ size_clause | NULL ]
 | BUFFER_POOL { KEEP | RECYCLE | DEFAULT }
 | FLASH_CACHE { KEEP | NONE | DEFAULT }
 | ENCRYPT
 } ...
)
Notice the FLASH_CACHE clause has three settings, which are described below:
DEFAULT is the default setting. It tells Oracle you want blocks to be written to the flash
cache when they are aged out of the database buffer cache, and they can be aged out of
the flash cache according to Oracle’s LRU algorithm. Since DEFAULT is the default,
you can omit the FLASH_CACHE clause entirely when creating an object for the same effect.
KEEP tells Oracle to cache the object’s blocks in Flash as long as space permits.
NONE tells Oracle you do not want blocks for this table to use the flash cache.
Example 1: you do not want table EMP to use Flash Cache.
ALTER TABLE EMP STORAGE (FLASH_CACHE NONE);
Example 2: you want table EMP to use the Flash Cache regardless of how the blocks were
fetched into memory.
ALTER TABLE EMP STORAGE (FLASH_CACHE KEEP);
Example 3: you want to return table EMP to the default use of Flash Cache.
ALTER TABLE EMP STORAGE (FLASH_CACHE DEFAULT);
Understanding The Flash Cache Initialization Parameters
There are only two initialization parameters related to Flash Cache. Each is detailed below.
DB_FLASH_CACHE_FILE
This parameter is used to specify the ASM disk group or the fully qualified name of a file that
represents your Database Smart Flash Cache. This parameter should only be set by customers
who are using the Database Smart Flash Cache feature. The Flash Cache should be stored on the
fastest possible flash memory storage device, like a Fusion-io ioDrive. The storage device should
be dedicated to the Flash Cache.
The parameter can be set using an ALTER SYSTEM statement. However, you must first install
hardware (flash storage) and set parameter DB_FLASH_CACHE_SIZE, then write this parameter
to the SPFILE, and then bounce the database.
The parameter can be used with ASM diskgroups, file systems, and unformatted block devices.
To use a raw device you must create a symbolic link that points to the raw device, and give the
name of the link to Oracle. Here are a few examples:
ALTER SYSTEM SET DB_FLASH_CACHE_FILE='/dev/fioa1' SCOPE=SPFILE SID='*';
ALTER SYSTEM SET db_flash_cache_file='/dev/sdd1' SCOPE=spfile SID='*';
ALTER SYSTEM SET DB_FLASH_CACHE_FILE='+FLASH/MYDBA/FLASHFILE/fc.ora' SCOPE=SPFILE;
Please observe the following notes:
The SID clause is optional.
If the file does not exist Oracle will create it.
When using raw devices there are no files, so specify the device partition (such as sdd1).
The device must be partitioned, and the flash cache must be on the partition to avoid
Oracle clobbering the disk’s volume label.
The oracle user must be granted r/w permissions (e.g., chmod 660 /dev/sdd1).
When using ASM to store the Flash Cache you must specify a file name and not just a
diskgroup name. See above example.
The parameter can be set to a symbolic link that points to the real flash cache file.
If you are using RAC please note the Flash Cache file cannot be shared by multiple instances.
Every instance must point to a separate file. However, you must set this parameter to the same
value on all nodes. This means if you are using ASM then you must use a separate diskgroup for
each instance.
DB_FLASH_CACHE_SIZE
This parameter allows you to specify the size of the Flash Cache, which is defined by another
parameter DB_FLASH_CACHE_FILE. The default is 0, which disables the Flash Cache
feature. The minimum suggested size is 2 * db_cache_size. The maximum suggested size is 10
* db_cache_size. These are not strictly enforced. However, the larger the Flash Cache the more
buffers are consumed in the database buffer cache, so that a sufficiently large Flash Cache may
prevent Oracle from starting unless you also increase the parameter db_cache_size.
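The 2x to 10x guideline is easy to check mechanically before editing the SPFILE. The following Python sketch only encodes the suggested range stated above; the function names are hypothetical, and since the text says the bounds are not strictly enforced, a result outside the range is a warning sign rather than an error.

```python
GIB = 1024 ** 3


def suggested_flash_cache_range(db_cache_size_bytes):
    """Return (minimum, maximum) suggested Flash Cache size: 2x to 10x db_cache_size."""
    return 2 * db_cache_size_bytes, 10 * db_cache_size_bytes


def within_suggested_range(flash_cache_bytes, db_cache_size_bytes):
    """True if the planned Flash Cache size falls inside the suggested range."""
    low, high = suggested_flash_cache_range(db_cache_size_bytes)
    return low <= flash_cache_bytes <= high


low, high = suggested_flash_cache_range(16 * GIB)
print(low // GIB, high // GIB)  # suggested range in GiB for a 16 GiB buffer cache
```

For a 16 GiB db_cache_size the suggested range is 32 GiB to 160 GiB, so a planned 320 GiB Flash Cache would fall outside it.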
The parameter can be set using an ALTER SYSTEM command like this:
ALTER SYSTEM SET DB_FLASH_CACHE_SIZE=2400G SCOPE=SPFILE SID='*';
This parameter may only be specified at instance startup. After instance startup you cannot
change the size of the Flash Cache but you can disable/enable the Flash Cache. That is, you
cannot change the size from 500G to 501G at run time, but you can set the parameter to 0 using
an ALTER SYSTEM command which effectively disables the Flash Cache while the database is
running. You can re-enable flash cache by setting this parameter to the same value that was in
effect when the database was started, but you cannot set it to a different value.
The size of the Flash Cache must be at least 100 MB smaller than the flash storage device. For
example, if the storage device is 320 GB then the maximum db_flash_cache_size is roughly
319900M.
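The headroom rule can be turned into a small helper that yields a value suitable for DB_FLASH_CACHE_SIZE. This is a Python sketch under stated assumptions: the function names are hypothetical, the device size is supplied in megabytes as the administrator counts them, and the 100 MB reserve is the figure quoted above (exposed as a parameter, since the Requirements section quotes 101 MB).

```python
RESERVE_MB = 100  # headroom quoted in the text; adjust if your platform needs more


def max_flash_cache_size_mb(device_size_mb, reserve_mb=RESERVE_MB):
    """Largest Flash Cache size (in MB) that leaves the reserve free on the device."""
    if device_size_mb <= reserve_mb:
        raise ValueError("device is too small to hold a Flash Cache")
    return device_size_mb - reserve_mb


def as_parameter_value(size_mb):
    """Format a size in MB the way the initialization parameter expects, e.g. '500M'."""
    return f"{size_mb}M"


print(as_parameter_value(max_flash_cache_size_mb(320_000)))
```

For a 320 GB device counted as 320,000 MB, the helper yields 319900M.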
RAC Tips
According to the Oracle documentation and Oracle Support web site, the Flash Cache file cannot
be shared by multiple instances and every instance must point to a separate file, and you must set
this parameter to the same value on all nodes. These three requirements mean if you are using
ASM then you must use a separate diskgroup for each RAC instance, and also means you must
use local storage like Fusion-io, not shared storage.
I have talked to customers who have implemented the Flash Cache on “some” instances. For
example, one customer had an 8-node RAC and only implemented Flash Cache on 4 nodes.
Monitoring Flash Cache
Activity shows up in the AWR as Optimized Physical Reads.
Metrics can be obtained easily from the view V$SYSSTAT like this:
select * from v$sysstat where name like 'flash cache%';
To see which segments and blocks are in the Flash Cache, use the view V$BH like this:
SELECT owner || '.' || object_name object,
SUM (CASE WHEN b.status LIKE 'flash%' THEN 1 END) flash_blocks,
SUM (CASE WHEN b.status LIKE 'flash%' THEN 0 else 1 END) cache_blocks,
count(*) total_blocks
FROM v$bh b
JOIN dba_objects ON (objd = object_id)
GROUP BY owner, object_name
order by 4 desc;
The above SQL statement was copied from Guy Harrison’s web site.
Troubleshooting, Issues, Bugs & Patches
The Flash Cache is populated by the Database Block Writer (DBWn) processes. This is a low
priority task for DBWn, compared to writing dirty blocks to permanent storage. Thus, when
DBWn is saturated the Flash Cache will not appear to be used. The solution is to increase the
database initialization parameter DB_WRITER_PROCESSES. The default value for this
parameter is derived from the CPU count (the greater of 1 or CPU_COUNT / 8), which is good
for most customers, but some customers will need a higher value.
The initial release of 11gR2 (11.2.0.1) did not support Flash Cache. Oracle released a patch to
make Flash Cache work on 11.2.0.1, but it had many bugs. I recommend against using Flash
Cache with Oracle version 11.2.0.1.
The second release of 11gR2 (11.2.0.2) also had many bugs related to Flash Cache. Some of the
bugs were limited to RAC, but other bugs affected all customers.
The third release of 11gR2 (11.2.0.3) is considered to be stable.
Below is a list of major bugs in Flash Cache:
Bugs 8444791 and 10216012: NOT ABLE TO SPECIFY THE DISKGROUP NAME IN
DB_FLASH_CACHE_FILE. When using ASM you cannot set db_flash_cache_file to
the name of a diskgroup. At last check this bug was not fixed. Do not worry. In this
document I describe the correct way to set parameter db_flash_cache_file using a full file
specification.
Bugs 12730844 and 12673694: ORA-600 [KJBRASR:PKEY], [62839680], "Lock
conflicting with the NEW request is on the remastering queue". Affects versions 11.2.0.1
and higher, fixed in 12.1.0.1. The published workaround is “do not use flash cache”.
Bug 9199151: DATABASE FLASH CACHE FILE RE-USE SEMANTICS ARE
FAULTY. In both RAC and non-RAC environments, if you point two instances to the
same flash cache file, then the 1st instance will own the file only until the 2nd instance
starts, at which time the 2nd instance will take ownership of the file so that the 1st
instance now has no flash cache. Any blocks belonging to the 1st instance will be
trapped in the flash cache until the next shutdown/restart of whichever instance currently
owns the flash cache file (the 2nd instance in this case). Oracle says this bug is not
feasible to fix, so everyone should be made aware of how to avoid it. The workaround is
to manually ensure every init.ora file uses a distinct value for db_flash_cache_file, such
as using the instance name as part of the file name. If you are using RAC, then you
might set parameter sid.db_flash_cache_file rather than *.db_flash_cache_file. NOTE:
this workaround has not been tested.
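One way to keep the file names distinct is to generate the per-instance statements from a template instead of typing them by hand. The Python sketch below only builds strings; the instance names and the path template are hypothetical, and, like the workaround itself, the generated statements have not been tested against a RAC database.

```python
def flash_cache_file_statements(instance_names,
                                template="/flash/{sid}_flash_cache.dat"):
    """Build one ALTER SYSTEM statement per instance, each with a distinct file.

    The path template is a placeholder for illustration; it must contain {sid}
    so that every instance receives its own Flash Cache file.
    """
    paths = [template.format(sid=sid) for sid in instance_names]
    if len(set(paths)) != len(paths):
        raise ValueError("two instances would share the same flash cache file")
    return [
        f"ALTER SYSTEM SET db_flash_cache_file='{path}' SCOPE=SPFILE SID='{sid}';"
        for sid, path in zip(instance_names, paths)
    ]


for statement in flash_cache_file_statements(["orcl1", "orcl2"]):
    print(statement)
```

The check refuses to emit anything when two instances would end up with the same file, which is the situation bug 9199151 describes.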
Some customers have reported Oracle will not start if db_flash_cache_size is set to a value that
is more than ten times the value of db_cache_size. The Oracle documentation states that the
Flash Cache “should” be 2x - 10x of db_cache_size, but Oracle does not enforce a minimum or
maximum size. The Oracle documentation also states the maximum value of parameter
db_flash_cache_size is operating system dependent. There are two things that can limit it:
available memory and the file system limits. In other words, anytime you create a file the
maximum size is limited by the file system on which you create the file. If you are using Linux
file systems ext2 or ext3, then the max file size is based on the block size: if the block size is 512
bytes or 1 KB then the max file size is 16 GB; 2K = 256 GB; and 4K+ = 2TB. If you are
creating the file on ext4 then the maximum size of the file is 4 GB * disk block size (not database
block size). In summary, flash cache on Linux is limited to the file sizes shown in this chart:
Block Size      ext2 / ext3 Max File Size   ext4 Max File Size
512 bytes       1 GB                        2 TB
1K              16 GB                       4 TB
2K              256 GB                      8 TB
4K or higher    2 TB                        16 TB
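The file size limits can be checked in code before choosing a value for db_flash_cache_size. This Python sketch encodes the ext4 rule quoted above (4 GB multiplied by the file system block size) and the ext2/ext3 figures for 1K, 2K, and 4K-or-larger blocks; the function names are hypothetical, sizes are treated as binary units, and the 512-byte ext2/ext3 case is left out of the lookup.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# ext2 / ext3 limits by file system block size, as listed in the chart
EXT3_MAX_FILE_BYTES = {1024: 16 * GIB, 2048: 256 * GIB, 4096: 2 * TIB}


def ext4_max_file_bytes(fs_block_size):
    """ext4 limit from the text: 4 GB multiplied by the file system block size."""
    return 4 * GIB * fs_block_size


def max_flash_cache_file_bytes(fs_type, fs_block_size):
    """Largest Flash Cache file the file system can hold, per the chart."""
    if fs_type == "ext4":
        return ext4_max_file_bytes(fs_block_size)
    if fs_type in ("ext2", "ext3"):
        if fs_block_size >= 4096:
            return EXT3_MAX_FILE_BYTES[4096]  # "4K or higher" row
        if fs_block_size in EXT3_MAX_FILE_BYTES:
            return EXT3_MAX_FILE_BYTES[fs_block_size]
    raise ValueError(f"no limit recorded for {fs_type} with {fs_block_size}-byte blocks")


print(max_flash_cache_file_bytes("ext4", 4096) // TIB)  # limit in TiB
```

A planned Flash Cache larger than the returned value would not fit in a single file on that file system, whatever the size of the underlying device.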