DULO: An Effective Buffer Cache Management Scheme to Exploit Both Temporal and Spatial Localities
1. 2.5
As we know, disk I/O performance is critical to data-intensive applications,
because these applications demand efficient I/O support. For example,
database systems may manage millions of records on a storage device and
access them in either small or large pieces, which requires low access
latency or high transfer rates from storage devices. Multimedia
applications often access large blocks of data in a predictable sequence
and demand a guaranteed minimum transfer rate. For scientific
applications, I/O can be a big challenge because a huge amount of data can
be requested within a short time in very-large-scale parallel systems. At Los
Alamos National Laboratory, ASCI mission-oriented programs conduct large-scale
simulation-based analysis, which requires several gigabytes per second of I/O
bandwidth to support physical simulation and visualization. Their access
2. 0.5
The performance of the hard disk is limited by mechanical constraints. To
read or write a disk block, the disk head has to be positioned on the right
track through seeking and over the right sector through platter rotation.
From the graph I just showed, you can see how slow a disk seek, the
movement of the disk arm, is. That is, the disk arm is the Achilles' heel of
disk access performance. If you access the disk sequentially, you minimize
disk seeks and make full use of disk rotation. So accessing sequential
blocks is faster than accessing randomly placed blocks by at least an order
of magnitude.
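A back-of-the-envelope calculation makes this order-of-magnitude gap concrete. The latency figures below are illustrative assumptions for a typical 7200 RPM disk, not numbers from the talk:

```python
# Rough cost model for random vs. sequential reads. All latency figures
# are illustrative assumptions for a 7200 RPM disk, not measured values.
SEEK_MS = 8.0                  # assumed average seek time
HALF_ROTATION_MS = 4.2         # assumed average rotational delay
TRANSFER_MS_PER_BLOCK = 0.05   # assumed streaming time per 4 KB block

def random_read_ms(blocks):
    # every randomly placed block pays a seek plus a rotational delay
    return blocks * (SEEK_MS + HALF_ROTATION_MS + TRANSFER_MS_PER_BLOCK)

def sequential_read_ms(blocks):
    # one positioning cost, then pure streaming transfer
    return SEEK_MS + HALF_ROTATION_MS + blocks * TRANSFER_MS_PER_BLOCK

ratio = random_read_ms(64) / sequential_read_ms(64)
print(f"random/sequential cost for 64 blocks: {ratio:.0f}x")
```

Under these assumed numbers, reading 64 randomly placed blocks costs roughly fifty times as much as one 64-block sequential read, consistent with the order-of-magnitude claim above.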
3. 0.5
This is the outline for the rest of the talk. First I will present my proposed
scheme that uses disk layout information in the buffer cache to improve disk
performance. I will show the inadequacy of current buffer cache
management in an OS. After describing how to efficiently manage disk
layout information, I'll present my proposed history-based prefetching and
miss-penalty-aware caching, followed by a performance evaluation of the
scheme in a Linux kernel implementation.
Next I will briefly introduce my proposed schemes on the coordination of
distributed caches to reduce I/O requests, including coordination of multi-level
caches in a hierarchy and cooperative management of caches in peer clients.
6. 1.75
To utilize disk layout information for buffer cache management, we need to answer two
questions before we design a new cache management scheme. The first question is which
disk layout information to use. The second question is how to efficiently manage the disk
layout information. The layout information that is interesting to us is the one that can help
locate sequentially accessed blocks. We use Logical block number (LBN) that describes
logical disk geometry provided by disk firmware. This is because disk manufacturers have
made every effort to ensure accessing of continuous LBNs has a performance close to
that of contiguous disk blocks. Another advantage of using LBN is that this interface is
easily available and highly portable across different platforms.
For the second question we know currently LBN is only used to identify disk locations for
7. Then the sequence X1..X4 is requested. The first block is fetched on
demand, and the following blocks are prefetched quickly without disk
seeking, so blocks X2, X3, and X4 are hits.
We assume that blocks are replaced from the bottom. LRU always puts
recently accessed blocks at the top. However, because random blocks are
more expensive in their
8. Then the Y sequence is accessed through prefetching. You can see that in
LRU the random blocks are replaced, while in the dual-locality policy these
blocks are retained in the cache.
10. Now the X sequence is requested. All the blocks in the request are hits
for LRU. Unfortunately, the dual-locality policy has just replaced them. But
the good thing is that these sequential blocks are cheap to reload with
another prefetch.
11. This time we want to re-access the random blocks. They are hits in the
dual-locality policy. However, the LRU policy has to perform four
time-consuming disk seeks and rotations to reload them. By considering the
different access costs of sequential and random blocks, the dual-locality
policy makes a big performance difference: reduce disk
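The cost asymmetry in this example can be captured with a tiny, hypothetical cost model: reloading an n-block contiguous sequence costs roughly one disk positioning, so long sequences are cheap eviction victims per block while random singletons are expensive. The function names and cost constant below are my illustration, not DULO's actual code:

```python
POSITION_COST_MS = 12.0  # assumed cost of one seek plus rotation

def reload_cost_per_block(seq_len):
    # an entire contiguous sequence is reloaded with one positioning
    return POSITION_COST_MS / seq_len

def pick_victim(candidates):
    """candidates: list of (name, sequence_length) cached units.
    Evict the unit that is cheapest to bring back per block."""
    return min(candidates, key=lambda c: reload_cost_per_block(c[1]))

# a 4-block sequential run competes with two random (length-1) blocks
victim = pick_victim([("X1-X4", 4), ("A", 1), ("B", 1)])
```

Here the sequence X1-X4 is chosen as the victim, mirroring the dual-locality behavior in the slides: sequences are evicted first because another prefetch restores them cheaply.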
12. Because LRU or its variants are the most widely used replacement
algorithms, we build the DULO scheme using the LRU algorithm and its
data structure, the LRU stack, as a reference point.
There are two key DULO operations: one is sequence forming. A sequence
is defined as a number of blocks whose disk
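The sequence-forming idea can be sketched as follows. I assume, consistent with the discussion above, that a sequence is a run of blocks with contiguous LBNs that were fetched together; the function and the timestamp threshold are illustrative, not DULO's actual code:

```python
# Hypothetical sketch of sequence forming: group blocks whose LBNs are
# consecutive and whose access times indicate they were fetched together.
def form_sequences(blocks):
    """blocks: list of (lbn, timestamp) pairs. Returns a list of
    sequences, each a list of blocks with contiguous LBNs."""
    if not blocks:
        return []
    blocks = sorted(blocks)  # order by LBN
    sequences, current = [], [blocks[0]]
    for prev, cur in zip(blocks, blocks[1:]):
        contiguous = cur[0] == prev[0] + 1
        fetched_together = abs(cur[1] - prev[1]) <= 1  # back-to-back access
        if contiguous and fetched_together:
            current.append(cur)
        else:
            sequences.append(current)
            current = [cur]
    sequences.append(current)
    return sequences

# three contiguous blocks form one sequence; LBN 500 stands alone
seqs = form_sequences([(100, 5), (101, 6), (102, 7), (500, 9)])
```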
13. 2
The disk block table is similar in structure to a multi-level page table in operating systems. Just as each process
has a page table, each disk has a block table.
While a page table is used for translating a page's virtual address into its physical address, we use the block
table to record and track the recent access times of a disk block through its LBN. In this illustrative example,
the block table has three levels, and each entry in a directory level corresponds to 512 entries in its next lower
level. Then LBN 5140 is mapped to this entry at the leaf level through directory entries 0 and 10. At the leaf
level of the table, a block can record up to two recent access times. Because we cannot afford to record exact
access times for each block, we let the system maintain a clock that ticks when a block on disk is accessed.
Assume the current clock time is 7 and this block has only one timestamp, which is 1. When this block is
accessed, it takes the current clock time as its most recent timestamp and records it in its corresponding
table entry. If the entry is full, the oldest timestamp is replaced. We also record the most recent timestamp at
the directory levels of the table, so timestamp 7 is also recorded in the entries at these two directory levels.
Using the block table, we can build an efficient algorithm for finding access sequences by comparing the
timestamps of neighboring leaf block entries.
You might be concerned about the space cost of the table as more and more blocks are added to it. Actually,
we only need to keep the disk working set in the table, and the table supports efficient space reclamation. We
know that an entry at a directory level records the largest timestamp among all those of the blocks under it.
When memory pressure is high and the system needs to reclaim some memory held by the table, we can
traverse the table with a threshold timestamp. When we see a directory entry whose timestamp is
smaller than the threshold, all the entries under it are removed. In this way, the space overhead can be
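The table walk for LBN 5140 and the threshold-based reclamation can be sketched as below. The three-level, 512-way structure and the two-timestamp leaves follow the description above; the Python dictionaries and names are my own illustration:

```python
# Illustrative sketch of the block table: a three-level radix tree over
# LBNs with 512-way fan-out. Leaf entries keep the two most recent clock
# times; directory entries keep the largest timestamp seen below them.
FANOUT = 512

class BlockTable:
    def __init__(self):
        self.clock = 0
        self.root = {}       # top directory: index -> (subdirs, times)
        self.root_time = {}  # top-level index -> newest timestamp below

    def access(self, lbn):
        self.clock += 1  # the clock ticks on every block access
        i0, rest = divmod(lbn, FANOUT * FANOUT)  # e.g. 5140 -> entry 0
        i1, i2 = divmod(rest, FANOUT)            # then entry 10, leaf 20
        subdirs, times = self.root.setdefault(i0, ({}, {}))
        leaf = subdirs.setdefault(i1, {})
        stamps = leaf.setdefault(i2, [])
        stamps.append(self.clock)
        if len(stamps) > 2:
            stamps.pop(0)  # entry full: replace the oldest timestamp
        # propagate the newest timestamp to both directory levels
        times[i1] = self.clock
        self.root_time[i0] = self.clock

    def reclaim(self, threshold):
        # drop every top-level subtree whose newest access is stale
        for i0 in [k for k, t in self.root_time.items() if t < threshold]:
            del self.root[i0]
            del self.root_time[i0]

bt = BlockTable()
for _ in range(3):
    bt.access(5140)  # reaches the leaf via directory entries 0 and 10
```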