DULO: an effective buffer cache management scheme to exploit both temporal and spatial localities
2.5 
Disk I/O performance is critical to data-intensive applications because 
they depend on efficient I/O support. For example, database systems 
manage millions of records on a storage device and access them in either 
small or large pieces, which requires low access time or a high transfer 
rate from the storage device. Multimedia applications often access large 
blocks of data in a predictable sequence and demand a guaranteed minimum 
transfer rate. For scientific applications, I/O can be a big challenge 
because very-large-scale parallel systems can request a huge amount of 
data within a short time. At Los Alamos National Laboratory, the ASCI 
mission-oriented programs conduct large-scale simulation-based analysis, 
which requires several gigabytes per second of I/O bandwidth to support 
physical simulation and visualization. Their access
0.5 
The performance of a hard disk is limited by mechanical constraints. To 
read or write a disk block, the disk head has to be positioned on the right 
track through seeking and on the right sector through platter rotation. 
From the graph I just showed, you can see how slow a disk seek, the 
movement of the disk arm, is. That is, the disk arm is the Achilles' heel 
of disk access performance. If you access the disk sequentially, you 
minimize disk seeks and make full use of disk rotations. So accessing 
sequential blocks is faster than accessing randomly placed blocks by at 
least an order of magnitude.
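To make the order-of-magnitude gap concrete, here is a rough cost model. It is only a sketch: the seek time, spindle speed, transfer rate, and block size below are assumed values chosen for illustration, not measurements from the talk.

```python
# Back-of-the-envelope disk cost model. Every drive parameter here is an
# assumption made for illustration, not a figure from the talk.
SEEK_MS = 8.0               # assumed average seek time
RPM = 7200                  # assumed spindle speed
TRANSFER_MB_PER_S = 50.0    # assumed sustained transfer rate
BLOCK_BYTES = 4096          # assumed block size


def rotational_latency_ms():
    """Average rotational delay: half of one full revolution."""
    return 0.5 * 60_000.0 / RPM


def transfer_ms():
    """Time to move one block under the head at the sustained rate."""
    return BLOCK_BYTES / (TRANSFER_MB_PER_S * 1e6) * 1000.0


def random_access_ms(n_blocks):
    """Every block pays a seek, a rotational delay, and a transfer."""
    return n_blocks * (SEEK_MS + rotational_latency_ms() + transfer_ms())


def sequential_access_ms(n_blocks):
    """One seek and one rotational delay, then the blocks stream."""
    return SEEK_MS + rotational_latency_ms() + n_blocks * transfer_ms()


if __name__ == "__main__":
    n = 1000
    ratio = random_access_ms(n) / sequential_access_ms(n)
    print(f"random / sequential cost for {n} blocks: {ratio:.0f}x")
```

With these assumed numbers the positioning cost dominates each random access, so the ratio comes out well above ten. The exact figure depends entirely on the chosen parameters.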
0.5 
This is the outline for the rest of the talk. First I will present my proposed 
scheme that uses disk layout information in the buffer cache to improve disk 
performance. I will show the inadequacy of current buffer cache 
management in an OS. After describing how to efficiently manage disk 
layout information, I’ll present my proposed history-based prefetching and 
miss-penalty-aware caching, followed by a performance evaluation of the 
scheme in a Linux kernel implementation. 
Next I will briefly introduce my proposed schemes for coordinating 
distributed caches to reduce I/O requests, including coordination of multi-level 
caches in a hierarchy and cooperative management of caches in peer clients.
1.75 
To use disk layout information for buffer cache management, we need to answer two 
questions before we design a new cache management scheme. The first question is which 
disk layout information to use. The second question is how to manage that information 
efficiently. The layout information that interests us is whatever helps 
locate sequentially accessed blocks. We use the logical block number (LBN), which describes 
the logical disk geometry provided by the disk firmware. This is because disk manufacturers have 
made every effort to ensure that accessing contiguous LBNs performs close to 
accessing physically contiguous disk blocks. Another advantage of using the LBN is that this interface is 
readily available and highly portable across different platforms. 
For the second question, we know that currently the LBN is only used to identify disk locations for
Then the sequence X1..X4 is requested. The first block is fetched on 
demand, and the following blocks are prefetched quickly without disk 
seeking. So blocks X2, X3, and X4 are hits. 
We assume that blocks are replaced from the bottom. LRU always puts 
recently accessed blocks at the top. However, because random blocks are 
more expensive in their
Then the Y sequence is accessed through prefetching. You can see that in 
LRU the random blocks are replaced, while in the dual locality policy these 
blocks are retained in the cache.
Now the X sequence is requested. All the blocks in the request are hits for 
LRU. Unfortunately, the dual locality policy just replaced them. But the 
good thing is that these sequential blocks are cheap to reload with another 
prefetch.
This time we want to re-access the random blocks. They are hits in the dual 
locality policy. However, the LRU policy has to pay for four 
time-consuming disk seeks and rotations to reload them. By considering the 
different access costs of sequential and random blocks, the dual locality 
policy makes a big performance difference: reduce disk
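The walk-through above can be replayed with a small simulation. This is a minimal sketch under stated assumptions, not the DULO algorithm itself: the cache holds 8 blocks, the four random blocks are given the hypothetical names A to D, a multi-block request costs one seek because the rest of the sequence is prefetched, and the "dual locality" variant simply evicts the least recently used sequential block before touching any random block.

```python
from collections import OrderedDict


def simulate(trace, capacity, prefer_sequential_victims):
    """Return the number of disk seeks needed to serve a request trace.

    Each request is a list of block names. A one-block request models a
    random access; a longer request models a sequence that is loaded with
    a single seek and then prefetched.
    """
    cache = OrderedDict()   # block -> is_sequential, ordered LRU to MRU
    seeks = 0
    for request in trace:
        is_sequential = len(request) > 1
        if any(block not in cache for block in request):
            seeks += 1      # one seek brings in every missing block
        for block in request:
            if block in cache:
                cache.move_to_end(block)        # hit: move to the top
                continue
            if len(cache) >= capacity:
                victim = next(iter(cache))      # plain LRU victim
                if prefer_sequential_victims:
                    # Simplified stand-in for the dual locality policy:
                    # sequential blocks are cheap to reload, evict them first.
                    victim = next(
                        (b for b, seq in cache.items() if seq), victim)
                del cache[victim]
            cache[block] = is_sequential
    return seeks


# Hypothetical trace matching the walk-through: random blocks, X, Y, X again,
# then the random blocks again.
RANDOM = [["A"], ["B"], ["C"], ["D"]]
X = [["X1", "X2", "X3", "X4"]]
Y = [["Y1", "Y2", "Y3", "Y4"]]
TRACE = RANDOM + X + Y + X + RANDOM

if __name__ == "__main__":
    print("LRU seeks:          ", simulate(TRACE, 8, False))
    print("dual locality seeks:", simulate(TRACE, 8, True))
```

Under these assumptions LRU pays four extra seeks at the end to reload the random blocks, while the cost-aware variant pays only one extra seek to prefetch the X sequence again.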
Because LRU and its variants are the most widely used replacement 
algorithms, we build the DULO scheme using the LRU algorithm and its 
data structure, the LRU stack, as a reference point. 
There are two key DULO operations: one is sequence forming. 
A sequence is defined as a number of blocks whose disk
2 
The disk block table is similar in structure to a multi-level page table in operating systems. Just as each process 
has a page table, each disk has a block table. 
While a page table is used for translating a page’s virtual address into its physical address, we use the block 
table to record and track the recent access times of a disk block through its LBN. In this illustrative example, 
the block table has 3 levels, and each entry in a directory level corresponds to 512 entries in its next lower level. 
LBN 5140 (5140 = 10 × 512 + 20) is then mapped to this entry at the leaf level through directory entries 0 and 10. At the leaf level of the 
table, a block can record up to two recent access times. Because we cannot afford to record exact access 
times for each block, we let the system maintain a clock. The clock ticks when a block on disk is accessed. 
Assume the current clock time is 7. This block has only one timestamp, which is 1. When this block is 
accessed, it takes the current clock time as its most recent timestamp and records it in its corresponding 
table entry. If the entry is full, the oldest timestamp is replaced. We also record the most recent timestamp at 
the directory levels of the table. So timestamp 7 is also recorded in the entries at these two directory levels. 
Using the block table, we can build an efficient algorithm for finding access sequences by comparing the timestamps of 
neighboring leaf block entries. 
You might be concerned about the space cost of the table as more and more blocks are added to it. Actually, we 
only need to keep the disk working set in the table, and the table supports efficient space reclamation. We 
know that an entry at a directory level records the largest timestamp among those of all the blocks under that 
directory entry. When memory pressure is high and the system needs to reclaim some memory held by the table, 
we can traverse the table with a threshold timestamp. When we see a directory entry whose timestamp is 
smaller than the threshold, all the entries under it are removed. In this way, the space overhead can be
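The block table described above can be sketched in a few lines. This is an illustrative model only, not the kernel data structure: the class and function names are hypothetical, nested dictionaries stand in for table pages, and I assume the clock advances right after each access is recorded.

```python
FANOUT = 512    # entries per table node, as in the illustrative example


def split_lbn(lbn):
    """Split an LBN into (top directory, second directory, leaf) indices."""
    leaf = lbn % FANOUT
    mid = (lbn // FANOUT) % FANOUT
    top = lbn // (FANOUT * FANOUT)
    return top, mid, leaf


class BlockTable:
    """Three-level table: two directory levels above one leaf level."""

    def __init__(self):
        self.root = {}   # directory: index -> [latest_timestamp, child_node]
        self.clock = 1   # logical clock, assumed to tick once per access

    def access(self, lbn):
        """Record an access to lbn and return the timestamp it received."""
        top, mid, leaf = split_lbn(lbn)
        now = self.clock
        self.clock += 1
        node = self.root
        for index in (top, mid):
            entry = node.setdefault(index, [now, {}])
            entry[0] = now           # directories keep the latest timestamp
            node = entry[1]
        stamps = node.setdefault(leaf, [])
        stamps.append(now)
        if len(stamps) > 2:          # a leaf entry holds two timestamps
            stamps.pop(0)            # the oldest one is replaced
        return now

    def timestamps(self, lbn):
        """Return the recorded timestamps of lbn, oldest first."""
        top, mid, leaf = split_lbn(lbn)
        try:
            return list(self.root[top][1][mid][1][leaf])
        except KeyError:
            return []

    def reclaim(self, threshold):
        """Drop every subtree whose directory timestamp is below threshold."""
        self._reclaim(self.root, threshold, depth=0)

    def _reclaim(self, directory, threshold, depth):
        for index in list(directory):
            stamp, child = directory[index]
            if stamp < threshold:
                del directory[index]     # removes all entries under it
            elif depth == 0:
                # Only the top level has directories beneath it.
                self._reclaim(child, threshold, depth + 1)


if __name__ == "__main__":
    table = BlockTable()
    table.access(5140)               # first access, timestamp 1
    table.clock = 7                  # jump ahead to match the example
    table.access(5140)
    print(split_lbn(5140), table.timestamps(5140))
```

Because a directory entry always carries the largest timestamp beneath it, `reclaim` can discard a whole cold subtree without visiting its leaves, which is the point of recording timestamps at the directory levels.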
Now let me present my future research plan.
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

DULO: an effective buffer cache management scheme to exploit both temporal and spatial localities

  • 1. As we know, disk I/O performance is critical to data-intensive applications because these applications demand efficient I/O support. For example, database systems can manage millions of records on a storage device and access them in either small or large pieces, which requires low access time or a high transfer rate from the storage device. Multimedia applications often access large blocks of data in a predictable sequence and demand a guaranteed minimum transfer rate. For scientific applications, I/O can be a big challenge because a huge amount of data can be requested in very-large-scale parallel systems within a short time. At Los Alamos National Laboratory, ASCI mission-oriented programs conduct large-scale simulation-based analysis, which requires several gigabytes per second of I/O bandwidth to support physical simulation and visualization. Their access
  • 2. The performance of a hard disk is limited by mechanical constraints. To read or write a disk block, the disk head has to be positioned on the right track through seeking and over the right sector through platter rotation. From the graph I just showed, you can see how slow a disk seek, the movement of the disk arm, is. In other words, the disk arm is the Achilles' heel of disk access performance. If you access the disk sequentially, you minimize disk seeks and make full use of disk rotations. As a result, accessing sequential blocks is faster than accessing randomly placed blocks by at least an order of magnitude.
  • 3. This is the outline for the rest of the talk. First I will present my proposed scheme that uses disk layout information in the buffer cache to improve disk performance. I will show the inadequacy of current buffer cache management in an OS. After describing how to efficiently manage disk layout information, I'll present my proposed history-based prefetching and miss-penalty-aware caching, followed by a performance evaluation of the scheme in a Linux kernel implementation. Next I will briefly introduce my proposed schemes for coordinating distributed caches to reduce I/O requests, including coordination of multi-level caches in a hierarchy and cooperative management of caches in peer clients.
  • 6. To use disk layout information for buffer cache management, we need to answer two questions before we design a new cache management scheme. The first question is which disk layout information to use. The second question is how to manage that information efficiently. The layout information that interests us is the kind that helps locate sequentially accessed blocks. We use the logical block number (LBN), which describes the logical disk geometry provided by the disk firmware. This is because disk manufacturers have made every effort to ensure that accessing contiguous LBNs performs close to accessing physically contiguous disk blocks. Another advantage of using the LBN is that this interface is easily available and highly portable across different platforms. For the second question we know currently LBN is only used to identify disk locations for
  • 7. Then the sequence X1..X4 is requested. The first block is fetched on demand, and the following blocks are prefetched quickly without disk seeking, so blocks X2, X3, and X4 are hits. We assume that blocks are replaced from the bottom. LRU always puts recently accessed blocks at the top. However, because random blocks are more expensive in their
  • 8. Then the Y sequence is accessed through prefetching. You can see that in LRU the random blocks are replaced, while in the dual locality policy these blocks are retained in the cache.
  • 10. Now the X sequence is requested. All the blocks in the request are hits for LRU. Unfortunately, the dual locality policy has just replaced them. The good news is that these sequential blocks are cheap to reload with another prefetch.
  • 11. This time we want to re-access the random blocks. They are hits in the dual locality policy. The LRU policy, however, has to pay for four time-consuming disk seeks and rotations to reload them. By considering the different access costs of sequential and random blocks, the dual locality policy makes a big performance difference: reduce disk
  • 12. Because LRU and its variants are the most widely used replacement algorithms, we build the DULO scheme using the LRU algorithm and its data structure, the LRU stack, as a reference point. There are two key DULO operations. One is sequence forming. A sequence is defined as a number of blocks whose disk
  • 13. The disk block table is similar in structure to a multi-level page table in an operating system. Just as each process has a page table, each disk has a block table. While a page table translates a page's virtual address into its physical address, we use the block table to record and track the recent access times of a disk block through its LBN. In this illustrative example, the block table has three levels, and each entry in a directory level corresponds to 512 entries in the next lower level. LBN 5140 is therefore mapped to this entry at the leaf level through directory entries 0 and 10. At the leaf level of the table, a block can record up to two recent access times. Because we cannot afford to record exact access times for each block, we let the system maintain a clock that ticks whenever a block on disk is accessed. Assume the current clock time is 7 and this block has only one timestamp, which is 1. When the block is accessed, it takes the current clock time as its most recent timestamp and records it in its table entry. If the entry is full, the oldest timestamp is replaced. We also record the most recent timestamp at the directory levels of the table, so timestamp 7 is also recorded in the entries at these two directory levels. Using the block table, we can build an efficient algorithm for finding access sequences by comparing the timestamps of neighboring leaf block entries. You might be concerned about the space cost of the table as more and more blocks are added to it. Actually, we only need to keep the disk working set in the table, and the table supports efficient space reclamation. An entry at a directory level records the largest timestamp among all the blocks under that directory. When memory pressure is high and the system needs to reclaim some memory held by the table, we can traverse the table with a threshold timestamp. When we see a directory entry whose timestamp is smaller than the threshold, all the entries under it are removed. In this way, the space overhead can be
  • 20. Now let me present my future research plan.
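
The disk block table described in the notes above (three levels, 512 entries per directory, up to two timestamps per leaf, directory entries carrying the most recent timestamp, and threshold-based reclamation) can be illustrated with a minimal Python sketch. This is not the talk's Linux kernel implementation: the class name `BlockTable`, its method names, the nested-dictionary layout, and the choice to tick the clock just before stamping each access are all assumptions made for illustration.

```python
from collections import deque

FANOUT = 512  # entries per directory level, as in the talk's example


class BlockTable:
    """Hypothetical three-level block table keyed by LBN (illustrative only)."""

    def __init__(self):
        self.clock = 0  # logical clock; ticks on every block access
        # Layout (an assumption): {top: [ts, {mid: [ts, {leaf: deque}]}]}
        self.root = {}

    @staticmethod
    def indices(lbn):
        """Split an LBN into (top directory, second directory, leaf) indices."""
        return lbn // (FANOUT * FANOUT), (lbn // FANOUT) % FANOUT, lbn % FANOUT

    def access(self, lbn):
        """Record an access: tick the clock and stamp the leaf and directories."""
        self.clock += 1
        top, mid, leaf = self.indices(lbn)
        top_entry = self.root.setdefault(top, [0, {}])
        mid_entry = top_entry[1].setdefault(mid, [0, {}])
        # A leaf keeps at most two timestamps; the oldest one is dropped.
        stamps = mid_entry[1].setdefault(leaf, deque(maxlen=2))
        stamps.append(self.clock)
        # Directory entries carry the most recent timestamp beneath them.
        top_entry[0] = mid_entry[0] = self.clock
        return self.clock

    def timestamps(self, lbn):
        """Return the recorded timestamps for an LBN (empty if not tracked)."""
        top, mid, leaf = self.indices(lbn)
        try:
            return list(self.root[top][1][mid][1][leaf])
        except KeyError:
            return []

    def reclaim(self, threshold):
        """Drop every subtree whose directory timestamp is below the threshold."""
        for top in list(self.root):
            ts, mids = self.root[top]
            if ts < threshold:
                del self.root[top]
                continue
            for mid in list(mids):
                if mids[mid][0] < threshold:
                    del mids[mid]


if __name__ == "__main__":
    table = BlockTable()
    # LBN 5140 = 0 * 512 * 512 + 10 * 512 + 20, i.e. directory entries 0 and 10.
    print(BlockTable.indices(5140))
    table.access(5140)
    table.access(5141)
    print(table.timestamps(5140), table.timestamps(5141))
```

Because each directory entry holds the largest timestamp beneath it, `reclaim` can discard a whole subtree after a single comparison, which is the property the notes rely on for cheap space reclamation. Sequence detection would then compare the timestamps of neighboring leaves; the exact criterion is not given in the notes, so it is left out of this sketch.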