The document describes a technique called "flashy prefetching" that aims to improve data prefetching performance for high-performance flash drives such as SSDs. It recognizes that traditional prefetching is not optimized for flash drives and their characteristics. Flashy prefetching dynamically controls prefetching aggressiveness based on drive performance and prefetching performance, and supports prefetching for multiple simultaneous accesses. An implementation called "Prefetchd" achieved on average a 20% speedup on large-file benchmarks and 65-70% prefetching accuracy.
3. INTRODUCTION
• NAND Flash memory
• SSDs are becoming increasingly popular due to high bandwidth
and low latency
• SSDs vs. HDDs
• Data prefetching is used to reduce access latency: it loads data that is
likely to be read from storage into main memory ahead of time
• “Flashy” prefetching focuses on high performance SSDs (flash)
4. Traditional Prefetching
• Designed for rotational hard drives; conservative in the
amount of data prefetched
• If prefetching is aggressive
• It can take up shared i/o bandwidth meant for applications
• Main memory might become filled with unneeded data while useful data
gets evicted
• Risky when HDD bandwidth and system RAM are limited
• Not tuned for different types of storage and applications: an
application may require data at a high rate, and SSDs can
support a high prefetch rate
5. How flashy prefetching is different
• Takes advantage of high bandwidth and low latency of SSD
• Aware of the runtime environment
• Adaptive to the changing needs of application and device both
• Inherent support for parallel data accesses
• Aggressiveness is controlled through feedback mechanism
7. 1. SSDs are different
SLC –
• Uses a single cell to store one bit of data
• Faster and much more reliable, but also more expensive
MLC –
• Multi-level cell: can interpret four digital states from a signal
stored in a single cell
• This makes it denser for a given area and so cheaper to
produce, but it wears out faster
8. Applications are different
• Data-intensive applications may have varying I/O operations
• Every application goes through multiple stages of I/O and data
operations, each having different I/O needs
9. Prefetching for SSDs and HDDs is different
• No seek latency in SSDs
• Inherent support for parallel access
• Sequential data can be read faster on HDDs as the platter
rotates; this is not necessarily the case with SSDs
Flashy prefetching gives a good balance between speedup
and cost
11. To address the three challenges, the design principles are
as follows
1. Control prefetching based on drive performance
2. Control prefetching based on prefetching performance
3. Enable prefetching for multiple simultaneous accesses
14. Trace Collection
• Collects I/O event information
(timestamp, process id, process name, request type, amount of data, CPU
no., starting block no., and block size)
• Traces application requests and also the requests that actually
reach the disk
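A minimal sketch of what one collected event record might look like. The field names below are illustrative (chosen to match the list above), not blktrace's actual output format:

```python
from collections import namedtuple

# One record per I/O event, mirroring the fields listed above.
# Field names are assumptions for illustration, not blktrace's format.
IOEvent = namedtuple("IOEvent", [
    "timestamp",     # seconds since trace start
    "pid",           # process id
    "process_name",
    "request_type",  # "read" or "write"
    "amount",        # bytes requested
    "cpu",           # CPU number that issued the request
    "start_block",   # starting block number
    "block_size",    # number of blocks in this request
])

# Example: a 64 KiB read issued by a hypothetical database process.
ev = IOEvent(0.012, 4321, "postgres", "read", 65536, 0, 102400, 128)
```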
15. Pattern Recognition
• Based on the idea of a polling interval (0.5 s in the example)
[Diagram: in each polling interval, prefetchd wakes up, evaluates
trace data, issues a prefetch request, adjusts aggressiveness, and
sleeps until the next interval]
• The only useful information is the request type, starting block no.,
no. of blocks, and the process id
• The basic idea of this step: if an application's read accesses
within a certain time interval follow a pattern, prefetching is
done by extrapolating the same pattern
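The pattern-recognition idea above can be sketched as follows. This is a simplified illustration, not prefetchd's actual detector; the sequentiality threshold of 0.8 is an assumption:

```python
def consecutive_fraction(reads):
    """Fraction of reads whose start block immediately follows the
    previous read's end block (a simple sequentiality measure).
    Each read is a (start_block, num_blocks) pair."""
    if len(reads) < 2:
        return 0.0
    consec = sum(
        1 for prev, cur in zip(reads, reads[1:])
        if cur[0] == prev[0] + prev[1]
    )
    return consec / (len(reads) - 1)

def next_prefetch_range(reads, threshold=0.8):
    """If the interval's reads look sequential, extrapolate the pattern:
    prefetch a similar amount of data starting where the last read ended.
    Returns (start_block, num_blocks), or None if no pattern was found."""
    if consecutive_fraction(reads) < threshold:
        return None
    last_start, last_len = reads[-1]
    total = sum(n for _, n in reads)
    return (last_start + last_len, total)

# Three back-to-back reads of 8 blocks each -> fully sequential,
# so the next 24 blocks starting at block 124 would be prefetched.
reads = [(100, 8), (108, 8), (116, 8)]
```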
17. Data Prefetching
• Done after recognizing a pattern
• Scale factor
• Maximum disk throughput varies for each disk
• Last known stop block is the start block for next prefetch
• The stop block is calculated using the parameters below
• Total available bandwidth = max disk throughput − estimated application reads
reaching the disk
• Linear throughput of application = % consecutive reads × total throughput
• Prefetch_throughput = scale × consec% × (total blocks / timer interval)
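A worked sketch of the stop-block arithmetic above. The function and variable names are assumptions; only the formula `prefetch_throughput = scale × consec% × (total blocks / timer interval)` comes from the slide:

```python
def prefetch_stop_block(start_block, scale, consec_pct,
                        total_blocks, interval_s):
    """Sketch of the stop-block calculation: the prefetch for this
    interval covers prefetch_throughput * interval blocks past the
    last known stop block (which is the next prefetch's start block)."""
    # Blocks per second, per the slide's formula.
    prefetch_throughput = scale * consec_pct * (total_blocks / interval_s)
    # Blocks to prefetch during one polling interval.
    blocks_to_prefetch = int(prefetch_throughput * interval_s)
    return start_block + blocks_to_prefetch

# Example: 90% consecutive reads, 1000 blocks observed in a 0.5 s
# interval, scale factor 2 -> prefetch 1800 blocks past block 4096.
stop = prefetch_stop_block(4096, scale=2, consec_pct=0.9,
                           total_blocks=1000, interval_s=0.5)
```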
18. Feedback Monitoring
• Evaluates and classifies only the read operations that reach the disk
• If any predictable reads were not prefetched (i.e., they reached the disk),
aggressiveness is increased
• If the no. of linear reads reaching the disk is 0 and the no. of blocks
prefetched is more than what the application requested, then
aggressiveness is decreased
• This phase uses information from the previous polling interval:
it basically compares the pattern of reads entering the cache vs. the
pattern of reads missing the cache
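The two feedback rules above can be sketched as a simple controller. The step size and the bounds on the scale factor are assumptions, not values from the source:

```python
def adjust_scale(scale, missed_linear_reads, blocks_prefetched,
                 blocks_requested, step=0.25, lo=0.5, hi=4.0):
    """Feedback rule sketched from the slide: increase aggressiveness
    when predictable (linear) reads still reached the disk un-prefetched;
    decrease it when no linear reads reached the disk yet more blocks were
    prefetched than the application requested. step/lo/hi are assumed."""
    if missed_linear_reads > 0:
        scale += step            # under-prefetching: be more aggressive
    elif blocks_prefetched > blocks_requested:
        scale -= step            # over-prefetching: back off
    return max(lo, min(hi, scale))
```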
21. Prefetchd
• Prototype implemented on a Linux system; runs in user
space
• Cache eviction policy is left to the kernel
• Event collection uses the blktrace API in Linux
• The readahead system call is used to load pages from a file into the
system page cache
• Current implementation uses a loopback device
[Diagram: prefetched data flows from the SSD into the system cache in
main memory]
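Python's standard library does not expose the `readahead(2)` syscall that prefetchd uses, so this sketch uses `os.posix_fadvise` with `POSIX_FADV_WILLNEED` as a close analogue: it likewise asks the kernel to populate the page cache for a file region (Linux/Unix only; the helper name is an assumption):

```python
import os
import tempfile

def prefetch_into_page_cache(path, offset, length):
    """Hint the kernel to load [offset, offset+length) of the file into
    the system page cache, approximating prefetchd's readahead(2) call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)

# Demo: write a small file, issue the prefetch hint, then read it back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    name = f.name
prefetch_into_page_cache(name, 0, 4096)
with open(name, "rb") as f:
    data = f.read()
os.unlink(name)
```

Like `readahead(2)`, `POSIX_FADV_WILLNEED` is only a hint; subsequent reads are served from the page cache if the kernel honored it, and from the device otherwise.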
23. Setup
• CPU: Intel Core2Quad @ 2.33 GHz with 8 GB RAM
• Tested against 4 SSDs and 1 HDD
• SSDs formatted with the ext2 file system holding one large file,
accessed through a loopback device (formatted with the ext3 file
system)
24. Benchmarks Used
• Database Test Suite 3 (decision-support TPC-H benchmark
queries)
• BLAST (Biological tool to compare biological sequences)
• LFS (Large file I/O)
• PostMark (small files but in large numbers)
• SlowThink (CPU intensive application)
26. Prefetching Speedup
• Values above 2.0 are omitted
• Best performance on LFS benchmarks
• In some cases, speedups for SSDs tend to be lower than HDDs because the potential
speedup for an already fast device is limited
27. Prefetching Accuracy
• Accuracy = amount of data prefetched and subsequently read / total amount
used by the application
• Avg accuracy of 60%
• Websearch benchmark contains large amount of random accesses
28. Prefetching Cost
Cost = amount of data prefetched / amount of data read by the application
Cost < 1 means less data was prefetched than the application actually read
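The two metrics above are simple ratios and can be sketched directly (function names are illustrative):

```python
def prefetch_accuracy(prefetched_and_read, total_read):
    """Accuracy = data that was prefetched and subsequently read,
    over the total amount of data the application used."""
    return prefetched_and_read / total_read

def prefetch_cost(prefetched, total_read):
    """Cost = total data prefetched over total data the application read;
    cost < 1 means less was prefetched than the application consumed."""
    return prefetched / total_read

# Example (made-up numbers): the application read 1000 MB in total,
# of which 650 MB had been prefetched ahead of time, and 900 MB of
# prefetching was issued overall.
acc = prefetch_accuracy(650, 1000)   # 0.65
cost = prefetch_cost(900, 1000)      # 0.9 -> cost below 1
```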
29. SUMMARY
• Data prefetching designed with SSDs in mind
• Tunes itself to drive and application characteristics
• Dynamically controls aggressiveness with feedback
• "Prefetchd" is the prototype built on the flashy prefetching
technique
• Achieves on average a 20% speedup on LFS and 65-70%
prefetching accuracy