The document describes a technique called "flashy prefetching" that aims to improve data prefetching performance for high-performance flash drives such as SSDs. It recognizes that traditional prefetching is not optimized for flash drives and their characteristics. Flashy prefetching dynamically controls prefetching aggressiveness based on drive performance and prefetching performance, and supports prefetching for multiple simultaneous accesses. An implementation called "Prefetchd" achieved on average a 20% speedup on large-file benchmarks and 65-70% prefetching accuracy.
3. INTRODUCTION
• NAND Flash memory
• SSDs are becoming increasingly popular due to high bandwidth
and low latency
• SSDs vs. HDDs
• Data prefetching is used to reduce access latency: it loads data that is
likely to be read from storage into main memory ahead of time
• “Flashy” prefetching focuses on high performance SSDs (flash)
4. Traditional Prefetching
• Designed for rotational hard drives; conservative in the
amount of data prefetched
• If prefetching is aggressive
• It can take up shared i/o bandwidth meant for applications
• Main memory might become filled with unneeded data while useful data
gets evicted
• Risky when HDD bandwidth and system RAM are limited
• Not tuned for different types of storage and applications: an
application may require data at a high rate, and SSDs can
support a high prefetch rate
5. How flashy prefetching is different
• Takes advantage of high bandwidth and low latency of SSD
• Aware of the runtime environment
• Adaptive to the changing needs of application and device both
• Inherent support for parallel data accesses
• Aggressiveness is controlled through feedback mechanism
7. 1. SSDs are different
SLC –
• Uses a single cell to store one bit of data
• Faster and much more reliable, but also more expensive
MLC –
• Multi-level cell: can interpret four digital states from a signal
stored in a single cell
• This makes it denser for a given area and so cheaper to
produce, but it wears out faster
8. Applications are different
• Data-intensive applications may have varying I/O operations
• Every application goes through multiple stages of I/O and data
operations, each having different I/O needs
9. Prefetching for SSDs and HDDs is different
• No seek latency in SSDs
• Inherent support for parallel access
• Sequential data can be read faster on HDDs as the platter
rotates; this is not necessarily the case with SSDs
Flashy prefetching gives a good balance between speedup
and cost
11. To address the three challenges, the design principles are
as follows
1. Control prefetching based on drive performance
2. Control prefetching based on prefetching performance
3. Enable prefetching for multiple simultaneous accesses
14. Trace Collection
• Collects I/O event information
(timestamp, process id, process name, request type, amount of data, CPU
no., starting block no., and block size)
• Traces application requests and also the requests that actually
reach the disk
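A minimal sketch of what one collected event record might look like. The field names below are illustrative (chosen to match the list above), not blktrace's actual output format:

```python
from collections import namedtuple

# One record per I/O event, mirroring the fields listed above.
# Field names are assumptions for illustration, not blktrace's format.
IOEvent = namedtuple("IOEvent", [
    "timestamp",     # seconds since trace start
    "pid",           # process id
    "process_name",
    "request_type",  # "read" or "write"
    "amount",        # bytes requested
    "cpu",           # CPU number that issued the request
    "start_block",   # starting block number
    "block_size",    # number of blocks in this request
])

# Example: a 64 KiB read issued by a hypothetical database process.
ev = IOEvent(0.012, 4321, "postgres", "read", 65536, 0, 102400, 128)
```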
15. Pattern Recognition
• Based on the idea of a polling interval (0.5 s in the example)
[Diagram: in each polling interval, prefetchd wakes up, evaluates
trace data, issues a prefetch request, adjusts aggressiveness, and
sleeps until the next interval]
• The only useful information is the request type, starting block no.,
no. of blocks, and the process id
• The basic idea of this step: if an application's read accesses
within a certain time interval follow a pattern, prefetching is
done by extrapolating the same pattern
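The pattern-recognition idea above can be sketched as follows. This is a simplified illustration, not prefetchd's actual detector; the sequentiality threshold of 0.8 is an assumption:

```python
def consecutive_fraction(reads):
    """Fraction of reads whose start block immediately follows the
    previous read's end block (a simple sequentiality measure).
    Each read is a (start_block, num_blocks) pair."""
    if len(reads) < 2:
        return 0.0
    consec = sum(
        1 for prev, cur in zip(reads, reads[1:])
        if cur[0] == prev[0] + prev[1]
    )
    return consec / (len(reads) - 1)

def next_prefetch_range(reads, threshold=0.8):
    """If the interval's reads look sequential, extrapolate the pattern:
    prefetch a similar amount of data starting where the last read ended.
    Returns (start_block, num_blocks), or None if no pattern was found."""
    if consecutive_fraction(reads) < threshold:
        return None
    last_start, last_len = reads[-1]
    total = sum(n for _, n in reads)
    return (last_start + last_len, total)

# Three back-to-back reads of 8 blocks each -> fully sequential,
# so the next 24 blocks starting at block 124 would be prefetched.
reads = [(100, 8), (108, 8), (116, 8)]
```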
17. Data Prefetching
• Done after recognizing a pattern
• Scale factor
• Maximum disk throughput varies for each disk
• Last known stop block is the start block for next prefetch
• The stop block is calculated using the parameters below
• Total available bandwidth = max disk throughput − estimated application reads
reaching the disk
• Linear throughput of application = % consecutive reads × total throughput
• Prefetch_throughput = scale × consec% × (total blocks / timer interval)
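A worked sketch of the stop-block arithmetic above. The function and variable names are assumptions; only the formula `prefetch_throughput = scale × consec% × (total blocks / timer interval)` comes from the slide:

```python
def prefetch_stop_block(start_block, scale, consec_pct,
                        total_blocks, interval_s):
    """Sketch of the stop-block calculation: the prefetch for this
    interval covers prefetch_throughput * interval blocks past the
    last known stop block (which is the next prefetch's start block)."""
    # Blocks per second, per the slide's formula.
    prefetch_throughput = scale * consec_pct * (total_blocks / interval_s)
    # Blocks to prefetch during one polling interval.
    blocks_to_prefetch = int(prefetch_throughput * interval_s)
    return start_block + blocks_to_prefetch

# Example: 90% consecutive reads, 1000 blocks observed in a 0.5 s
# interval, scale factor 2 -> prefetch 1800 blocks past block 4096.
stop = prefetch_stop_block(4096, scale=2, consec_pct=0.9,
                           total_blocks=1000, interval_s=0.5)
```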
18. Feedback Monitoring
• Evaluates and classifies only the read operations that reach the disk
• If any predictable reads were not prefetched (i.e., they reached the disk),
aggressiveness is increased
• If the no. of linear reads reaching the disk is 0 and the no. of blocks
prefetched is more than what the application requested, then
aggressiveness is decreased
• This phase uses information from the previous polling interval:
it basically compares the pattern of reads entering the cache vs. the
pattern of reads missing the cache
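The two feedback rules above can be sketched as a simple controller. The step size and the bounds on the scale factor are assumptions, not values from the source:

```python
def adjust_scale(scale, missed_linear_reads, blocks_prefetched,
                 blocks_requested, step=0.25, lo=0.5, hi=4.0):
    """Feedback rule sketched from the slide: increase aggressiveness
    when predictable (linear) reads still reached the disk un-prefetched;
    decrease it when no linear reads reached the disk yet more blocks were
    prefetched than the application requested. step/lo/hi are assumed."""
    if missed_linear_reads > 0:
        scale += step            # under-prefetching: be more aggressive
    elif blocks_prefetched > blocks_requested:
        scale -= step            # over-prefetching: back off
    return max(lo, min(hi, scale))
```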
21. Prefetchd
• Prototype implemented on a Linux system; runs in user
space
• Cache eviction policy is left to the kernel
• Event collection uses the blktrace API in Linux
• The readahead system call is used to load pages from a file into the
system page cache
• Current implementation uses a loopback device
[Diagram: prefetched data flows from the SSD into the system cache in
main memory]
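Python's standard library does not expose the `readahead(2)` syscall that prefetchd uses, so this sketch uses `os.posix_fadvise` with `POSIX_FADV_WILLNEED` as a close analogue: it likewise asks the kernel to populate the page cache for a file region (Linux/Unix only; the helper name is an assumption):

```python
import os
import tempfile

def prefetch_into_page_cache(path, offset, length):
    """Hint the kernel to load [offset, offset+length) of the file into
    the system page cache, approximating prefetchd's readahead(2) call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)

# Demo: write a small file, issue the prefetch hint, then read it back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    name = f.name
prefetch_into_page_cache(name, 0, 4096)
with open(name, "rb") as f:
    data = f.read()
os.unlink(name)
```

Like `readahead(2)`, `POSIX_FADV_WILLNEED` is only a hint; subsequent reads are served from the page cache if the kernel honored it, and from the device otherwise.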
23. Setup
• CPU: Intel Core2Quad @ 2.33 GHz with 8 GB RAM
• Tested against 4 SSDs and 1 HDD
• SSDs formatted with the ext2 file system holding one large file,
accessed through a loopback device (formatted with the ext3 file
system)
24. Benchmarks Used
• Database Test Suite 3 (decision-support TPC-H benchmark
queries)
• BLAST (Biological tool to compare biological sequences)
• LFS (Large file I/O)
• PostMark (small files but in large numbers)
• SlowThink (CPU intensive application)
26. Prefetching Speedup
• Values above 2.0 are omitted
• Best performance on LFS benchmarks
• In some cases, speedups for SSDs tend to be lower than HDDs because the potential
speedup for an already fast device is limited
27. Prefetching Accuracy
• Accuracy = amount of data prefetched and subsequently read / total amount
used by the application
• Avg accuracy of 60%
• Websearch benchmark contains large amount of random accesses
28. Prefetching Cost
Cost = amount of data prefetched / amount of data read by the application
Cost < 1 means less data was prefetched than the application actually read
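The two metrics above are simple ratios and can be sketched directly (function names are illustrative):

```python
def prefetch_accuracy(prefetched_and_read, total_read):
    """Accuracy = data that was prefetched and subsequently read,
    over the total amount of data the application used."""
    return prefetched_and_read / total_read

def prefetch_cost(prefetched, total_read):
    """Cost = total data prefetched over total data the application read;
    cost < 1 means less was prefetched than the application consumed."""
    return prefetched / total_read

# Example (made-up numbers): the application read 1000 MB in total,
# of which 650 MB had been prefetched ahead of time, and 900 MB of
# prefetching was issued overall.
acc = prefetch_accuracy(650, 1000)   # 0.65
cost = prefetch_cost(900, 1000)      # 0.9 -> cost below 1
```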
29. SUMMARY
• Data prefetching designed with SSDs in mind
• Tunes itself to drive and application characteristics
• Dynamically controls aggressiveness with feedback
• "Prefetchd" is the prototype built on the flashy prefetching
technique
• Achieves on average a 20% speedup on LFS and 65-70%
prefetching accuracy