Paper Reading: HashKV and Beyond
Presented by Xinye Tao
Part I - Problem Recap
Background: SSD
● Coarse-Grained Access
○ data is returned by Page
○ an in-place update requires a costly read-erase-rewrite procedure on a Block
● Parallelism
○ unlike a hard disk, an SSD can serve multiple requests at the same time
○ sequential or patterned random access can be optimized
■ sequential access is faster due to the hardware-tuned address translation unit
What we need: append writes & prefetch reads
Background: LSM and Key-Value Separation
● LSM is a radical compromise in pursuit of online latency
● a Cache + Log architecture, not Index + Data
● Pros:
○ Append-Only is SSD-Friendly
○ Immutability is Parallel-Friendly
● Cons:
○ queries need to traverse the log: read amplification
○ stale data: space amplification (1 / fanout)
○ data purging needs rewriting of cold records: write amplification
Background: LSM and Key-Value Separation
● the three amplifications cannot all be beaten (RUM conjecture)
● but we can slow their growth by feeding the LSM less data
● one feasible solution is Key-Value Separation
Problem Formalization: Designing a Value Store
● Architecture (a minimal interface is sketched after this list)
○ Index Store (key -> address)
○ Value Store (address -> value)
● Metrics
○ Read, Scan
○ Write
○ Space Amplification
○ Background Load and Variance
○ Update (on Index Store) Frequency
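A minimal sketch of this two-store split; all class and method names here are illustrative, not taken from any specific system:

    # Minimal sketch of key-value separation: the Index Store maps key -> address,
    # the Value Store maps address -> value. Names and layout are illustrative only.
    class IndexStore:
        def __init__(self):
            self._index = {}                  # stand-in for the LSM-tree of (key, address)
        def put(self, key, address):
            self._index[key] = address
        def get(self, key):
            return self._index.get(key)

    class ValueStore:
        def __init__(self):
            self._log = []                    # stand-in for an append-only value log
        def append(self, key, value):
            self._log.append((key, value))
            return len(self._log) - 1         # address = offset in the log
        def read(self, address):
            return self._log[address][1]

    def put(index, values, key, value):
        index.put(key, values.append(key, value))   # write the value first, then publish its address

    def get(index, values, key):
        addr = index.get(key)
        return None if addr is None else values.read(addr)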
Previous Attempt: WiscKey
● Circular List
● Append at tail
● Purge at head
○ if IndexStore.get(log.key) != log.address: Purge
○ else: Relocate (the full GC loop is sketched below)
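The GC loop above can be sketched as follows; this is a simplified, single-threaded illustration with assumed pop_head/append helpers, not WiscKey's actual code:

    # Sketch of WiscKey-style vLog GC: scan a batch from the head of the circular
    # log, purge entries whose address no longer matches the Index Store, and
    # relocate still-valid (cold) entries to the tail. Helpers are assumed.
    def vlog_gc(index_store, vlog, batch_size):
        for _ in range(batch_size):
            entry = vlog.pop_head()                 # assumed: returns (key, value, address) or None
            if entry is None:
                break
            key, value, address = entry
            if index_store.get(key) != address:
                continue                            # stale record: purge (drop it)
            new_address = vlog.append(key, value)   # valid record: relocate to tail
            index_store.put(key, new_address)       # Index Store must be updated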
Previous Attempt: WiscKey
● Evaluation
○ steady state requires GC_Speed >= Update_Speed
○ Update Frequency (on Index Store): GC_Speed * Cold_Rate
○ Background Load: GC_Speed * { Index Query + Cold_Rate * (Append + Index Update) } (worked example below)
● GC’s load variance is tied to Cold_Rate and Update_Speed
● Index Store’s locality is disturbed by unnecessary relocation of cold records
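A small worked example of these estimates, with made-up numbers:

    # Worked example of the WiscKey GC cost model above (numbers are made up).
    update_speed = 10_000            # user updates per second
    cold_rate    = 0.7               # fraction of scanned entries that are still valid
    gc_speed     = update_speed      # steady state requires GC_Speed >= Update_Speed

    # Relocations of valid (cold) entries trigger extra Index Store updates.
    index_update_freq = gc_speed * cold_rate                 # 7000 updates/s
    # Every scanned entry costs one index query; valid ones also cost an append + index update.
    background_ops = gc_speed * (1 + cold_rate * (1 + 1))    # 24000 ops/s

    print(index_update_freq, background_ops)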
Part II - HashKV Details
HashKV: Hash-Based Data Grouping
● Value Store is hash-partitioned, with each partition dynamically sized
● Global GC: choose a partition by priority
○ an in-memory heap holds the write frequency of each partition
● Local GC: local scan without issuing new requests to the Index Store
○ an in-memory temporary hash map tracks the latest key:value (both GC paths are sketched after the diagram below)
(diagram: Main Segments 1, 2, 3, ... each with incremental log segments, located via a Segment Table)
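A rough sketch of the two GC paths described above, under simplified assumptions (flat per-partition entry lists instead of main + log segments):

    # Sketch of HashKV-style GC on a hash-partitioned value store.
    # Simplification: each partition is a flat append-only list of (key, value),
    # instead of a fixed-size main segment plus dynamically allocated log segments.
    class Partition:
        def __init__(self):
            self.entries = []                 # append-only (key, value) records
            self.writes = 0                   # write counter used for GC priority

        def append(self, key, value):
            self.entries.append((key, value))
            self.writes += 1

    def pick_gc_partition(partitions):
        # Global GC: pick the partition with the highest write frequency.
        # (HashKV keeps an in-memory heap of these counters; max() here for brevity.)
        return max(partitions, key=lambda p: p.writes)

    def local_gc(partition):
        # Local GC: scan only this partition; a temporary in-memory hash map keeps
        # the latest version of each key, so no validity query to the Index Store is needed.
        latest = {}
        for key, value in partition.entries:
            latest[key] = value
        partition.entries = list(latest.items())
        partition.writes = 0
        return latest                         # new locations are then published to the Index Store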
HashKV: Hotness-Awareness
● Rewriting cold entries that are rarely updated wastes resources
● In HashKV, such entries are identified and moved to a separate vLog during GC (sketched after the diagram below)
○ Index Store will point to the new value in the vLog
○ the old entry is tagged as cold
● Further updates may trigger a cold-to-hot switch
○ the vLog is GCed using WiscKey’s approach
(diagram: the segment keeps only a tagged stub "meta(tagged), key", the full k-v moves to the vLog, and a later update writes "meta, key, new-value" back into the segment)
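A sketch of this cold/hot separation during GC; latest_versions and the tagging format are assumptions for illustration, not the paper's on-disk layout:

    # Sketch of hotness-aware GC: keys not updated since the last GC round are
    # treated as cold; their values move to the cold vLog and only a tagged stub
    # (metadata + key) stays in the hash partition. Illustrative only.
    def hotness_aware_gc(partition, cold_vlog, index_store, updated_since_last_gc):
        survivors = []
        for key, value in latest_versions(partition):     # assumed helper: latest (key, value) pairs
            if key in updated_since_last_gc:
                survivors.append((key, value))            # hot: stays in the hash partition
            else:
                addr = cold_vlog.append(key, value)       # cold: relocated to the separate vLog
                index_store.put(key, addr)                # Index Store now points into the vLog
                survivors.append((key, None))             # tagged stub: meta + key, value elsewhere
        partition.entries = survivors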
Benchmark: Setup
● Hardware
○ CPU: 4-core Xeon E3-1240 v2
○ Memory: 16 GB
○ Disk: 128 GB SSD × (1 + 6)
● Input
○ Key: 24 bytes
○ Value: 992 bytes
○ Dataset: three phases of 40 GB updates over an existing 40 GB
Benchmark: Overall Performance
Benchmark: Impact of Value Size
Part III - Beyond HashKV
Pros and Cons of HashKV
● Evaluation
○ Read, Scan: read randomness is worse than WiscKey
■ could be optimized by software prefetch, at the cost of CPU resources (sketched after this list)
■ could be optimized by range grouping, which needs a global scheduler (e.g. TiKV Region)
○ Write: the global append flow is divided across partitions, introducing write randomness
■ could be optimized by batched writes
■ at the cost of parallelism
○ Background Load: GC needs no checking queries and has no external dependency
○ Update Frequency: once for cold records, several times for hot records
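A sketch of the software-prefetch idea for scans; read_value is an assumed accessor, not HashKV's API:

    from concurrent.futures import ThreadPoolExecutor

    # Sketch of software prefetch for scans over a hash-partitioned value store:
    # consecutive keys map to scattered partitions, so value reads are issued in
    # parallel (bounded by prefetch_depth) while earlier results are consumed.
    # read_value(address) is an assumed accessor, not HashKV's API.
    def scan(index_store, read_value, keys, prefetch_depth=8):
        with ThreadPoolExecutor(max_workers=prefetch_depth) as pool:
            futures = [pool.submit(read_value, index_store.get(k)) for k in keys]
            for key, fut in zip(keys, futures):
                yield key, fut.result()

This trades extra CPU and threads for lower effective scan latency, which is exactly the cost noted above.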
Insights from HashKV: modular
(diagram: LSM → LSM Index + vLog → LSM Index + Hashed Log + Cold vLog → ?)
optimize for workflow
Insights from HashKV: tunable
● A tunable system is naturally adaptable
○ design decisions are made at runtime, not fixed at setup time
● HashKV is tunable in that
○ global GC can be delayed without compromising the KV service
○ local GC can be scheduled in a tunable manner
○ cold/hot separation is also controlled by an independent criterion (knobs sketched below)
● e.g. the self-driving database Peloton by CMU, now remade as terrier
optimize for workload
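A sketch of what such runtime knobs could look like; every name and default value here is hypothetical:

    # Hypothetical runtime knobs illustrating the "tunable" point above:
    # GC decisions are driven by adjustable thresholds, not fixed at setup time.
    class GcTuning:
        def __init__(self, gc_lag_bytes=4 << 30, cold_threshold=2, io_budget=0.3):
            self.gc_lag_bytes = gc_lag_bytes        # how much stale data global GC may tolerate
            self.cold_threshold = cold_threshold    # updates per round below which a key is cold
            self.io_budget = io_budget              # fraction of device bandwidth local GC may use

    def should_run_global_gc(stale_bytes, tuning):
        # Global GC can be delayed as long as accumulated stale data stays under the knob.
        return stale_bytes > tuning.gc_lag_bytes

    def is_cold(update_count, tuning):
        # Cold/hot separation is governed by its own independent, adjustable criterion.
        return update_count < tuning.cold_threshold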
References (and some related papers)
ATC ‘18, HashKV: Enabling Efficient Updates in KV Storage via Hashing
EDBT ‘16, Designing Access Methods: The RUM Conjecture
FAST ‘16, WiscKey: Separating Keys from Values in SSD-conscious Storage
FAST ‘16, Towards Accurate and Fast Evaluation of Multi-Stage Log-structured Designs
ATC ‘17, TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores
CIDR ‘17, Optimizing Space Amplification in RocksDB
SOSP ‘17, PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
ATC ‘18, Redesigning LSMs for Nonvolatile Memory with NoveLSM
VLDB ‘19, Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems
Thank You !


Editor's Notes

  • #3 Paper name: HashKV: Enabling Efficient Updates in KV Storage via Hashing; write-intensive workload.
  • #5 Here write amplification refers to the peak write ratio when several compactions are issued, while the write-optimized property of LSM refers to the data transfer required per write, i.e. the throughput when no compaction is running.
  • #9 a. Sometimes being controllable is more important than best-case performance. b. Only relocate cold values.
  • #16 During updates, HashKV outperforms vLog by 4x. PebblesDB has better throughput but a larger output size, due to its lower compaction frequency.
  • #17 Data from the P3 update phase. The gap between vLog and HashKV closes for larger value sizes; reason: a lower pair count and a smaller LSM make queries and updates on the Index Store cheaper.
  • #19 Background: local GC and dynamic sizing.
  • #20 Keys, hot values, and cold values have different workload properties and thus different expectations of the storage structure: keys need point and scan queries; values should be space-optimized; cold data should occupy few resources.