It Takes Two: Instrumenting the Interaction between In-Memory Databases and Solid-State Drives CIDR 2020 presentation
1. It Takes Two: Instrumenting the Interaction between In-Memory Databases and Solid-State Drives
Alberto Lerner (1), Jaewook Kwak (2), Sangjin Lee (2), Kibin Park (2), Yong Ho Song (2,3), Philippe Cudré-Mauroux (1)
(1) XI Lab – University of Fribourg, Switzerland
(2) ENC Lab – Hanyang University, Korea
(3) Samsung Electronics, Korea
CIDR – January 2020 - Amsterdam
2. Motivation
• Where is time going?
• CPU/cache utilization
-> HW performance counters
• Per-instruction cost
-> pprof, Linux perf
• Operating System impact
-> systemtap, several others
• SSD performance
-> ?
3. Challenges in In-Memory Database Durability
• The log needs to be written as fast as possible
• Checkpointing competes with client requests for memory and disk access
• Can we understand the interference? Was the transaction log I/O pattern efficient to begin with?
[Figure: on the host, users, Txn's, and CP workers; on storage, the Txn Log and the Checkpoint]
4. Cosmos+ OpenSSD
• Idea: let’s instrument an actual device!
• SSD rapid prototyping platform
• SoC-based
• Fully functional
• Open source firmware
• Next generation is in the final stages of development
9. Performance Event Records (PEV)
• Currently four types of records
• IO_TIMESTAMP: regular timestamp stations
• GC_TIMESTAMP: FTL timestamp stations
• PERFORMANCE_INDEX: aggregated counter
• PERFORMANCE_INDEX_PER_CH: per-channel counters
13. Research Agenda I - Instrumentation
• Functionality Limitations
• Currently limited to 4 channels
• Further annotations to trace back valid copies
• Contextual triggers
• Signal Generation
• Process instrumentation records on-the-fly
• Identify scenarios where a scheduling policy change is beneficial
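One way to picture the signal-generation idea above is a sliding window over latencies derived from instrumentation records, which fires when the recent average crosses a threshold. The slides do not specify a mechanism, so the window size, the threshold, the use of latency as the metric, and the name `LatencySignal` are all assumptions in this sketch.

```python
from collections import deque


class LatencySignal:
    """Fire when the mean of the last `window` latencies exceeds `threshold_us`.

    Hypothetical detector: the metric and the trigger rule are assumed.
    """

    def __init__(self, window: int, threshold_us: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._samples = deque(maxlen=window)
        self._sum = 0.0
        self._threshold_us = threshold_us

    def observe(self, latency_us: float) -> bool:
        """Add one sample; return True if the signal should fire."""
        if len(self._samples) == self._samples.maxlen:
            # The oldest sample is about to be evicted by append().
            self._sum -= self._samples[0]
        self._samples.append(latency_us)
        self._sum += latency_us
        full = len(self._samples) == self._samples.maxlen
        return full and self._sum / len(self._samples) > self._threshold_us
```

Keeping a running sum makes each update constant-time, which matters if records are processed as they are produced rather than in a later batch.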
14. Research Agenda II – SSD as a Platform
• Adaptive Scheduling
• Respond instantaneously to generated signals by changing priorities
• In-Storage Checkpoint “Derivation”
• Move the checkpoint process partially or entirely into the device
15. Conclusion
• SSDs don’t have to be black boxes
• The instrumented Cosmos+ allows designers of both databases and FTLs to analyze and understand interference in workloads
• Opportunities to
• Have SSDs interact with applications in richer ways
• Exploit new possibilities of Near-Data Computing for Databases