It Takes Two: Instrumenting the Interaction between In-Memory Databases and Solid-State Drives CIDR 2020 presentation
1.
It Takes Two: Instrumenting the
Interaction between In-Memory
Databases and Solid-State Drives
Alberto Lerner (1), Jaewook Kwak (2), Sangjin Lee (2), Kibin Park (2),
Yong Ho Song (2,3), Philippe Cudré-Mauroux (1)
(1) XI Lab – University of Fribourg, Switzerland
(2) ENC Lab – Hanyang University, Korea
(3) Samsung Electronics, Korea
CIDR, January 2020, Amsterdam
2.
Motivation
• Where is time going?
• CPU/cache utilization
-> HW performance counters
• Per-instruction cost
-> pprof, Linux perf tool
• Operating System impact
-> systemtap, several others
• SSD performance
-> ?
3.
Challenges in In-Memory Database Durability
• The log needs to be written as fast as possible
• The checkpoint competes with client requests for memory and disk access
• Can we understand the interference? Was the TX log I/O pattern efficient to begin with?
[Figure: users' transactions (Txn's) and checkpoint (CP) workers on the host; transaction log and checkpoint on storage]
4.
Cosmos+ OpenSSD
• Idea: let’s instrument an actual
device!
• SSD rapid prototyping platform
• SoC-based
• Fully functional
• Open source firmware
• Next generation is in the final stages of development
8.
Instrumentation
• Timestamping (in red)
• Counters (in green)
• Pagemap
• Mechanisms
• Triggers
• Data extraction commands
9.
Performance Event Records (PEV)
• Currently four types of records
  IO_TIMESTAMP                Regular timestamp stations
  GC_TIMESTAMP                FTL timestamp stations
  PERFORMANCE_INDEX           Aggregated counter
  PERFORMANCE_INDEX_PER_CH    Per channel counters
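As a rough illustration of how a host-side tool might consume these records, the sketch below models the four record types as an enumeration and groups a batch of records by type. Only the four type names come from the slide. The `EventRecord` fields (`source`, `value`) and the `group_by_type` helper are hypothetical and do not reflect the actual Cosmos+ record layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class RecordType(Enum):
    IO_TIMESTAMP = "IO_TIMESTAMP"                          # regular timestamp stations
    GC_TIMESTAMP = "GC_TIMESTAMP"                          # FTL timestamp stations
    PERFORMANCE_INDEX = "PERFORMANCE_INDEX"                # aggregated counter
    PERFORMANCE_INDEX_PER_CH = "PERFORMANCE_INDEX_PER_CH"  # per channel counters


@dataclass
class EventRecord:
    # Hypothetical layout: the real firmware record format is not shown in the slides.
    rtype: RecordType
    source: int   # assumed: station or channel identifier
    value: int    # assumed: timestamp or counter value


def group_by_type(records):
    """Group a batch of extracted records by their record type."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.rtype].append(rec)
    return dict(grouped)
```

Grouping by type first keeps timestamp analysis (latency between stations) separate from counter analysis (aggregate or per channel totals).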
13.
Research Agenda I - Instrumentation
• Functionality Limitations
• Currently limited to 4 channels
• Further annotations to trace back
valid copies
• Contextual triggers
• Signal Generation
• Process instrumentation records
on-the-fly
• Identify scenarios where a
scheduling policy change is
beneficial
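One way to picture on-the-fly processing of instrumentation records is a streaming filter that raises a signal as soon as a request has spent too long between timestamp stations. The sketch below is an assumption-laden illustration, not the authors' mechanism: the tuple format `(request_id, station, time_us)`, the `latency_signals` function, and the threshold rule are all hypothetical.

```python
def latency_signals(timestamps, threshold_us):
    """Yield (request_id, latency_us) once per request whose elapsed time
    since its first timestamp station exceeds threshold_us.

    `timestamps` is any iterable of (request_id, station, time_us) tuples,
    consumed one record at a time (hypothetical record format).
    """
    first_seen = {}    # request_id -> time of its first station
    signaled = set()   # requests that already produced a signal
    for request_id, _station, time_us in timestamps:
        if request_id not in first_seen:
            first_seen[request_id] = time_us
            continue
        if request_id in signaled:
            continue
        latency = time_us - first_seen[request_id]
        if latency > threshold_us:
            signaled.add(request_id)
            yield request_id, latency
```

Because the function is a generator, it can sit directly on a live record stream; a consumer could use each emitted signal as a hint that a scheduling policy change might help.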
14.
Research Agenda II – SSD as a Platform
• Adaptive Scheduling
• Respond instantaneously to generated signals by changing priorities
• In-Storage Checkpoint “Derivation”
• Move the checkpoint process
partially or entirely into the device
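The adaptive scheduling idea above, changing priorities in response to a signal, can be sketched with a toy two-class scheduler. This is a minimal illustration under stated assumptions: the class names `"log"` and `"checkpoint"`, the `TwoClassScheduler` type, and its methods are hypothetical and are not part of the Cosmos+ firmware.

```python
from collections import deque


class TwoClassScheduler:
    """Toy scheduler: serves the highest-priority non-empty queue first."""

    def __init__(self):
        self.queues = {"log": deque(), "checkpoint": deque()}
        self.order = ["log", "checkpoint"]  # assumed default: log writes first

    def submit(self, io_class, request):
        self.queues[io_class].append(request)

    def on_signal(self, preferred_class):
        # A signal moves the named class to the front of the priority order.
        self.order.remove(preferred_class)
        self.order.insert(0, preferred_class)

    def next_request(self):
        for io_class in self.order:
            if self.queues[io_class]:
                return self.queues[io_class].popleft()
        return None  # nothing pending
```

For example, with one log write and one checkpoint write pending, `next_request()` serves the log write by default; after `on_signal("checkpoint")`, pending checkpoint writes are served ahead of any remaining log writes.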
15.
Conclusion
• SSDs don’t have to be black boxes
• The Instrumented Cosmos+ allows designers of both Databases and FTLs to
analyze and understand interference in workloads
• Opportunities to
• Have SSDs interact with applications in richer ways
• Exploit new possibilities of Near-Data Computing for Databases