The document discusses research into data management, file systems, and storage systems being conducted at UC Santa Cruz. Specific projects mentioned include using Ceph as a prototyping platform, the SIRIUS project studying challenges of heterogeneous, multi-tiered storage for exascale systems, the Programmable Storage project developing the Malacology and Mantle systems, and the Skyhook project to build an elastic database system that leverages programmable storage interfaces. The research aims to address issues like data placement, predictable performance at scale, and allowing databases to better utilize storage resources.
2. Storage research at UCSC
● Graduate student
● UC Santa Cruz
● Data management, file systems, HPC, QoS
● Ceph as a prototyping platform
● Data management
● Storage systems
● High-performance computing
● Quality of service
● Real-time systems
4. Storage research at UCSC
● SIRIUS Project (DOE)
● Programmable storage (CROSS, NSF)
● Declarative storage (NSF)
● IRIS-HEP (NSF)
5. SIRIUS: Science-driven Data Management for Multi-tiered Storage (ORNL, Sandia, Brown, Rutgers, UCSC)
● Storage challenges for exascale systems
● (1) Heterogeneity
○ Where should data be stored?
● (2) Predictable performance
○ Millions of processes performing I/O
● Many challenges...
DOE SSIO, “Science-Driven Data Management for Multi-Tiered Storage” with ORNL and Sandia, award DE-SC0016074
6. Malacology: A Programmable Storage System
[Sevilla et al. EuroSys '17]
Michael A. Sevilla, Noah Watkins, Ivo Jimenez, Peter Alvaro, Shel Finkelstein, Jeff LeFevre, Carlos Maltzahn
University of California, Santa Cruz
7. Malacology: programmable interface research platform
[Diagram labels: target storage interface (goal); internal system services (building blocks); composed, generic service glue layer]
Malacology: A Programmable Storage System, M. Sevilla, N. Watkins, I. Jimenez, P. Alvaro, S. Finkelstein, J. LeFevre, C. Maltzahn, EuroSys '17
Mantle: A Programmable Metadata Load Balancer for the Ceph File System, M. Sevilla, N. Watkins, C. Maltzahn, I. Nassi, S. Brandt, et al., SC '15
CORFU: A Shared Log Design for Flash Clusters, M. Balakrishnan, D. Malkhi, V. Prabhakaran, T. Wobber, M. Wei, et al., NSDI '12
8. How to grow a database: scale-up
[Diagram: a query (Q) arrives at a single database node (CPU, RAM) attached to database storage over a network / bus]
https://aws.amazon.com/ec2/instance-types/
Skyhook project
● Elastic database system
● Lead: Jeff LeFevre
● Active CROSS incubator
10. Skyhook: aligns data with storage interfaces
[Diagram: a database node (RAM, CPU) sends queries (Q) through foreign data wrappers to a DB-specific data interface on a Ceph cluster. A table with columns C1, C2, C3 is partitioned into shards, each stored as an object ({ object.i }), across Ceph OSDs that each have RAM, CPU, and storage + index.]
Operations labeled in the diagram (database node, app-specific interface):
● Indexing
● Projection
● Filtering
● Aggregation
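The partitioning step in the diagram can be sketched roughly as follows. This is only an illustration, not Skyhook's code: the `partition_rows` helper, the fixed rows-per-object policy, the three-column row layout, and the `object.<i>` naming are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: (orderkey, linenumber, extendedprice)
Row = Tuple[int, int, float]


def partition_rows(rows: List[Row], rows_per_object: int) -> Dict[str, List[Row]]:
    """Split a table into fixed-size shards, one shard per named object."""
    if rows_per_object <= 0:
        raise ValueError("rows_per_object must be positive")
    objects: Dict[str, List[Row]] = {}
    for start in range(0, len(rows), rows_per_object):
        # Assumed naming scheme: object.0, object.1, ...
        name = f"object.{start // rows_per_object}"
        objects[name] = rows[start:start + rows_per_object]
    return objects


# Ten made-up rows split into shards of at most four rows each.
table = [(k, 1, 100.0 * k) for k in range(10)]
shards = partition_rows(table, rows_per_object=4)
```

A real deployment would presumably choose shard boundaries based on object size and workload rather than a fixed row count.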
11. Skyhook experiments with programmable storage
● Real-world dataset
○ TPC lineitem table
○ 1 billion rows
○ 140 GB
● Storage in Ceph objects
○ Table divided into ~10,000 objects of 14 MB each
■ Object size can be tuned for the workload (e.g. 4 MB)
○ Each object contains a dedicated index
■ Index stored in omap (RocksDB)
● Storage hardware (thanks CloudLab!)
○ Modern 20-core Intel CPU
○ 128 GB DRAM, 500 GB SSD
○ 10 Gb/s Ethernet
○ 1 to 16 Ceph nodes
[Diagram: a database node (CPU, RAM) sends queries (Q) over the network to programmable storage (database-specific data interface)]
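The object sizing on this slide follows from simple arithmetic, sketched below. The constants come from the slide; decimal units (1 GB = 10^9 bytes) and an exact count of 10,000 objects are assumptions, so the derived numbers are approximate.

```python
TABLE_BYTES = 140 * 10**9   # 140 GB table (decimal gigabytes assumed)
TOTAL_ROWS = 10**9          # 1 billion rows
NUM_OBJECTS = 10_000        # approximate object count from the slide

# Average object size in MB and rows held by each object.
object_mb = TABLE_BYTES / NUM_OBJECTS / 10**6
rows_per_object = TOTAL_ROWS // NUM_OBJECTS

# Average bytes per row implied by the table size.
bytes_per_row = TABLE_BYTES / TOTAL_ROWS

# Object count if the table were re-sharded into 4 MB objects instead.
objects_at_4mb = TABLE_BYTES // (4 * 10**6)

print(object_mb, rows_per_object, bytes_per_row, objects_at_4mb)
```

Under these assumptions each object holds about 100,000 rows of roughly 140 bytes, and moving to 4 MB objects would raise the object count to about 35,000.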
12. Benchmark queries evaluated
Qa: Range query with 10% selectivity:
SELECT * FROM lineitem WHERE extendedprice > 71000.0
Qb: Point query (unique row) issued with and without index:
SELECT extendedprice
FROM lineitem
WHERE orderkey=5 AND linenumber=3
Qc: Regex query with 10% selectivity (CPU intensive):
SELECT * FROM lineitem WHERE comment ILIKE '%uriously%'
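To make the shape of the three queries concrete, here is a small sketch that runs them against an in-memory table using Python's built-in `sqlite3` module. The four-column schema and the sample rows are invented for illustration. SQLite has no `ILIKE`, so the sketch substitutes `LIKE`, which SQLite treats as case-insensitive for ASCII text by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical schema: only the columns the three queries touch.
conn.execute(
    "CREATE TABLE lineitem ("
    " orderkey INTEGER, linenumber INTEGER,"
    " extendedprice REAL, comment TEXT)"
)
sample_rows = [
    (5, 3, 72000.0, "Furiously regular deposits"),
    (5, 4, 1200.5, "quick packages"),
    (6, 1, 80500.0, "slyly final requests"),
    (7, 1, 300.0, "curiously bold accounts"),
]
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", sample_rows)

# Qa: range query on extendedprice.
qa = conn.execute(
    "SELECT * FROM lineitem WHERE extendedprice > 71000.0"
).fetchall()

# Qb: point query identifying a unique row.
qb = conn.execute(
    "SELECT extendedprice FROM lineitem WHERE orderkey = 5 AND linenumber = 3"
).fetchall()

# Qc: pattern match on the comment column (LIKE stands in for ILIKE here).
qc = conn.execute(
    "SELECT * FROM lineitem WHERE comment LIKE '%uriously%'"
).fetchall()
```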
13. Range query performance (10% selectivity)
Improved I/O performance
● Local I/O bandwidth
● Local CPU resources
● Reduced network traffic
● CPU parallelism
[Figure: range query performance, client-side processing vs. server-side processing; diagram of a database node (CPU, RAM) connected to database storage over the network]
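The "reduced network traffic" point can be illustrated with a toy model that counts how many rows cross the network under each strategy. This is a sketch under stated assumptions: objects are plain Python lists, the `client_side` and `server_side` helpers are hypothetical, and shipped row counts stand in for bytes on the wire.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, float]  # hypothetical (orderkey, extendedprice)
Objects = Dict[str, List[Row]]


def client_side(objects: Objects, predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Ship every row to the client, then filter there."""
    shipped = [row for rows in objects.values() for row in rows]
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)


def server_side(objects: Objects, predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Filter inside each object and ship only the matching rows."""
    result: List[Row] = []
    for rows in objects.values():
        result.extend(row for row in rows if predicate(row))
    return result, len(result)


# Three made-up objects of ten rows each; the predicate matches 10% of rows.
objects = {
    f"object.{i}": [(i * 10 + j, 1000.0 * (i * 10 + j)) for j in range(10)]
    for i in range(3)
}


def expensive(row: Row) -> bool:
    return row[1] > 26500.0


client_rows, client_shipped = client_side(objects, expensive)
server_rows, server_shipped = server_side(objects, expensive)
```

Both strategies return the same rows, but the server-side variant ships 3 rows instead of 30. In a real cluster the per-object filters could also run in parallel on the storage nodes, which is the CPU parallelism point above.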
14. Point query performance (find unique row)
● Local I/O bandwidth
● Local CPU resources
● Reduced network traffic
● CPU parallelism
● 10,000 index lookups!
● 1 billion rows
[Figure: point query performance for client-side processing, server-side processing, and server-side processing with index acceleration; diagram of a database node (CPU, RAM) connected to database storage over the network]
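Because each object carries its own index, a point query has to consult one index per object, which is where the 10,000 lookups come from. The sketch below models this with Python dictionaries standing in for the per-object omap index; the `build_index` and `point_query` helpers and the row layout are assumptions for illustration, not Skyhook's implementation.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, float]  # hypothetical (orderkey, linenumber, extendedprice)
Index = Dict[Tuple[int, int], int]


def build_index(rows: List[Row]) -> Index:
    """Map (orderkey, linenumber) to the row's position inside one object."""
    return {(row[0], row[1]): pos for pos, row in enumerate(rows)}


def point_query(
    objects: Dict[str, List[Row]],
    indexes: Dict[str, Index],
    orderkey: int,
    linenumber: int,
) -> Tuple[List[float], int]:
    """Probe every object's index once; return matching prices and lookup count."""
    hits: List[float] = []
    lookups = 0
    for name, rows in objects.items():
        lookups += 1
        pos = indexes[name].get((orderkey, linenumber))
        if pos is not None:
            hits.append(rows[pos][2])
    return hits, lookups


objects = {
    "object.0": [(1, 1, 10.0), (5, 3, 72000.0)],
    "object.1": [(6, 1, 5.0)],
    "object.2": [(7, 2, 9.0)],
}
indexes = {name: build_index(rows) for name, rows in objects.items()}
prices, lookups = point_query(objects, indexes, orderkey=5, linenumber=3)
```

Each probe is cheap compared with scanning the object's rows, so the cost scales with the number of objects rather than the number of rows.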
16. zlog: implementation of CORFU on Ceph
[Diagram: shared log with positions 1, 2, 3, ..., 14, ...]
● Extend the benefits of software-defined storage to log abstraction
○ Transparently select storage media and physical design
○ Take advantage of performance upgrades and new features
○ Offload critical components such as replication and erasure-coding
[Diagram labels: LevelDB, RocksDB, WiredTiger; librados; osd, osd, osd]
CORFU protocol enforced by custom, transactional storage interface.
Balakrishnan et al., “CORFU: A Shared Log Design for Flash Clusters”, NSDI '12
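The core idea of a CORFU-style shared log, a sequencer that hands out positions plus storage that enforces write-once semantics, can be sketched in a few lines. This is a single-process toy, not the zlog or librados API: the `Sequencer`, `WriteOnceStore`, and `SharedLog` classes are hypothetical names, and replication, hole filling, and failure handling are left out.

```python
import itertools
from typing import Dict


class Sequencer:
    """Hands out the next free log position (starting at 1, as in the diagram)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_position(self) -> int:
        return next(self._counter)


class WriteOnceStore:
    """Storage that rejects a second write to the same position."""

    def __init__(self) -> None:
        self._cells: Dict[int, bytes] = {}

    def write(self, position: int, data: bytes) -> None:
        if position in self._cells:
            raise ValueError(f"position {position} already written")
        self._cells[position] = data

    def read(self, position: int) -> bytes:
        return self._cells[position]


class SharedLog:
    """Append = obtain a position from the sequencer, then write it once."""

    def __init__(self, sequencer: Sequencer, store: WriteOnceStore) -> None:
        self._sequencer = sequencer
        self._store = store

    def append(self, data: bytes) -> int:
        position = self._sequencer.next_position()
        self._store.write(position, data)
        return position

    def read(self, position: int) -> bytes:
        return self._store.read(position)


store = WriteOnceStore()
log = SharedLog(Sequencer(), store)
first = log.append(b"entry-a")
second = log.append(b"entry-b")
```

In the slide's design the write-once check is the part enforced inside the storage system by the custom storage interface, rather than by client code as in this toy.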
19. The state of programmable storage is messy
● Large design space
○ High cost of searching this space
● Costs are difficult to predict
○ A simple upgrade can change the calculus!
● Much harder than what we have presented
○ > 500 tunables/settings in Ceph
■ Not counting dependencies
○ Runs on a wide variety of hardware
● No hope of migrating to a new system
○ There are no standards!
More programmability work: “Data Center Scale Programmable Storage” with Dirk Grunwald (CU Boulder), NSF #1705021
20. DeclStore: Layering is for the Faint of Heart
Noah Watkins, Michael Sevilla, Ivo Jimenez, Kathryn Dahlgren, Peter Alvaro, Shel Finkelstein, Carlos Maltzahn
[HotStorage, July 2017]
21. Declarative storage
● Automate parts of this process
○ Searching the design space
○ Generating implementations
Query optimization & plan generation
Cost model
● Express interfaces declaratively
○ Eliminate need for storage system expertise
○ High-level abstractions across services / systems
● Prototyping with the language
○ Formal underpinning, demonstrated across domains
○ Can express all of the CORFU semantics
“Declarative Programmable Storage”, with Peter Alvaro, NSF #1764102
- 2018
22. IRIS-HEP
● $25 million NSF-funded 5-year project with Princeton
● Institute for Research and Innovation (IRIS)
● High Energy Physics (HEP)
● Exascale storage and analysis challenges