Ceph at Salesforce (Ceph Day San Jose)
Sameer Tiwari - Principal Architect, Storage Cloud
stiwari@salesforce.com
@techsameer
https://www.linkedin.com/in/sameer-tiwari-1961311/
Data Types
Structured Customer Data: Mostly transactional data on RDBMS
Unstructured Customer Data: Immutable blobs on a home-grown distributed storage system
SAN usage across multiple use cases
Backups: Both commercial solutions and internal systems
Caching: Immutable structured blobs
Events: On HDFS (plus other systems along the way)
Logs: On HDFS (plus other systems along the way)
Storage Technologies Used
File Storage
NoSQL
HBase
HDFS
SAN
SDS (Software Designed Store) on scale-out commodity hardware
Uses for Ceph
Block Store
Backend for RDBMSs (maybe with BK for the journal)
Mountable cloud disks of various sizes (up to >> local disk)
Re-mountable storage for VMs
Replace some SAN scenarios
Blob Store
General purpose blob store
Sharing of data across users
Examples: VM/container images, core dumps, large file transfers, customer data, IoT (a minimal API sketch follows below)
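To make the two uses concrete, here is a minimal sketch using the upstream python-rados and python-rbd bindings. It is illustration only, not the SF Block/Blob service code; the pool, image, and object names are made up.

    import rados
    import rbd

    # Connect with the default client config/keyring on a client host.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # Block-store use: create a 100 GiB RBD image that a VM or DB host can map.
        ioctx = cluster.open_ioctx('rbd')            # 'rbd' pool name is a placeholder
        try:
            rbd.RBD().create(ioctx, 'db-volume-001', 100 * 1024 ** 3)
        finally:
            ioctx.close()

        # Blob-store use: store an immutable blob (e.g. a core dump) as a RADOS object.
        ioctx = cluster.open_ioctx('blobs')          # 'blobs' pool name is a placeholder
        try:
            ioctx.write_full('core-dump-2017-03-01', b'...blob bytes...')
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()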
Salesforce Infrastructure and Ceph
[Architecture diagram: SF Block Service and SF Blob Service built on Ceph RADOS and RadosGW, running on a hardware storage SKU farm over a 10+ GigE network, serving the private cloud, SF services, cloud applications, RDBMS, SANs, and org-specific operations.]
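Since RadosGW exposes an S3-compatible API, a client of the blob service in the diagram could look roughly like the sketch below. The endpoint, credentials, and bucket name are placeholders, not actual Salesforce service details.

    import boto3

    # RadosGW speaks the S3 protocol; 7480 is its default civetweb port.
    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.internal:7480',   # placeholder endpoint
        aws_access_key_id='ACCESS_KEY',                    # placeholder credentials
        aws_secret_access_key='SECRET_KEY',
    )

    # Store and fetch an immutable blob (e.g. a core dump or container image layer).
    s3.put_object(Bucket='core-dumps', Key='app-2017-03-01.core', Body=b'...blob bytes...')
    obj = s3.get_object(Bucket='core-dumps', Key='app-2017-03-01.core')
    print(len(obj['Body'].read()), 'bytes retrieved')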
Current Status
Experimenting with multiple small test clusters (~100 nodes)
Machines generally have lots of RAM, a few SSDs, and a bunch of HDDs
Currently on a single 10G network, moving to a much bigger one
Machines are spread across lots of racks, but in a single room (very little over-provisioning)
Testing only RBD
Simple CRUSH map mods for creating SSD-only pools and availability zones (see the sketch after this list)
Very high magnitude of scale: multiple clusters across multiple DCs, each multi-tenant
Operationalizing for a very different and challenging set of requirements
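As a rough illustration of the "simple CRUSH map mods", the sketch below shells out to the stock ceph CLI to carve out an SSD-only root, add a matching rule, and create a pool pinned to it. It is an assumption of how such a change could be scripted; the bucket, host, rule, and pool names and the PG counts are placeholders.

    import subprocess

    def ceph(*args):
        """Run a ceph CLI command and return its stdout (raises on failure)."""
        cmd = ['ceph'] + list(args)
        print('+', ' '.join(cmd))
        return subprocess.check_output(cmd).decode()

    # 1. A separate CRUSH root to hold the SSD-backed hosts.
    ceph('osd', 'crush', 'add-bucket', 'ssd-root', 'root')

    # 2. Move the hosts that carry only SSD OSDs under that root (names are placeholders).
    for host in ['ssd-host-01', 'ssd-host-02', 'ssd-host-03']:
        ceph('osd', 'crush', 'move', host, 'root=ssd-root')

    # 3. A replicated rule that picks OSDs from the SSD root, one replica per host.
    ceph('osd', 'crush', 'rule', 'create-simple', 'ssd-rule', 'ssd-root', 'host')

    # 4. An SSD-only pool bound to that rule (PG counts need sizing for a real cluster).
    ceph('osd', 'pool', 'create', 'ssd-pool', '128', '128', 'replicated', 'ssd-rule')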
Performance numbers (using fio to provide the test load)
SSD-only pool with 12 machines, each 2x12 CPU, 128 GB RAM, 2x480 GB SSD
Workloads (result charts shown in the original slides; a sample fio invocation follows below):
Random R/W for 8K blocks, 70/30 ratio
Sequential write for 128K blocks
Random read for 8K blocks
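For reference, a fio job along these lines could drive the 8K random 70/30 workload from Python. The mount point, file size, queue depth, and runtime below are assumptions, not the exact settings behind the published numbers.

    import subprocess

    fio_cmd = [
        'fio',
        '--name=rbd-randrw',
        '--filename=/mnt/rbd0/fio-testfile',   # file on the XFS-formatted, mounted RBD device
        '--size=10G',
        '--rw=randrw', '--rwmixread=70',       # 70/30 random read/write mix
        '--bs=8k',                             # 8K block size
        '--ioengine=libaio', '--direct=1',
        '--iodepth=32', '--numjobs=4',
        '--runtime=300', '--time_based',
        '--group_reporting',
    ]
    subprocess.check_call(fio_cmd)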
Experiments
Pre-work: Hook up metrics, logs, and alerts to Salesforce infrastructure
fio perf on a mounted client-side block device with XFS
Testing lots and lots of failure scenarios (think Chaos Monkey)
More focus on slow devices (network, host, disk)
CRUSH map settings for heterogeneous environments (will build a tool to generate this automatically; see the sketch after this list)
Set up a CI/CD pipeline
Running Ceph in a Dockerized environment with Kubernetes
Ability to patch a deployed cluster (OS, Docker, Ceph)
Going over the code, line by line
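The automatic-generation tool does not exist yet; as a hypothetical sketch of the idea, the script below takes a hand-written host-to-media inventory (a real tool would pull this from the cluster or an inventory system) and prints the ceph commands that group hosts under per-media CRUSH roots with one rule per media type.

    # Placeholder inventory: host name -> media type it carries.
    INVENTORY = {
        'host-01': 'ssd',
        'host-02': 'ssd',
        'host-03': 'hdd',
        'host-04': 'hdd',
    }

    def crush_commands(inventory):
        """Emit ceph CLI commands for per-media CRUSH roots and rules."""
        cmds = []
        for media in sorted(set(inventory.values())):
            root = '%s-root' % media
            cmds.append('ceph osd crush add-bucket %s root' % root)
            for host, host_media in sorted(inventory.items()):
                if host_media == media:
                    cmds.append('ceph osd crush move %s root=%s' % (host, root))
            # One replicated rule per media type, spreading replicas across hosts.
            cmds.append('ceph osd crush rule create-simple %s-rule %s host' % (media, root))
        return cmds

    if __name__ == '__main__':
        for cmd in crush_commands(INVENTORY):
            print(cmd)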
Future
Read from any replica (inconsistent reads should help with tail latency)
Can reads search the journal? (should also help with tail latency)
Need pluggability in RGW: there is a pre_exec() in rgw_op.cc; either extend the RGWHandler class or use the pre_exec() call in the RGWOp class
Challenges of Storage Services at Salesforce
Scale brings problems all its own: more hardware to fail or act funny, regular capacity adds, hardware changes
Multiple dimensions of multi-tenancy
External customers (isolation, auth/encryption, security, perf, availability, durability, etc.)
A service supporting many, many use cases and internal platforms
Running a large # of clusters in a large # of data centers
Questions?
Sameer Tiwari - Principal Architect, Storage Cloud
● stiwari@salesforce.com
● @techsameer
● https://www.linkedin.com/in/sameer-tiwari-1961311/


Editor's Notes

  • #4: RDBMS + SAN and backups. Replicated files (depends on RAID); 4 copies per DR zone (8-12 more for backup). SSDs may or may not be fully utilized. The primary unit of storage is files/directories on top of Unix file systems. SSD SAN cost is very high. Little or no concept of storage tiering, whereas most data is cold, so SSDs end up unused or under-utilized.
  • #7: We have some different and more challenging requirements than many existing Ceph deployments, so we are building a framework to operationalize large numbers of multi-tenant clusters with very low human effort.
  • #15: Q&A, with email and contact info.