Big Data Uses with Distributed Asynchronous Object Storage

Learn about the architecture and features of Distributed Asynchronous Object Storage (DAOS). This open source object store is based on the Persistent Memory Development Kit (PMDK) for massively distributed non-volatile memory applications.

  1. Di Wang, Extreme Storage Architecture & Development (ESAD), Intel
  2. SPDK, PMDK & Vtune™ Summit: Agenda
     • DAOS (Distributed Asynchronous Object Storage) overview
     • DAOS architecture & features
     • DAOS storage model
     • DAOS with PMDK & SPDK
     • Current performance & resources
  3. Storage revolution
     [Chart: latency from the application (µs) for a NAND SSD 4 kB read, an Intel® Optane SSD 4 kB read, and Intel® Optane NVDIMMs 64 B read. Legend: NVM media read; PCIe & NVMe protocol; software (file system, OS, driver).]
  4. DAOS overview
     [Diagram: workflows and 3rd-party applications sit on rich data models (POSIX I/O, HDF5, SQL, …), which run on the DAOS Storage Engine (open source, Apache 2.0 License) as the storage platform, over storage media including Intel® QLC 3D NAND SSD and HDD.]
  5. Lightweight I/O
     • Mercury userspace function shipping
       – MPI-equivalent communication latency
       – Built over libfabric
     • Applications link directly with the DAOS library
       – Direct call, no context switch
       – Small memory footprint
       – No locking, caching, or data copy
     • Userspace DAOS server
       – mmap non-volatile memory via PMDK
       – NVMe access through SPDK/Blobstore
     [Diagram: AI/analytics/simulation workflows use HDF5, file, (No)SQL, … interfaces on top of the DAOS library, which reaches the DAOS service over Mercury/libfabric (RPC and bulk transfers); the service uses PMDK for SCM and SPDK for NVMe SSDs.]
  6. Storage Model
     DAOS provides a rich storage API:
     • A new scalable storage model suitable for both structured and unstructured data
       – Key-value stores, multi-dimensional arrays, columnar databases, …
       – Accelerates data analytics/AI frameworks
     • Non-blocking data and metadata operations
     • Ad-hoc concurrency control mechanism
     Hierarchy (Storage Pool → Container → Object → Record):
     • Pool
       – Reservation of distributed storage
       – Predictable/extendable performance and capacity
     • Container
       – Aggregates related datasets into a manageable entity
       – Unit of snapshot/transaction
     • Object
       – Key-array store with its own distribution/resilience schema
       – Multi-level key for fine-grained control over colocation of related data
     • Record
       – Arbitrary binary blob, from a single byte to several MB
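To make the Pool → Container → Object → Record hierarchy concrete, here is a minimal in-memory sketch. It assumes the two key levels are the distribution key (dkey) and attribute key (akey), as DAOS documentation names them; all classes and methods are hypothetical and this is not the DAOS client API. Snapshots, transactions, distribution, and resilience are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, in-memory model of the slide's hierarchy:
# pool -> container -> object -> dkey -> akey -> record.
# This is NOT the DAOS client API; every class and method name is illustrative.


@dataclass
class DaosObject:
    # Two-level key: the dkey (distribution key) groups values that should be
    # colocated; the akey (attribute key) names one value under that dkey.
    dkeys: dict[bytes, dict[bytes, bytes]] = field(default_factory=dict)

    def put(self, dkey: bytes, akey: bytes, record: bytes) -> None:
        self.dkeys.setdefault(dkey, {})[akey] = record

    def get(self, dkey: bytes, akey: bytes) -> bytes:
        return self.dkeys[dkey][akey]


@dataclass
class Container:
    # In DAOS a container is the unit of snapshot/transaction (not modeled here).
    objects: dict[int, DaosObject] = field(default_factory=dict)

    def open_object(self, oid: int) -> DaosObject:
        return self.objects.setdefault(oid, DaosObject())


@dataclass
class Pool:
    capacity_bytes: int  # storage reserved for this pool
    containers: dict[str, Container] = field(default_factory=dict)

    def create_container(self, label: str) -> Container:
        return self.containers.setdefault(label, Container())


# Usage: two values written under the same dkey stay grouped together.
pool = Pool(capacity_bytes=1 << 30)
cont = pool.create_container("checkpoints")
obj = cont.open_object(42)
obj.put(b"step-0001", b"temperature", b"\x00\x01")
obj.put(b"step-0001", b"pressure", b"\x02")
print(sorted(obj.dkeys[b"step-0001"]))  # [b'pressure', b'temperature']
```

Grouping by dkey is what lets a real system place related values together: everything under one dkey can be routed to the same storage target.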
  7. Fine-grained I/O
     • Mix of storage technologies
       – Storage Class Memory (SCM): DAOS metadata & application metadata; byte-granular application data
       – NVMe SSD (*NAND): cheaper storage for bulk data (e.g., checkpoints); multi-KB I/Os are logged & inserted into a persistent index
     • Non-destructive writes & consistent reads
     • No alignment constraints
     • No read-modify-write
     [Diagram: versions v1, v2, v3 are recorded in the server-side index; a read@v3 fills the application buffer through bulk descriptor segments.]
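The non-destructive write and read@version behavior can be illustrated with a toy extent index. This is a sketch under stated assumptions, not DAOS's persistent index: "version" stands in for DAOS epochs, the index lives in memory, and reads do a linear scan. Each write appends an extent at any byte offset (no alignment, no read-modify-write); a read at version v returns, for every byte, the newest extent written at or below v.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    offset: int   # byte offset within the array value; no alignment required
    data: bytes
    version: int  # stands in for the epoch at which the extent was written


class ExtentIndex:
    """Toy log-structured index: writes never overwrite, and reads pick the
    newest extent covering each byte at or below the requested version."""

    def __init__(self) -> None:
        self.extents: list[Extent] = []

    def write(self, offset: int, data: bytes, version: int) -> None:
        # Non-destructive: append a new extent; older versions stay readable.
        self.extents.append(Extent(offset, data, version))

    def read(self, offset: int, length: int, version: int, fill: int = 0) -> bytes:
        buf = bytearray([fill]) * length
        best = [-1] * length  # version currently supplying each output byte
        for ext in self.extents:
            if ext.version > version:
                continue  # written after the requested version: invisible
            start = max(offset, ext.offset)
            end = min(offset + length, ext.offset + len(ext.data))
            for pos in range(start, end):
                i = pos - offset
                if ext.version >= best[i]:
                    best[i] = ext.version
                    buf[i] = ext.data[pos - ext.offset]
        return bytes(buf)


idx = ExtentIndex()
idx.write(0, b"aaaaaaaa", version=1)
idx.write(3, b"BB", version=2)   # unaligned 2-byte update, no read-modify-write
idx.write(6, b"ccc", version=3)  # overlaps and extends past the old end
print(idx.read(0, 9, version=3))  # b'aaaBBaccc'
print(idx.read(0, 9, version=1))  # b'aaaaaaaa\x00'
```

Because old extents are never modified, a reader at v1 sees a consistent image even while newer versions are being written.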
  8. Data Management
     • Data security & reduction
       – Online real-time data encryption & compression
       – Hardware acceleration
     • Data distribution
       – Algorithmic placement
     • Data protection
       – Declustered replication & erasure code
       – Fault-domain aware placement
       – Self-healing
       – End-to-end data integrity
     [Diagram: Hash(object.Dkey) determines placement, with fault domain separation between replicas.]
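Algorithmic placement means a target is computed from the key rather than looked up in a table. The sketch below shows one such scheme, rendezvous (highest-random-weight) hashing over (object ID, dkey), constrained to one replica per fault domain. Rendezvous hashing is a swapped-in technique chosen for brevity; DAOS's own placement maps are not reproduced here, and `place`, `stable_hash`, and the rack/target names are hypothetical.

```python
from __future__ import annotations

import hashlib


def stable_hash(*parts: bytes) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted per process).
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "little"))  # length prefix avoids ambiguity
        h.update(p)
    return int.from_bytes(h.digest()[:8], "little")


def place(oid: int, dkey: bytes, domains: dict[str, list[str]], replicas: int) -> list[str]:
    """Pick `replicas` targets for (oid, dkey), at most one per fault domain,
    using rendezvous hashing; no placement table is stored anywhere."""
    if replicas > len(domains):
        raise ValueError("need at least one fault domain per replica")
    key = oid.to_bytes(8, "little") + dkey
    # Rank fault domains by their hash score for this key.
    ranked = sorted(domains, key=lambda d: stable_hash(key, d.encode()), reverse=True)
    chosen = []
    for d in ranked[:replicas]:
        # Within each selected domain, pick the highest-scoring target.
        chosen.append(max(domains[d], key=lambda t: stable_hash(key, t.encode())))
    return chosen


domains = {
    "rack0": ["srv0:t0", "srv0:t1", "srv1:t0"],
    "rack1": ["srv2:t0", "srv2:t1"],
    "rack2": ["srv3:t0", "srv4:t0"],
}
print(place(0x1234, b"step-0001", domains, replicas=2))
```

Any client can recompute the same targets from the key alone, and ranking domains before targets guarantees that losing one rack never removes more than one replica.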
  9. Pool Storage on DAOS Server
     [Diagram: the DAOS service runs several Argobots xstreams; each xstream is paired with its own PMDK pmemobj pool on SCM and its own SPDK blob on NVMe SSD, with NVMe block allocation info shown alongside a pmemobj pool.]
  13. DAOS I/O over PMDK/SPDK
      [Diagram: a DAOS xstream writing to SCM and NVMe.]
      • Reserve a new buffer
        – Either reserve on SCM with pmemobj_reserve
        – Or reserve space on the NVMe SSD
      • Start the RDMA transfer into the newly allocated buffer
        – Either transfer directly to PMEM
        – Or transfer to a DMA buffer, then to the NVMe SSD
      • Start a pmemobj transaction
      • Modify the index to insert the new extent
      • Publish the reserved space
        – Either with pmemobj_tx_publish() for SCM
        – Or publish the space for the NVMe SSD
      • Commit the pmemobj transaction and reply to the client
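The point of the reserve → transfer → publish ordering is that bulk data lands before any metadata changes, so a failure before commit leaves no half-visible extent. In PMDK, an object reserved with pmemobj_reserve is not persistently allocated until it is published (for example with pmemobj_tx_publish inside a transaction), and pmemobj_cancel drops an unpublished reservation. The toy below mimics that ordering without calling PMDK or SPDK; the `ToyTarget` class, the bump allocator, and the 4096-byte SCM/NVMe cut-over are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SCM_THRESHOLD = 4096  # hypothetical cut-over: smaller writes go to SCM, larger ones to NVMe


@dataclass
class Reservation:
    medium: str  # "scm" or "nvme"
    offset: int
    size: int


@dataclass
class ToyTarget:
    # Simulated media for one xstream: a small SCM region and a larger NVMe region.
    media: dict[str, bytearray] = field(
        default_factory=lambda: {"scm": bytearray(1 << 16), "nvme": bytearray(1 << 20)}
    )
    cursor: dict[str, int] = field(default_factory=lambda: {"scm": 0, "nvme": 0})
    published: list[Reservation] = field(default_factory=list)  # committed allocations
    index: dict[tuple[bytes, int], Reservation] = field(default_factory=dict)

    def reserve(self, size: int) -> Reservation:
        # Step 1: reserve space. Nothing records it as allocated yet.
        medium = "scm" if size < SCM_THRESHOLD else "nvme"
        if self.cursor[medium] + size > len(self.media[medium]):
            raise MemoryError(f"toy {medium} region is full")
        r = Reservation(medium, self.cursor[medium], size)
        self.cursor[medium] += size
        return r

    def transfer(self, r: Reservation, payload: bytes) -> None:
        # Step 2: stands in for the RDMA transfer (straight to PMEM, or through a
        # DMA buffer to the NVMe SSD). Data lands before any metadata changes.
        self.media[r.medium][r.offset : r.offset + r.size] = payload

    def update(self, key: bytes, version: int, payload: bytes,
               fail_before_commit: bool = False) -> bool:
        r = self.reserve(len(payload))
        self.transfer(r, payload)
        # Steps 3-5: inside one "transaction", insert the new extent into the
        # index and publish the reservation. Both are staged, then applied together.
        staged_index = {**self.index, (key, version): r}
        staged_published = [*self.published, r]
        if fail_before_commit:
            # Abort: the reservation is dropped and the index is untouched.
            # (A real allocator would return the space; this bump allocator does not.)
            return False
        # Step 6: commit, then reply to the client.
        self.index, self.published = staged_index, staged_published
        return True

    def read(self, key: bytes, version: int) -> bytes:
        r = self.index[(key, version)]
        return bytes(self.media[r.medium][r.offset : r.offset + r.size])


# Usage: a small write lands on SCM, a 5 KiB write on NVMe, an aborted write leaves no trace.
target = ToyTarget()
target.update(b"obj1", 1, b"hello")
target.update(b"obj1", 2, bytes(5120))
target.update(b"obj1", 3, b"lost", fail_before_commit=True)
print(sorted(target.index))  # [(b'obj1', 1), (b'obj1', 2)]
```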
  14. DAOS Performance
      IOR IOPS with 1024 B I/O size against a single DAOS server:

      Clients   Write IOPS   Read IOPS
            1       34,996      62,392
            8      188,782     326,432
           16      282,017     434,839
           32      407,431     829,526
           64      469,666     875,873
          128      472,509     773,290
          256      502,516   1,019,720

      • IOR runs on remote clients sending I/O requests to the single DAOS server over the fabric
      • Intel Omni-Path Host Adapter 100HFA016LS
      • Uses the DAOS MPI-IO driver with the full DAOS stack (client, network, server)
      • Cascade Lake CPUs, 6 DIMMs 512G AEP NMA1XBD512GQSE
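Small-I/O IOPS are easier to compare with other systems once converted to bandwidth. The short script below does only arithmetic on the figures from the slide's two charts; it assumes both charts use 1024-byte I/Os (the write chart title omits the unit) and reports binary MiB/s.

```python
# Data transcribed from the slide's two charts (IOR, 1024-byte I/O, one DAOS server).
IO_SIZE = 1024  # bytes per I/O (assumed for both charts)

clients = [1, 8, 16, 32, 64, 128, 256]
write_iops = [34996, 188782, 282017, 407431, 469666, 472509, 502516]
read_iops = [62392, 326432, 434839, 829526, 875873, 773290, 1019720]


def mib_per_s(iops: int, io_size: int = IO_SIZE) -> float:
    """Convert an IOPS figure into MiB/s (1 MiB = 2**20 bytes)."""
    return iops * io_size / 2**20


for n, w, r in zip(clients, write_iops, read_iops):
    print(f"{n:>4} clients: write {mib_per_s(w):6.1f} MiB/s, read {mib_per_s(r):6.1f} MiB/s")

# Scaling from 1 to 256 clients.
write_speedup = write_iops[-1] / write_iops[0]
read_speedup = read_iops[-1] / read_iops[0]
print(f"speedup 1 -> 256 clients: write {write_speedup:.1f}x, read {read_speedup:.1f}x")
```

At 256 clients this gives about 490.7 MiB/s for writes and 995.8 MiB/s for reads, roughly 14.4× and 16.3× the single-client figures. The read series is not monotonic: it dips at 128 clients before reaching its peak at 256.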
  15. DAOS Community Roadmap
      All information provided in this roadmap is subject to change without notice.
      Timeline: 1Q19 through 3Q22, starting with pre-1.0 releases & RCs, followed by releases 1.0, 1.2, 1.4, 2.0, 2.2, and 2.4.
      Planned features, in the order shown on the slide:
      • DAOS: replication with self-healing, persistent memory support, NVMe SSD support, self-monitoring & bootstrap, initial control plane, Python/Go API bindings. I/O middleware: MPI-IO driver, HDF5 DAOS connector (proto), POSIX I/O (proto).
      • DAOS: per-pool ACL, Lustre integration. I/O middleware: HDF5 DAOS connector, POSIX I/O support, Spark.
      • DAOS: end-to-end data integrity, per-container ACL, SmartNICs & accelerators, improved control plane.
      • DAOS: online server addition, advanced control plane. I/O middleware: POSIX data mover, async HDF5 operations over DAOS.
      • DAOS: erasure code, telemetry & per-job statistics, multi-OFI-provider support. I/O middleware: advanced POSIX I/O support, advanced data mover.
      • Partner engagement & PoCs.
      • DAOS: progressive layout / GIGA+, placement optimizations, checksum scrubbing. I/O middleware: Apache Arrow (not POR).
      • DAOS: catastrophic recovery tools.
  16. Resources
      • Source code on GitHub: https://github.com/daos-stack/daos
      • Community mailing list on Groups.io: daos@daos.groups.io or https://daos.groups.io/g/daos
      • Wiki: http://daos.io or https://wiki.hpdd.intel.com
      • Bug tracker: https://jira.hpdd.intel.com
