
Decoupling Compute and Storage for Data Workloads


This talk was presented by Carlos Queiroz, Head of Data Platform at the Development Bank of Singapore (DBS), at the Data Transformation in Financial Services meetup in Singapore, jointly hosted by Accenture, Talend, BigDataSG Hadoop, and Alluxio.


  1. Decoupling compute and storage for data workloads – Carlos Queiroz
  2. Data processing workloads at DBS – Hadoop was introduced in 2015 for business and regulatory reporting, analytics, ETL, and batch workloads (a data warehouse replacement?). [Diagram: enterprise transactions, logs, and mainframe data are loaded via ETL into HDFS on bare-metal DataNode and NameNode JVMs (data locality, HDFS on local disks), serving ETL processing and data science users.]
  3. Why decouple?
  4. The current model – hard to scale (compute and storage must be scaled together), inflexible, and costly. [Diagram: bare metal, data locality, HDFS on local disks.]
  5. Also in 2015, EMC and Adobe were bringing HDaaS to market (https://www.brighttalk.com/webcast/1744/156173).
  6. Decoupling compute and storage – Traditional assumptions: bare metal, data locality, HDFS on local disks. A new approach: containers and VMs, separate compute and storage, shared storage, data as a service. Benefits and value: agility and cost savings, faster time to insights. (Adapted from https://www.bluedata.com/blog/2015/12/separating-hadoop-compute-and-storage/)
  7. Fast forward to 2017 – re-engineering the data platform. [Layered diagram: data ingestion feeds a compute layer (compute engines I, II, III, IV, …) on top of a storage layer (in-memory filesystem over an object store), supporting decision support.]
  8. Fast forward to 2017 – multi-tenancy: different SLAs, different engines, different cluster sizes. [Diagram: multiple independent compute clusters, each with its own mix of compute engines, share a single storage layer (in-memory filesystem over an object store).]
  9. Implementing it
  10. In-memory filesystem – applications talk only to Alluxio; nodes are simple to add and remove; no application changes are required; the highest performance comes from serving data in memory (a minimal Spark sketch follows after the slides).
  11. Alluxio server-side API translation – compute engines use the HDFS interface, while Alluxio talks to S3-compatible software through the S3A interface (see the API-translation sketch after the slides).
  12. Reference implementation. [Diagram: compute layer and storage layer.]
  13. Current implementation status – development environment with 50 VMs (each with 12 vCPUs, 128 GB RAM, 400 GB disk); running benchmarks for performance evaluation (a toy benchmark sketch follows after the slides); Cloudera 5.13.x; S3-compatible object store.
  14. Questions???
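
The "no app changes" point on slide 10 boils down to swapping the filesystem URI. Here is a minimal sketch in Spark/Scala, assuming an Alluxio master at alluxio-master:19998; the hostnames, dataset path, and column name are illustrative, not taken from the deck:

```scala
import org.apache.spark.sql.SparkSession

object AlluxioReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("alluxio-read-example")
      .getOrCreate()

    // Before: the job read straight from HDFS.
    // val df = spark.read.parquet("hdfs://namenode:8020/warehouse/trades")

    // After: only the URI scheme changes; Alluxio serves hot data from memory
    // and fetches cold data from the mounted under-store on demand.
    val df = spark.read.parquet("alluxio://alluxio-master:19998/warehouse/trades")

    df.groupBy("trade_date").count().show()
    spark.stop()
  }
}
```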
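Slide 11's server-side API translation means an application keeps coding against the familiar Hadoop FileSystem API while Alluxio translates those calls to the S3-compatible under-store. A hedged client-side sketch follows, using the fs.alluxio.impl / alluxio.hadoop.FileSystem settings documented for the Alluxio Hadoop client; the host, port, and path are illustrative:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsApiOverAlluxio {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Register Alluxio's Hadoop-compatible client so alluxio:// URIs resolve.
    conf.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")

    // The application still uses HDFS-style FileSystem calls ...
    val fs = FileSystem.get(new URI("alluxio://alluxio-master:19998/"), conf)

    // ... while the data behind this (illustrative) path lives in an
    // S3-compatible bucket mounted into the Alluxio namespace.
    fs.listStatus(new Path("/regulatory/reports"))
      .foreach(status => println(status.getPath))

    fs.close()
  }
}
```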
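Slide 13 mentions running benchmarks for performance evaluation but does not name the suite. The following is only a toy read/write timing sketch in Spark/Scala against an Alluxio path, not the benchmark actually used at DBS; all paths and sizes are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object SimpleIoBenchmark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("simple-io-benchmark").getOrCreate()

    // Placeholder target path in the Alluxio namespace and synthetic row count.
    val target = "alluxio://alluxio-master:19998/benchmarks/run1"
    val rows = 10000000L

    // Small helper that prints the wall-clock time of a block.
    def timed[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(f"$label took ${(System.nanoTime() - start) / 1e9}%.1f s")
      result
    }

    // Write phase: generate synthetic rows and persist them through Alluxio.
    timed("write") {
      spark.range(rows)
        .selectExpr("id", "concat('payload-', id) AS payload")
        .write.mode("overwrite").parquet(target)
    }

    // Read phase: scan the data back and force evaluation with a count.
    timed("read") {
      println(spark.read.parquet(target).count())
    }

    spark.stop()
  }
}
```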
