Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Flexible and Fast Storage for Deep Learning with Alluxio

581 views

Published on

GTC 2018
Mike Wendt (Nvidia), Yupeng Fu (Alluxio)

Published in: Technology
  • Be the first to comment

Flexible and Fast Storage for Deep Learning with Alluxio

  1. 1. Mike Wendt @mike_wendt FLEXIBLE AND FAST STORAGE FOR DEEP LEARNING WITH ALLUXIO Yupeng Fu
  2. 2. 2 ACCELERATE THE DEEP LEARNING STACK GPU-Acceleration by NVIDIA and Fast Storage from Alluxio Apache Arrow +
  3. 3. 3 DATA PROCESSING EVOLUTION Faster Data Access Less Data Movement Store Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train Hadoop Processing, Reading from disk
  4. 4. 4 DATA PROCESSING EVOLUTION Faster Data Access Less Data Movement HDFS Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train HDFS Read Query ETL ML Train Hadoop Processing, Reading from disk 25-100x Improvement Less code Language flexible Primarily In-Memory Storage bottleneck Spark In-Memory Processing
  5. 5. 5 25-100x Improvement Less code Language flexible Primarily In-Memory DATA PROCESSING EVOLUTION Faster Data Access Less Data Movement HDFS Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train HDFS Read Query ETL ML Train HDFS Read GPU Read Query CPU Write GPU Read ETL CPU Write GPU Read ML Train 5-10x Improvement More code Language rigid Substantially on GPU Storage still a bottleneck GPU/Spark In-Memory Processing Hadoop Processing, Reading from disk Spark In-Memory Processing
  6. 6. 6 WE CAN DO BETTER!
  7. 7. 7 ACCELERATE THE DEEP LEARNING STACK GPU-Acceleration by NVIDIA and Fast Storage from Alluxio Apache Arrow +
  8. 8. 8 APP A GPU-ACCELERATED ARCHITECTURE THEN Too much data movement and too many different data formats CPU GPU APP B Read DataH2O.ai Anaconda Gunrock Graphistry BlazingDB MapD Copy & Convert Copy & Convert Copy & Convert Load Data APP A GPU Data APP B GPU Data
  9. 9. 9 GPU-ACCELERATED ARCHITECTURE NOW Single data format and shared access to data on GPU CPU GPU GPU MEM Read DataH2O.ai Anaconda Gunrock Graphistry BlazingDB MapD Load Data Apache Arrow Powered by: GPU Data Frame
  10. 10. 10 GRAPH PROCESSING ANALYTICS GPU DATABASES github.com/gpuopenanalytics @gpuoai Apache Arrow Powered by:
  11. 11. 11 GPU ACCELERATION ACROSS THE ECOSYSTEM Apache Arrow
  12. 12. 12 25-100x Improvement Less code Language flexible Primarily In-Memory DATA PROCESSING EVOLUTION Faster Data Access Less Data Movement HDFS Read HDFS Write HDFS Read HDFS Write HDFS Read Query ETL ML Train HDFS Read Query ETL ML Train HDFS Read GPU Read Query CPU Write GPU Read ETL CPU Write GPU Read ML Train Alluxio Read Query ETL ML Train 5-10x Improvement More code Language rigid Substantially on GPU 25-100x Improvement Same code Language flexible Primarily on GPU Alluxio Fast and Flexible Storage End to End GPU Processing (GoAi) GPU/Spark In-Memory Processing Hadoop Processing, Reading from disk Spark In-Memory Processing
  13. 13. 13 ACCELERATE THE ENTIRE ANALYTICS STACK GPU-Acceleration by NVIDIA and Fast Storage from Alluxio Apache Arrow +
  14. 14. 14 DATA & AI ECOSYSTEM EXPLODES … • Many Compute Frameworks • Many Storage Systems • Most not co-located …
  15. 15. 15 Data & AI Ecosystem Issues • Each app manages multiple data sources • Data source changes require global updates • Storage optimizations requires app change • Poor performance due to lack of locality … …
  16. 16. 16 Data & AI Ecosystem with Alluxio • Apps only talk to Alluxio • Simple Add/Remove • No App Changes • Highest performance in Memory Java File API HDFS Interface Amazon S3 Interface REST Web Service HDFS Interface Amazon S3 Interface Swift Interface NFS Interface … …
  17. 17. 17 Storage Challenges for DL& ML 2 Data Freshness • Cross-network movement is slow • Copies create lag • Data quality suffers with copies 4 Security & Governance • Data security & governance is increasingly complex 1 Speed & Complexity • Integration and interoperability issues (on prem, hybrid, cloud) • Many departments & groups 3 Cost • Cloud storage is cheap and reliable, but slow • Data duplication 17 Heavy integrations create painful organizational drag
  18. 18. 18 Alluxio Design Principles 2 Data Sharing • Don’t own the data • Multiple apps sharing common data • Data stored in multiple, hybrid systems 4 Enterprise Class • Distributed architecture • Commodity hardware • Service-oriented • High availability • Security 1 Big Data & Machine Learning • Interoperability with leading projects • Large scale data sets • High IO 3 High Speed Data Access • Remote data • Hot/warm/cold data • Temporary data • Read/write support 18
  19. 19. 19 Alluxio FUSE Deep Learning Frameworks Unified Data Storage Systems ALLUXIO FUSE
  20. 20. 20 Filesystem in Userspace (FUSE) Running file system code in user space
  21. 21. 21 Alluxio Innovation: Unified Namespace Enables effective data management across different Under Stores Uses Mounting with Transparent Naming
  22. 22. 22 Alluxio Innovation: Unified Namespace Create a catalog of available data sources for Data Scientists /finance/customer-transactions/ /finance/vendor-transactions/ /operations/device-logs/ /operations/phone-call-recordings/ /operations/check-images/ /research/us-economic-data/ /research/intl-economic-data/ /marketing/advertising-dataset/ /marketing/marketing-funnel-dataset/ alluxio://
  23. 23. 23 Alluxio Innovation: Intelligent Cache Local performance from remote data using native multi-tier storage RAM SSD HDD Hot Warm Cold
  24. 24. 24 Deep Learning Input Pipeline Deep Learning training involves three stages of utilizing different resources: • Data reads (I/O): e.g. choose and read image files from source. • Data Preprocessing (CPU): e.g. decode image records into images, preprocess, and organize into mini-batches. • Modeling training (GPU): Calculate and update the parameters in the multiple convolutional layers
  25. 25. 25 Alluxio overcomes I/O bottleneck
  26. 26. 26 Alluxio Architecture Alluxio Master Zookeeper Standby Master Alluxio Worker Alluxio Worker Under Store Under Store Alluxio Client Application RAM / SSD / HDD RAM / SSD / HDD Control Path Data Path
  27. 27. 28 Read data in Alluxio, on same node as client Alluxio Worker RAM / SSD / HDD Memory Speed Read of Data Application Alluxio Client Alluxio Master
  28. 28. 29 Read data not in Alluxio + Caching 29 RAM / SSD / HDD Network / Disk Speed Read of Data Application Alluxio Client Alluxio Master Alluxio WorkerUnder Store
  29. 29. 30 Read data in Alluxio, not on same node as client + Caching RAM / SSD / HDD Network Speed Read of Data Application Alluxio Client Alluxio Master Alluxio Worker RAM / SSD / HDD Alluxio Worker
  30. 30. 31 Write data only to Alluxio on same node as client Alluxio Worker RAM / SSD / HDD Memory SpeedWrite of Data Application Alluxio Client Alluxio Master
  31. 31. 32 Write data to Alluxio and Under Store synchronously RAM / SSD / HDD Network / Disk SpeedWrite of Data Application Alluxio Client Alluxio Master Alluxio Worker Under Store
  32. 32. 33 100+ known production deployments AND MORE!
  33. 33. Twitter.com/alluxio Linkedin.com/alluxio Website www.alluxio.com E-mail info@alluxio.com @ Social Media á ™ 34 Thank you! Yupeng Fu yupeng@alluxio.com Github: yupeng9 Twitter.com/alluxio Linkedin.com/alluxio Website www.alluxio.org E-mail info@alluxio.com @ Social Media á ™

×