
Hive spark-s3acommitter-hbase-nfs

Cloud-era Hadoop and Spark with S3 and NFS on all-flash storage.


  1. Hive/Spark/HBase on S3 & NFS – No More HDFS Operation (Yifeng Jiang / @uprush), March 14th, 2019
  2. Yifeng Jiang • APJ Solution Architect, Data Science @ Pure Storage • Big data, machine learning, cloud, PaaS, web systems • Prior to Pure: Hadooper since 2009 • HBase book author • Software engineer, PaaS, cloud
  3. Agenda • Separate compute and storage • Hive & Spark on S3 with S3A Committers • HBase on NFS • Demo, benchmark and case study
  4. Hadoop Common Pain Points • Hardware complexity: racks of servers, cables, power • Availability SLAs / HDFS operation • Unbalanced CPU and storage demands • Software complexity: 20+ components • Performance tuning
  5. Separate Compute & Storage • Hardware complexity: racks of servers, cables, power • Availability SLAs / HDFS operation • Unbalanced CPU and storage demands • Software complexity: 20+ components • Performance tuning → Compute on virtual machines, data on S3/NFS, each scaling independently
  6. Modernizing Data Analytics (pipeline diagram: data ingest, ETL, store, query, search, deep learning; NFS and S3 as the storage backend)
  7. Cluster Topology (diagram) • Hadoop/Spark cluster on virtual machines (node1 … nodeN) • SAN block storage (iSCSI/FC, XFS/Ext4): OS, Hadoop binaries, HDFS on SAN volumes • NAS/object storage over S3 (S3A, HTTP only): data lake files for Spark and Hive • NFS: same mount point on all nodes, used by HBase
  8. Hive/Spark on S3
  9. Hadoop S3A Library • Hadoop DFS protocol: the communication protocol between NameNode, DataNode and client; default implementation is HDFS; other implementations include S3A, Azure, local FS, etc. • Hadoop S3A library: Hadoop DFS protocol implementation for S3-compatible storage such as Amazon S3 and Pure Storage FlashBlade. • Enables the Hadoop ecosystem (Spark, Hive, MR, etc.) to store and process data in S3 object storage. • Several years in production, heavily used in the cloud.
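  Not part of the slides: a minimal Scala sketch of pointing Spark's S3A client at an S3-compatible endpoint. The endpoint URL, bucket name and environment variable names are illustrative assumptions; the fs.s3a.* property names are standard Hadoop S3A settings.

      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .appName("s3a-config-sketch")
        // S3A settings are ordinary Hadoop properties, passed with the "spark.hadoop." prefix.
        .config("spark.hadoop.fs.s3a.endpoint", "https://s3.flashblade.example.com")  // hypothetical endpoint
        .config("spark.hadoop.fs.s3a.access.key", sys.env("S3_ACCESS_KEY"))
        .config("spark.hadoop.fs.s3a.secret.key", sys.env("S3_SECRET_KEY"))
        .config("spark.hadoop.fs.s3a.path.style.access", "true")  // often needed for on-prem S3 stores
        .getOrCreate()

      // Any Hadoop-compatible component can now resolve s3a:// URIs.
      spark.read.textFile("s3a://deephub/tweets/").show(5)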
  10. Hadoop S3A Demo
  11. The All-Flash Data Hub for Modern Analytics • 10+ PB per rack density • File & object converged • 75 GB/s bandwidth, 7.5M+ NFS ops • Big + fast • Just add a blade: simple, elastic scale
  12. Hadoop Ecosystem and S3: how does it work? • Hadoop ecosystem (Spark, Hive, MR, etc.) uses the HDFS client internally. • Spark executor -> HDFS client -> storage; Hive on Tez container -> HDFS client -> storage. • The HDFS client speaks the Hadoop DFS protocol and automatically chooses the proper implementation based on the URI scheme: /user/joe/data -> HDFS; file:///user/joe/data -> local FS (including NFS); s3a://user/joe/data -> S3A. • Exception: HBase (details covered later).
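  Not from the slides: a small Scala sketch of scheme-based FileSystem resolution through the Hadoop client API, using the three example paths above. The NFS mount path is an assumption, and resolving s3a:// requires the hadoop-aws module on the classpath.

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      val conf = new Configuration()
      // No scheme: resolved against fs.defaultFS (HDFS on a classic cluster).
      val hdfsFs  = new Path("/user/joe/data").getFileSystem(conf)
      // file:// scheme: LocalFileSystem, which also covers an NFS mount point.
      val localFs = new Path("file:///mnt/nfs/user/joe/data").getFileSystem(conf)
      // s3a:// scheme: S3AFileSystem, talking to S3-compatible object storage.
      val s3aFs   = new Path("s3a://deephub/user/joe/data").getFileSystem(conf)

      Seq(hdfsFs, localFs, s3aFs).foreach(fs => println(fs.getClass.getName))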
  13. Spark on S3 – spark-submit • Changes: use s3a:// as input/output. • Temporary data I/O: YARN -> HDFS • User data I/O: Spark executors -> S3A -> S3 • Example: val flatJSON = sc.textFile("s3a://deephub/tweets/") (diagram: spark-submit -> YARN RM -> YARN containers, with temp data on HDFS and user data on S3)
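  Not from the slides: a slightly fuller version of the snippet above, reusing the SparkSession from the earlier sketch, to show that the only S3-specific part of the job is the s3a:// URI (bucket and output path are made up).

      // Word count whose input and output both live in S3; shuffle/temp data stays on the compute side.
      val flatJSON = spark.sparkContext.textFile("s3a://deephub/tweets/")
      val counts = flatJSON
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      counts.saveAsTextFile("s3a://deephub/tweets-wordcount/")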
  14. Hadoop on S3 Challenges • Consistency model: Amazon S3 is eventually consistent (use S3Guard); S3-compatible storage (e.g. FlashBlade S3) supports strong consistency. • Slow "rename": "rename" is critical in HDFS to support atomic commits, like Linux "mv"; S3 does not support "rename" natively, so S3A simulates it as LIST – COPY – DELETE.
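  Not from the slides: a rough way to observe the simulated rename cost through the Hadoop FileSystem API (bucket and paths are hypothetical). On S3A the single call below expands into LIST, per-object COPY and DELETE requests rather than an atomic move.

      import java.net.URI
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      val fs  = FileSystem.get(new URI("s3a://deephub/"), new Configuration())
      val src = new Path("s3a://deephub/tmp/job-output/")
      val dst = new Path("s3a://deephub/final/job-output/")

      val start = System.nanoTime()
      val ok = fs.rename(src, dst)  // client-side copy-and-delete on an object store
      println(s"rename returned $ok in ${(System.nanoTime() - start) / 1e9} s")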
  15. Slow S3 "Rename" – image source: https://stackoverflow.com/questions/42822483/extremely-slow-s3-write-times-from-emr-spark/42835927
  16. Hadoop on S3 Updates • Make S3A cloud-native: hundreds of JIRAs for robustness, scale and performance • S3Guard • Zero-rename S3A committers • Use Hadoop 3.1 or later
  17. S3A Committers • Originally S3A used FileOutputCommitter, which relies on "rename". • Zero-rename, cloud-native S3A committers: staging committer (directory & partitioned) and magic committer.
  18. S3A Committers • Staging committer: does not require strong consistency; proven to work at Netflix; requires large local FS space. • Magic committer: requires strong consistency (S3Guard on AWS, FlashBlade S3, etc.); faster and uses less local FS space; less stable/tested than the staging committer. • Common key points: fast, no "rename"; both leverage S3 transactional multipart upload.
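  Not from the slides: one plausible way to switch a Spark job from the rename-based FileOutputCommitter to the S3A directory (staging) committer. The fs.s3a.committer.* names follow the Hadoop S3A committer documentation and the spark.sql.* bindings ship with the spark-hadoop-cloud module; treat exact names and values as assumptions to verify against your Hadoop/Spark versions.

      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .appName("s3a-committer-sketch")
        // Committer selection: "directory" staging committer; "magic" also needs fs.s3a.committer.magic.enabled=true.
        .config("spark.hadoop.fs.s3a.committer.name", "directory")
        .config("spark.hadoop.fs.s3a.committer.staging.conflict-mode", "append")
        // Bind Spark SQL writes to the S3A committers (classes from the spark-hadoop-cloud module):
        .config("spark.sql.sources.commitProtocolClass",
          "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
        .config("spark.sql.parquet.output.committer.class",
          "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
        .getOrCreate()

      // This write commits via S3 multipart uploads instead of a final directory "rename".
      spark.range(1000).write.mode("overwrite").parquet("s3a://deephub/committer-test/")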
  19. S3A Committer Demo
  20. S3A Committer Benchmark • 1TB MR random text writes • FlashBlade S3, 15 blades • Plenty of compute nodes • (chart "1TB Genwords Bench": time in seconds by output committer – file, directory, magic)
  21. Teragen Benchmark (chart) • File committer: slow • Staging committer: fast • Magic committer: faster • Chart annotations: S3 read, local FS read, less local FS read
  22. HBase on NFS (section slide; image caption: "the peace of a volcano")
  23. HBase & HDFS • What does HBase want from HDFS? HFile & WAL durability; scale & performance (mostly latency, but also throughput). • What does HBase NOT want from HDFS? Noisy neighbors; co-located compute (YARN, Spark); co-located data (Hive data warehouse); complexity of operation & upgrade.
  24. HBase on NFS: how does it work? • Use NFS as the HBase root and staging directory. • Same NFS mount point on all RegionServer nodes. • Point HBase to store data in that mount point, leveraging the Hadoop local FS implementation (file:///mnt/hbase). • No change to applications: clients only see HBase tables. (diagram: client -> Table API -> HMaster / RegionServers -> NFS)
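  Not from the slides: a sketch of the settings this implies. In a real cluster they belong in hbase-site.xml on every node; the HBase Configuration API is used here only to spell out the keys, and the mount layout plus the hbase.fs.tmp.dir / hbase.wal.dir choices are assumptions, not slide recommendations.

      import org.apache.hadoop.hbase.HBaseConfiguration

      val conf = HBaseConfiguration.create()
      // HBase root directory on the shared NFS mount, via the Hadoop local FS implementation.
      conf.set("hbase.rootdir", "file:///mnt/hbase")
      // Staging directory (bulk load, etc.) on the same mount (assumed layout).
      conf.set("hbase.fs.tmp.dir", "file:///mnt/hbase/staging")
      // WAL directory; keeping it on the same NFS mount is an assumption.
      conf.set("hbase.wal.dir", "file:///mnt/hbase/wal")

      println(conf.get("hbase.rootdir"))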
  25. HFile Durability • HDFS uses 3x replication to protect HFiles. • HFile replication is not necessary on enterprise NFS: erasure coding or RAID-like data protection within/across storage arrays. • Amazon EFS stores data within and across multiple AZs. • FlashBlade supports N+2 data durability and high availability.
  26. HBase WAL Durability • On HDFS, HBase uses the hflush/hsync API to ensure the WAL is safely flushed to multiple DataNodes before acknowledging clients. • Not necessary on enterprise NFS. • FlashBlade acknowledges writes after data is persisted in NVRAM on 3 blades.
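  Not from the slides: the hflush/hsync calls in question, shown through the Hadoop FileSystem API against a local/NFS path (file name is illustrative). HBase makes these calls on its WAL writer; what durability they buy depends on the filesystem underneath.

      import java.net.URI
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      val fs  = FileSystem.get(new URI("file:///"), new Configuration())
      val out = fs.create(new Path("file:///mnt/hbase/wal-durability-demo.log"))
      out.writeBytes("edit-1\n")
      out.hflush()  // on HDFS: flushed to the DataNode write pipeline
      out.hsync()   // on HDFS: persisted on the DataNodes; on enterprise NFS the array's own persistence (e.g. NVRAM) provides the guarantee
      out.close()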
  27. NFS Performance for HBase • Depends on the NFS implementation. • NFS is generally good for random access. • Also check throughput. • Flash storage is ideal for HBase: all-flash scale-out NFS such as Pure Storage FlashBlade.
  28. HBase PE on FlashBlade NFS (charts) • Random writes: 1 RegionServer, 1M rows/client, 10 clients, 7 blades (annotation: memstore flush storm?) • Random reads: 1 RegionServer, 100K rows/client, 20 clients, 7 blades (annotation: block cache affects result) • Latency seen by storage is stable.
  29. HBase PE on Amazon EFS (charts) • Random writes: 1 RegionServer, 1M rows/client, 10 clients, EFS with 1024 MB/s provisioned throughput • Random reads: 1 RegionServer, 100K rows/client, 20 clients, EFS with 1024 MB/s provisioned throughput • Annotation: region too busy, memstore flush is slow.
  30. Key Takeaways • Storage options for cloud-era Hadoop/Spark: Hive & Spark on S3 with cloud-native S3A committers; HBase on enterprise NFS. • Available in the cloud and on premises (Pure Storage FlashBlade). • Additional benefits: always-on compression, encryption, etc. • Proven to work: simple, reliable, performant. • No more HDFS operation. • Virtualize your Hadoop/Spark cluster.
