
What's new in Hadoop Common and HDFS


  1. What's new in Hadoop Common and HDFS @ Hadoop Summit Tokyo 2016. Tsuyoshi Ozawa, NTT Software Innovation Center, 2016/10/26. Copyright©2016 NTT corp. All Rights Reserved.
  2. About me
     • Tsuyoshi Ozawa
     • Researcher & Engineer @ NTT. Twitter: @oza_x86_64
     • Apache Hadoop Committer and PMC member
     • Author of Chapter 22 (YARN) of "Introduction to Hadoop, 2nd Edition" (Japanese)
     • Online article on gihyo.jp: "Why and How does Hadoop work?"
  3. Agenda: what's new in Hadoop 3 Common and HDFS?
     • Build
       • Compiling the source code with JDK 8
     • Common
       • Better library management: client-side classpath isolation, dependency upgrades
       • Support for Azure Data Lake Storage
       • Shell script rewrite
       • metrics2 sink plugin for Apache Kafka (HADOOP-10949)
     • HDFS
       • Erasure Coding Phase 1 (HADOOP-11264)
     • MR and YARN: covered in Junping's talk
  4. Build
  5. Apache Hadoop 3.0.0 runs on JDK 8 or later
     • The minimum supported JDK was raised to JDK 8 (HADOOP-11858)
     • Oracle JDK 7 reached end of public updates in April 2015!
     • This lets Hadoop move forward and use new JDK 8 features
     • Hadoop 2.6.x: JDK 6, 7, 8 or later
     • Hadoop 2.7.x/2.8.x/2.9.x: JDK 7, 8 or later
     • Hadoop 3.0.x: JDK 8 or later
  6. Common
  7. Dependency upgrades
     • Jersey: 1.9 to 1.19
       • Behavior change: a root element whose content is an empty collection is now serialized as an empty object ({}) instead of null
     • grizzly-http-servlet: 2.1.2 to 2.2.21
     • Guice: 3.0 to 4.0
     • cglib: 2.2 to 3.2.0
     • asm: 3.2 to 5.0.4
  8. Client-side classpath isolation (HADOOP-11656/HADOOP-13070)
     • Problem: application code's dependencies can conflict with Hadoop's
       • E.g. user code needs a newer commons library, but Hadoop ships an older one on the same classpath in a single jar file: conflict!
     • Solution: separate the server-side jars from a client-side jar
       • Like hbase-client, the shaded hadoop-client jar hides Hadoop's own dependencies, so user code can bring its newer commons
  9. Support for Azure Data Lake Storage
     • The FileSystem API supports various storage backends:
       • HDFS
       • Amazon S3
       • Azure Blob Storage
       • OpenStack Swift
     • 3.0.0 officially supports Azure Data Lake Storage
  10. Shell script rewrite (HADOOP-9902)
     • The CLI scripts have been renewed!
     • To fix bugs (e.g. HADOOP_CONF_DIR was only honored sometimes)
     • To introduce new features, e.g.:
       • To launch daemons, use the "{hadoop,yarn,hdfs} --daemon" option instead of the {hadoop,yarn,hdfs}-daemons.sh scripts
       • The "{hadoop,yarn,hdfs} --debug" option prints environment variables, Java options, classpath, etc.
     • Please check the documentation:
       • https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html
       • https://issues.apache.org/jira/browse/HADOOP-9902
  11. metrics2 sink plugin for Apache Kafka (HADOOP-10949)
     • Metrics System 2 is the collector of daemon metrics (DataNode, NameNode, and NodeManager metrics)
     • With the new Kafka sink, Hadoop daemon metrics can be dumped into Apache Kafka
  12. HDFS: NameNode multi-standby
  13. NameNode multi-standby
     • Before: 1 active and 1 standby NameNode
       • After the active NN fails, the failed node must be recovered immediately to restore redundancy
     • After: 1 active and N standby NameNodes can be configured
     • Operators can choose their own trade-off between machine costs and operation costs
  14. HDFS Erasure Coding
  15. Replication, the traditional HDFS way
     • Background: HDFS uses chain replication for high throughput and strong consistency
       • With replication factor 3, the client writes to replica 1, which forwards the data to replica 2, then replica 3; ACKs flow back along the chain
     • Pros
       • Simplicity
       • Network throughput between the client and the replicas is kept low
     • Cons
       • High latency
       • 33% storage efficiency (3N bytes stored for N bytes of data)
  16. Erasure Coding
     • Erasure coding is another way to save storage while keeping fault tolerance
     • Used in RAID 5/6
     • It stores "parity" instead of "copies" to enable recovery
     • Reed-Solomon coding is used: with 4 data bits and 2 parity bits, a generator matrix turns the data into data plus parity, and we store both instead of only the data:
       $\begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ X_{00}&X_{01}&X_{02}&X_{03} \\ X_{10}&X_{11}&X_{12}&X_{13} \end{pmatrix} \times \begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ c_0 \\ c_1 \end{pmatrix}$
       (d: data bits, c: parity bits)
     • If data is lost, recovery is done with the inverse matrix
     • Reference: FAST '09, "A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage"
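The "parity instead of copy" idea above can be sketched with the simplest possible code: a single XOR parity, as in RAID 5. This is a toy illustration, not the Reed-Solomon code HDFS actually uses (which works over a Galois field and tolerates multiple losses); the recovery-from-parity principle is the same.

```python
# Toy parity-based recovery: one XOR parity cell over four data cells.
# Real HDFS erasure coding uses Reed-Solomon, not plain XOR.

def encode(data):
    """Return the data cells plus one XOR parity cell."""
    parity = 0
    for d in data:
        parity ^= d
    return data + [parity]

def recover(cells, lost_index):
    """Recover a single lost cell by XOR-ing all surviving cells."""
    recovered = 0
    for i, c in enumerate(cells):
        if i != lost_index:
            recovered ^= c
    return recovered

data = [0b1010, 0b0111, 0b1100, 0b0001]
stored = encode(data)            # 4 data cells + 1 parity cell on "disk"
assert recover(stored, 2) == data[2]   # any one lost cell is recoverable
```

With one parity cell only one loss is survivable; Reed-Solomon with k parity blocks survives any k losses, which is why HDFS pays for the heavier field arithmetic.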
  17. Effect of Erasure Coding
     • Erasure coding is flexible: the numbers of data bits and parity bits can be tuned
       • E.g. 6 data bits, 3 parity bits
     • 3-replication vs (6, 3) Reed-Solomon:

       |                                | 3-replication | (6, 3) Reed-Solomon |
       | Maximum fault tolerance        | 2             | 3                   |
       | Disk usage (N bytes of data)   | 3N            | 1.5N                |

     • HDFS Erasure Coding design document: https://issues.apache.org/jira/secure/attachment/12697210/HDFSErasureCodingDesign-20150206.pdf
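The disk-usage figures in the table fall out of simple arithmetic, sketched below. The function names are illustrative, not HDFS APIs.

```python
# Disk usage for N bytes of user data under the two schemes on the slide.

def replication_usage(n_bytes, replicas=3):
    # every byte is stored `replicas` times
    return n_bytes * replicas

def rs_usage(n_bytes, data=6, parity=3):
    # a block group stores (data + parity) blocks for `data` blocks of payload
    return n_bytes * (data + parity) / data

N = 600  # bytes of user data
assert replication_usage(N) == 3 * N      # 3N, i.e. 33% storage efficiency
assert rs_usage(N) == 1.5 * N             # 1.5N for (6, 3) Reed-Solomon
```

Fault tolerance follows the same parameters: 3-replication survives the loss of 2 copies, while (6, 3) Reed-Solomon survives the loss of any 3 of the 9 blocks.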
  18. Possible EC designs in HDFS: 2 approaches
     • Striping: splitting blocks into smaller cells (e.g. 1MB) and spreading them across the data and parity blocks
       • Pros: effective for small files
       • Cons: less data locality when reading a block
     • Contiguous: creating parities from whole (e.g. 64MB) blocks
       • Pros: better locality
       • Cons: small files cannot be handled
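The striping layout can be sketched as follows: the file is cut into 1MB cells that are dealt round-robin across the 6 data blocks of a block group. Cell and block counts follow the slide's (6, 3) example; this is a sketch of the layout idea, not the HDFS implementation.

```python
# Map a byte offset in a striped file to its data block and the offset
# inside that block. Cells are dealt round-robin across the data blocks.

CELL = 1 * 1024 * 1024   # 1MB striping cell, per the slide
DATA_BLOCKS = 6          # (6, 3) Reed-Solomon layout

def locate(offset):
    """Return (data block index, byte offset inside that block)."""
    cell_no = offset // CELL
    block = cell_no % DATA_BLOCKS             # round-robin across data blocks
    cell_in_block = cell_no // DATA_BLOCKS    # cells already placed in block
    return block, cell_in_block * CELL + offset % CELL

assert locate(0) == (0, 0)            # first cell lands in block 0
assert locate(CELL) == (1, 0)         # second cell moves to block 1
assert locate(6 * CELL) == (0, CELL)  # seventh cell wraps back to block 0
```

This shows why striping helps small files (even a 6MB file spreads over all 6 data blocks) and why it hurts locality (a sequential read of one logical block touches many DataNodes).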
  19. Which is better, striping or contiguous?
     • According to the fsimage analysis report, over 90% of files are smaller than the HDFS block size (64MB)
     • (Figure 3: file size distributions for Cluster 1 and Cluster 3; 1 group = 6 blocks. Source: fsimage analysis, https://issues.apache.org/jira/secure/attachment/12690129/fsimage-analysis-20150105.pdf)
  20. Apache Hadoop's decision
     • Start with striping, to deal with small files
     • Hadoop 3.0.0 implements Phase 1.1 and Phase 1.2
     • HDFS Erasure Coding design document: https://issues.apache.org/jira/secure/attachment/12697210/HDFSErasureCodingDesign-20150206.pdf
  21. Erasure Coding in HDFS (as of 2016): what changed?
     • How data is stored on the DataNodes
     • How metadata is stored on the NameNode
     • The client write path
     • The client read path
     • HDFS Erasure Coding design document: https://issues.apache.org/jira/secure/attachment/12697210/HDFSErasureCodingDesign-20150206.pdf
  22. How data is stored in HDFS (write path)
     • Data is written in 1MB cells (not 64MB blocks)
     • The client calculates the parity bits at write time
     • Cells are written in parallel (not chain replication)
  23. How data is retrieved with (6, 3) Reed-Solomon
     • A block group consists of 9 small blocks: 6 data and 3 parities, each read as 1MB cells from the DataNodes
     • If no data is lost, the client reads only the 6 data blocks and never touches the parities
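The read decision above can be sketched in a few lines: fetch the 6 data blocks on the fast path, and only fall back to parities when a data block is missing. Function and parameter names are illustrative.

```python
# Which blocks does a client fetch to read a whole (6, 3) block group?
# Fast path: the 6 data blocks. Degraded path: surviving data plus enough
# parities to decode. (The decoding itself is omitted here.)

def read_group(available_data, available_parity, data=6, parity=3):
    """Return the list of block ids a client would fetch for a full read."""
    missing = data - len(available_data)
    if missing == 0:
        return list(available_data)        # parities never touched
    if missing > parity or missing > len(available_parity):
        raise IOError("unrecoverable: too many blocks lost")
    # degraded read: surviving data blocks + `missing` parity blocks
    return list(available_data) + list(available_parity)[:missing]

assert read_group(range(6), range(3)) == [0, 1, 2, 3, 4, 5]   # fast path
assert len(read_group(range(5), range(3))) == 6               # 5 data + 1 parity
```

Either way the client needs 6 blocks' worth of reads, which is what makes the remote-rack traffic in the next slide's table unavoidable.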
  24. Network traffic
     • Pros
       • Low latency because of parallel write/read
       • Good for small files
     • Cons
       • Requires high network bandwidth between client and servers

       | Workload      | 3-replication      | (6, 3) Reed-Solomon      |
       | Read 1 block  | 1 LN               | 1/6 LN + 5/6 RR          |
       | Write 1 block | 1 LN + 1 LR + 1 RR | 1/6 LN + 1/6 LR + 7/6 RR |

       (LN: local node, LR: local rack, RR: remote rack)
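The Reed-Solomon fractions in the table come from normalizing a block group's traffic by its 6 data blocks, under the placement assumed on the slide (one block on the local node, one more on the local rack, the rest on remote racks). A sketch of that arithmetic:

```python
# Per-block traffic fractions for (6, 3) Reed-Solomon, as in the table.
# Assumes the slide's placement: 1 block local node, 1 local rack,
# everything else remote racks. Normalized per block of user data.

def rs_read_traffic(data=6):
    # one of the 6 data blocks is local, the other 5 come from remote racks
    return {"LN": 1 / data, "RR": (data - 1) / data}

def rs_write_traffic(data=6, parity=3):
    # of the 9 written blocks, 1 stays on the local node, 1 on the local
    # rack, and the remaining 7 cross to remote racks
    return {"LN": 1 / data, "LR": 1 / data, "RR": (data + parity - 2) / data}

assert rs_read_traffic() == {"LN": 1 / 6, "RR": 5 / 6}
assert rs_write_traffic() == {"LN": 1 / 6, "LR": 1 / 6, "RR": 7 / 6}
```

The write case totals 9/6 = 1.5 blocks of traffic per data block, matching the 1.5N disk usage: every byte written crosses the network roughly 1.5 times, most of it to remote racks.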
  25. Operation points
     • The write path and read path have changed! Re-evaluate your workload:
       • How much network traffic do you have?
       • How many small files?
     • If network traffic is very high, replication is likely preferable
     • If most of your data is cold and small, EC is a good option
  26. Summary
     • Build
       • Minimum JDK upgraded to JDK 8
     • Common
       • Be careful about dependency management in your project if you write hand-coded MapReduce
       • The shell script rewrite makes operation easier
       • Kafka metrics2 sink
       • New FileSystem backend: Azure Data Lake Storage
     • HDFS
       • Multiple standby NameNodes make operation more flexible
       • Erasure coding uses disks more efficiently than replication, but much operational know-how will change!
  27. References
     • Kai Zheng's slides are a good reference: http://www.slideshare.net/HadoopSummit/debunking-the-myths-of-hdfs-erasure-coding-performance
     • HDFS Erasure Coding design document: https://issues.apache.org/jira/secure/attachment/12697210/HDFSErasureCodingDesign-20150206.pdf
     • fsimage analysis: https://issues.apache.org/jira/secure/attachment/12690129/fsimage-analysis-20150105.pdf
     • Hadoop 3.0.0-alpha1 release notes: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-common/release/3.0.0-alpha1/CHANGES.3.0.0-alpha1.html
  28. Acknowledgements
     • Thanks to all users, contributors, committers, and PMC members of Apache Hadoop!
     • Especially to Andrew Wang, for his great effort in releasing 3.0.0-alpha!
     • Thanks to Kota Tsuyuzaki, an OpenStack Swift developer, for reviewing the EC-related slides!
