Hadoop 3.0 - Revolution or evolution?


With Hadoop 3.0.0-alpha2 released in January 2017, it is time to take a closer look at the features and fixes of Hadoop 3.0.

We will look at Core Hadoop, HDFS and YARN, and address the question of whether Hadoop 3.0 will be an architectural revolution, as Hadoop 2 was with YARN & Co., or more of an evolution that adapts to new use cases such as IoT, Machine Learning and Deep Learning (TensorFlow).



  1. Hadoop 3.0 - Revolution or evolution? (uweprintz)
  2. /whoami & /disclaimer
  3. Some Hadoop history
     • 2006, Hadoop 1: HDFS (redundant, reliable storage) and MapReduce (cluster resource management + data processing). "Let there be batch!" - the era of Silicon Valley Hadoop.
     • Oct. 2013, Hadoop 2: HDFS, YARN (cluster resource management) and MapReduce (data processing), plus Hive (SQL), Spark (in-memory), … "Let there be YARN apps!" - the era of Enterprise Hadoop.
     • Late 2017, Hadoop 3: ? IoT, Machine Learning, GPUs, TensorFlow, Data Science, Streaming Data, Cloud, FPGAs, Artificial Intelligence, Kafka. "Let there be …?" - the era of ?
  4. Why Hadoop 3.0?
     • Deprecated APIs can only be removed in a major release
     • Wire-compatibility will be broken
       • Change of default ports
       • Hadoop 2.x client —||—> Hadoop 3.x server (and vice versa)
     • Hadoop command scripts rewrite
     • Big features that need a major release to stabilize
  5. What is Hadoop 3.0? [Release time line 2009-2017: Hadoop 1 on branch-1/branch-0.20 (0.20.x, 1.0.0, 1.1.0, 1.2.1 stable, now EOL) and branch-0.23 (up to 0.23.11 final); Hadoop 2 on branch-2 from 2.0.0-alpha to 2.8.0, adding NameNode HA & Federation, YARN, HDFS snapshots, NFSv3 support, HDFS ACLs, rolling upgrades, transparent encryption, archival & heterogeneous storage, in-memory caching, extended attributes, Docker containers on Linux and ATS 1.5; Hadoop 3 on trunk with 3.0.0-alpha1 and 3.0.0-alpha2. Hadoop 2 and 3 diverged 5+ years ago. Source: Akira Ajisaka, with additions by Uwe Printz]
  6. Hadoop 3.0 in a nutshell
     • HDFS: erasure codes, low-level performance enhancements with Intel ISA-L, 2+ NameNodes, intra-DataNode balancer
     • YARN: better support for long-running services, improved isolation & Docker support, scheduler enhancements, Application Timeline Service v2, new UI
     • MapReduce: task-level native optimization, derive heap size automatically
     • DevOps: drop JDK7 & move to JDK8, change of default ports, library & dependency upgrades, client-side classpath isolation, shell script rewrite & ShellDoc, .hadooprc & .hadoop-env, Metrics2 sink plugin for Kafka
  7. HDFS
  8. HDFS - Current implementation
     • 3 replicas by default: simple, scalable & robust, but 200 % space overhead
     • Tolerates a maximum of 2 failures
     • Write path: the client requests a write, obtains a lease for the file, splits it into blocks and asks the NameNode for DataNodes; it then writes each block plus checksum to DataNode 1, which forwards it along the write pipeline to DataNodes 2 and 3; ACKs travel back along the pipeline and the write completes
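
To make the replication write path above concrete, here is a minimal client-side sketch against the standard FileSystem API (the NameNode address and file path are made up); the lease, block placement and the three-DataNode pipeline are all handled behind the create() call.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally read from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/replicated/example.txt");

        // create(path, overwrite, bufferSize, replication, blockSize):
        // 3 replicas and 128 MB blocks, the defaults described on this slide.
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, (short) 3, 128L * 1024 * 1024)) {
            out.writeUTF("hello, replicated HDFS");
        }
        // The client streams each block (plus checksums) to the first
        // DataNode only; the pipeline forwards it to replicas 2 and 3,
        // and ACKs travel back to the client.
    }
}
```
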
  9. Erasure Coding (EC)
     • k data blocks + m parity blocks
     • Example: Reed-Solomon (6,3) - raw data is split into data blocks (d), encoded into parity blocks (p), and data plus parity are stored
     • Key points:
       • XOR coding -> saves space, slower recovery
       • Missing or corrupt data will be restored from the available data and parity
       • Parity can be smaller than the data
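
The point that missing data can be restored from the remaining data plus parity is easiest to see with the simplest erasure code mentioned above, plain XOR (k data blocks, one parity block). The sketch below is purely illustrative and is not Hadoop's Reed-Solomon coder:

```java
import java.util.Arrays;

public class XorParityDemo {
    // parity[i] = d0[i] ^ d1[i] ^ ... ^ d(k-1)[i]; all blocks must have equal length.
    static byte[] xorAll(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks)
            for (int i = 0; i < out.length; i++) out[i] ^= block[i];
        return out;
    }

    public static void main(String[] args) {
        byte[] d0 = "block-0".getBytes();
        byte[] d1 = "block-1".getBytes();
        byte[] d2 = "block-2".getBytes();

        byte[] parity = xorAll(d0, d1, d2);               // encode

        // Pretend d1 was lost: XOR of everything that is left restores it.
        byte[] recovered = xorAll(d0, d2, parity);
        System.out.println(Arrays.equals(recovered, d1)); // true
    }
}
```

Reed-Solomon generalizes this idea to m parity blocks, so any m failures can be tolerated instead of just one.
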
  10. EC - Main characteristics

                               Replication (x1)   Replication (x3)   Reed-Solomon (6,3)   Reed-Solomon (10,4)
      Maximum fault tolerance  0                  2                  3                    4
      Space efficiency         100 %              33 %               67 %                 71 %

      Replication vs. Reed-Solomon: data locality yes vs. no (phase 1) / yes (phase 2), write performance high vs. low, read performance high vs. medium, recovery costs low vs. high. Erasure coding uses a pluggable implementation; Reed-Solomon is the first choice. (The sketch below reproduces the space-efficiency numbers.)

      Storage tiers: Hot (memory/SSD, ~20 x per day), Warm (disk, ~5 x per week), Cold (dense disk, ~5 x per month), Frozen (EC, ~2 x per year)
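
The space-efficiency row of the table is just the ratio of data blocks to total blocks stored. A small standalone calculation (not Hadoop code) that reproduces those numbers:

```java
public class EcOverhead {
    // Space efficiency of a scheme with k data blocks and m redundancy blocks: k / (k + m).
    static double efficiencyPercent(int data, int redundancy) {
        return 100.0 * data / (data + redundancy);
    }

    public static void main(String[] args) {
        // 3-way replication behaves like 1 data block plus 2 extra copies.
        System.out.printf("Replication x3 : %.0f %%%n", efficiencyPercent(1, 2));   // 33 %
        System.out.printf("RS(6,3)        : %.0f %%%n", efficiencyPercent(6, 3));   // 67 %
        System.out.printf("RS(10,4)       : %.0f %%%n", efficiencyPercent(10, 4));  // 71 %
    }
}
```
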
  11. EC - Contiguous blocks
      • Approach 1: retain the 128 MB block size and add parity - a group of six 128 MB data blocks (possibly belonging to different files A, B, C and spread over different DataNodes) is encoded into three parity blocks stored on further DataNodes
      • Pro: better for locality
      • Con: significant overhead for smaller files, since 3 parity blocks are always needed
      • Con: the client potentially needs to process GBs of data for encoding
  12. EC - Striping
      • Approach 2: split blocks into smaller cells (1 MB) that are written round-robin as stripes across the blocks of a group (files A, B, C), with parity cells computed per stripe (see the layout sketch after this list)
      • Pro: works for small files
      • Pro: allows parallel writes
      • Con: no data locality -> increased read latency & a more complicated recovery process
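
A rough sketch of how the striped layout maps 1 MB cells round-robin onto the data blocks of a Reed-Solomon (6,3) block group, as described above; the helper is hypothetical and not the HDFS client's actual layout code:

```java
public class StripeLayout {
    static final int CELL_SIZE = 1 << 20; // 1 MB cells, as on the slide
    static final int DATA_UNITS = 6;      // RS(6,3): 6 data blocks + 3 parity blocks

    // Which of the 6 data blocks in the block group holds a given file offset?
    static int dataUnitIndex(long fileOffset) {
        long cell = fileOffset / CELL_SIZE;
        return (int) (cell % DATA_UNITS);  // round-robin across the group
    }

    public static void main(String[] args) {
        System.out.println(dataUnitIndex(0));               // 0 -> first data block
        System.out.println(dataUnitIndex(5L * CELL_SIZE));  // 5 -> sixth data block
        System.out.println(dataUnitIndex(6L * CELL_SIZE));  // 0 -> wraps around (stripe 2)
        // The 3 parity cells of each stripe are computed client-side and
        // written to the parity blocks of the same block group.
    }
}
```
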
  13. EC - Apache Hadoop's decision (HDFS-7285)
      • Start from striping to deal with smaller files
      • Design space (layout x redundancy): contiguous + replication is classic HDFS; contiguous + erasure coding is used by Facebook f4 and Azure; striping + replication by Ceph (before Firefly) and Lustre; striping + erasure coding by Ceph (with Firefly) and QFS
      • Roadmap: Phase 1.1 (HDFS-7285), Phase 1.2 (HDFS-8031), Phase 2 (HDFS-8030), Phase 3 (future work); Hadoop 3.0.x implements Phases 1.1 and 1.2
  14. EC - Shell command
      • Create an EC zone on an empty directory
      • All files under a zone directory are automatically erasure coded
      • Renames across zones with different EC schemas are disallowed

      Usage: hdfs erasurecode [generic options]
          [-getPolicy <path>]
          [-help [cmd ...]]
          [-listPolicies]
          [-setPolicy [-p <policyName>] <path>]

      -getPolicy <path>                   : Get erasure coding policy information at the specified path
      -listPolicies                       : List the supported erasure coding policies
      -setPolicy [-p <policyName>] <path> : Set the specified erasure coding policy on a directory
          -p <policyName>  Erasure coding policy name used to encode files; if not passed, the default policy is used
          <path>           Path to a directory; files under this directory will be encoded using the specified policy
  15. EC - Write path
      • Parallel write: the client writes to 9 DataNodes (6 data + 3 parity) at the same time and calculates the parity itself at write time
      • Durability: Reed-Solomon (6,3) can tolerate a maximum of 3 failures
      • Visibility: read is supported for files that are being written
      • Consistency: the client can start reading from any 6 of the 9 DataNodes
      • Appendable: files can be reopened for appending data
  16. EC - Write failure handling
      • DataNode failure: the client ignores the failed DataNode and continues writing
      • The write path can tolerate 3 DataNode failures but requires at least 6 DataNodes
      • Missing blocks will be reconstructed later
  17. EC - Read path
      • Read data (1 MB cells) from the 6 data DataNodes in parallel; the parity DataNodes 7-9 are not needed in the failure-free case
  18. EC - Read failure handling
      • Read data from 6 arbitrary DataNodes in parallel
      • Read a parity block to reconstruct a missing data block
  19. EC - Network behavior
      • Pros:
        • Low latency because of parallel reads & writes
        • Good for small file sizes
      • Cons:
        • Requires high network bandwidth between client & server
        • Dead DataNodes result in high network traffic and long reconstruction times
  20. EC - Coder implementation
      • Legacy coder: from Facebook's HDFS-RAID project ([umbrella] HADOOP-11264)
      • Pure Java coder: code improvements over HDFS-RAID (HADOOP-11542)
      • Intel ISA-L coder: native coder based on Intel's Intelligent Storage Acceleration Library, which accelerates EC-related linear algebra calculations by exploiting advanced hardware instruction sets such as SSE, AVX and AVX2 (HADOOP-11540)
  21. EC - Coder performance I
  22. EC - Coder performance II
  23. EC - Coder performance III
  24. 2+ NameNodes (HDFS-6440)
      • Hadoop 1: no built-in high availability; you had to solve it yourself, e.g. via VMware
      • Hadoop 2: high availability out of the box via an active/passive pattern (NameNode active + NameNode standby); you needed to recover immediately after a failure
      • Hadoop 3: 1 active NameNode with N standby NameNodes; a trade-off between operational costs and hardware costs
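
What an HA setup with more than two NameNodes might look like, using the standard HDFS HA configuration keys; the nameservice and host names are made up, and in practice these properties live in hdfs-site.xml rather than in code:

```java
import org.apache.hadoop.conf.Configuration;

public class MultiStandbyHaConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "mycluster");

        // HDFS-6440: the NameNode list may now contain more than two entries,
        // e.g. one active plus two standby NameNodes.
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2,nn3");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "master1:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "master2:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn3", "master3:8020");

        System.out.println("Configured NameNodes: " + conf.get("dfs.ha.namenodes.mycluster"));
    }
}
```
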
  25. Intra-DataNode Balancer (HDFS-1312)
      • Hadoop already has a Balancer between DataNodes; it must be invoked manually by design and is typically used after adding additional worker nodes
      • The new Disk Balancer lets administrators rebalance data across the multiple disks of a single DataNode; it is useful for correcting the skewed data distribution often seen after adding or replacing disks
      • Adds an hdfs diskbalancer command that submits a plan but does not wait for the plan to finish executing; the DataNode performs the moves itself
  26. YARN
  27. YARN - Scheduling enhancements
      • Application priorities within a queue (YARN-1963): for example, in the Marketing queue, Hive jobs > MapReduce jobs (see the sketch after this list)
      • Inter-queue priorities (YARN-4945): Queue 1 > Queue 2 irrespective of demand & capacity; previously based only on unconsumed capacity
      • Affinity / anti-affinity (YARN-1042): more fine-grained constraints on locations, e.g. do not allocate HBase RegionServers and Storm workers on the same host
      • Global scheduling (YARN-5139): today YARN schedules one node at a time as heartbeats arrive, which can lead to suboptimal decisions; with global scheduling the scheduler looks at more nodes and picks the best ones for the application's requirements, giving globally better placement and higher container scheduling throughput
      • Gang scheduling (YARN-624): allow allocation of sets of containers, e.g. 1 container with 128 GB of RAM and 16 cores OR 100 containers with 2 GB of RAM and 1 core; this can already be approximated by holding on to containers, but that may lead to deadlocks and decreased cluster utilization
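
For the in-queue priorities, a minimal client sketch; the application id is made up, and it assumes the YarnClient#updateApplicationPriority call that accompanied the application-priority work:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class RaisePriority {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration()); // picks up yarn-site.xml from the classpath
        yarn.start();

        // Hypothetical running application (cluster timestamp + sequence number).
        ApplicationId appId = ApplicationId.newInstance(1454001598828L, 42);

        // Ask the ResourceManager to rank this app higher than other apps
        // in the same queue (YARN-1963); the effective value is capped by
        // the queue's configured maximum priority.
        yarn.updateApplicationPriority(appId, Priority.newInstance(10));

        yarn.stop();
    }
}
```
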
  28. YARN - Built-in support for long-running services
      • Simplified, first-class support for services (YARN-4692): an abstract common framework for long-running services (similar to Apache Slider) and a simpler API for managing the service lifecycle of YARN apps
      • Better support for long-running services:
        • Recognition of long-running services (YARN-4725)
        • Auto-restart of containers; containers of long-running services are retried on the same node if they hold local state
        • Service/application upgrade support (YARN-4726): hold on to containers during an upgrade of the YARN app
      • Dynamic container resizing (YARN-1197): ask only for minimum resources at start and adjust them at runtime; currently the only way is to release containers and allocate new ones with the expected size
  29. YARN - Resource isolation & Docker
      • Better resource isolation:
        • Support for disk isolation (YARN-2619)
        • Support for network isolation (YARN-2140)
        • Uses cgroups to give containers their fair share
      • Docker support in the LinuxContainerExecutor (YARN-3611):
        • The LinuxContainerExecutor already provides localization, cgroups-based resource management and isolation for CPU, network, disk, etc., as well as security mechanisms
        • Docker containers can now be run inside the LinuxContainerExecutor
        • Offers packaging and resource isolation, and complements YARN's support for long-running services
  30. YARN - Service discovery
      • Services can run on any YARN node; IPs are dynamic and can change after node failures, etc.
      • YARN service discovery via DNS (YARN-4757):
        • The YARN service registry already lets applications register their endpoints and clients discover them, but only via a Java API and REST
        • Instead, expose the service information via an already available discovery mechanism: DNS
        • Existing YARN service registry records need to be converted into DNS entries
        • Discovery of the container IP and service port then works via standard DNS lookups (see the lookup sketch after this list)
        • Mapping of applications, e.g. zkapp1.griduser.yarncluster.com -> 172.17.0.2
        • Mapping of containers, e.g. container-e3741-1454001598828-0131-01000004.yarncluster.com -> 172.17.0.3
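
Once the DNS records exist, clients need nothing YARN-specific to find a service; a plain lookup with the JDK is enough. The hostname below is the slide's example and only resolves in a cluster where the YARN DNS server from YARN-4757 is running:

```java
import java.net.InetAddress;

public class YarnDnsLookup {
    public static void main(String[] args) throws Exception {
        // Application-level record; naming pattern taken from the slide.
        InetAddress app = InetAddress.getByName("zkapp1.griduser.yarncluster.com");
        System.out.println("zkapp1 runs at " + app.getHostAddress()); // e.g. 172.17.0.2
    }
}
```
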
  31. YARN - Use the force! [Diagram: MapReduce, Tez and Spark all running on top of YARN]
  32. YARN - New UI (YARN-3368)
  33. Application Timeline Service v2 (YARN-2928)
      Why? Limitations of ATS v1:
      • Scalability & performance: a single global writer/reader instance and local-disk-based LevelDB storage
      • Reliability: failure handling tied to the local disk; a single point of failure
      • Usability: configuration and metrics should become first-class members; better support for queries
      • Flexibility: the data model should be more descriptive
      Core concepts of v2:
      • Distributed write path with a logical per-app collector and separate reader instances
      • Pluggable backend storage, with HBase as the backend of choice
      • Enhanced internal data model and metrics aggregation
      • Richer REST API for queries
  34. Revolution or evolution?
  35. Summary
      • Major release, incompatible with Hadoop 2
      • Main features are erasure coding and better support for long-running services & Docker
      • A good fit for IoT and Deep Learning use cases
      Release time line:
      • 3.0.0-alpha1 - Sep 3, 2016
      • alpha2 - Jan 25, 2017
      • alpha3 - Q2 2017 (estimated)
      • beta/GA - Q3/Q4 2017 (estimated)
  36. …but it's not a revolution!
  37. Thank you!
      Twitter: @uweprintz
      Mail: uwe.printz@codecentric.de / uwe.seiler@codecentric.de
      Phone: +49 176 1076531
      XING: https://www.xing.com/profile/Uwe_Printz
  38. Picture credits
      • Slide 1: https://unsplash.com/photos/CIXoFys3gsw
      • Slide 2: Copyright by Uwe Printz
      • Slide 7: https://unsplash.com/photos/LHlwgjbSo3k
      • Slide 34: https://unsplash.com/photos/Cvf1IqUel9w
      • Slide 36: https://imgflip.com/i/mkovb
      • Slide 37: Copyright by Uwe Printz
      • All pictures CC0 or shot by the author
