Apache Hadoop 3.0 Community Update

Apache Hadoop 3 is coming! As the next major milestone for Hadoop and big data, it is attracting wide attention because it showcases several bleeding-edge technologies and significant features across all components of Apache Hadoop: erasure coding in HDFS, Docker container support, Apache Slider integration and native service support, Application Timeline Service version 2, Hadoop library updates, and client-side classpath isolation, among others. In this talk we first give an update on the status of the Hadoop 3.0 release work in the Apache community and the planned path through alpha and beta toward GA. We then dive into each new feature, including its development progress and maturity in Hadoop 3. Last but not least, as a new major release, Hadoop 3.0 contains some incompatible API and CLI changes that can be challenging for downstream projects and existing Hadoop users to absorb during an upgrade; we walk through these major changes and explore their impact on other projects and users.

Speaker: Sanjay Radia, Founder and Chief Architect, Hortonworks

1. Apache Hadoop 3.0 Community Update
   Sydney, September 2017
   Sanjay Radia, Vinod Kumar Vavilapalli

2. About.html
   Sanjay Radia
   • Chief Architect, Founder, Hortonworks
   • Part of the original Hadoop team at Yahoo! since 2007
     – Chief Architect of Hadoop Core at Yahoo!
     – Apache Hadoop PMC and Committer
   • Prior: data center automation, virtualization, Java, HA, OSs, file systems; startup, Sun Microsystems, Inria …
   • Ph.D., University of Waterloo

3. Why Hadoop 3.0
   Driving reasons (and some features simply taking advantage of 3.0):
   • Lots of content in trunk that did not make it to the 2.x branch
   • JDK upgrade – does not truly require bumping the major number
   • Hadoop command scripts rewrite (incompatible)
   • Big features that need a stabilizing major release
     – Erasure codes
     – YARN: long-running services
   • Ephemeral ports (incompatible)

4. Apache Hadoop 3.0
   Key takeaways:
   • HDFS: erasure codes
   • YARN: long-running services, scheduler enhancements, isolation & Docker, new UI
   • Lots of trunk content
   • JDK8 and newer dependent libraries
   Release timeline:
   • 3.0.0-alpha1 – Sep 3, 2016
   • alpha2 – Jan 25, 2017
   • alpha3 – May 16, 2017
   • alpha4 – Jul 7, 2017
   • beta/GA – Q4 2017 (estimated)

5. Agenda
   • Major changes you should know before upgrading to Hadoop 3.0
     – JDK upgrade
     – Dependency upgrades
     – Changes to default ports for daemons/services
     – Shell script rewrite
   • Features
     – Hadoop Common
       • Client-side classpath isolation
       • Shell script rewrite
     – HDFS/Storage
       • Erasure coding
       • Multiple standby NameNodes
       • Intra-DataNode balancer
       • Cloud storage: support for Azure Data Lake, S3 consistency & performance
     – YARN
       • Support for long-running services
       • Scheduling enhancements: app/queue priorities, global scheduling, placement strategies
       • New UI
       • ATS v2
     – MapReduce
       • Task-level native optimization (HADOOP-11264)

6. Hadoop Operations – JDK Upgrade
   • Minimum JDK for Hadoop 3.0.x is JDK8 (HADOOP-11858)
     – Oracle JDK 7 reached end of life in April 2015!
   • Moving forward to use new features of JDK8
     – Lambda expressions – starting to use these
     – Stream API
     – Security enhancements
     – Performance enhancements for HashMap, IO/NIO, etc.
   • Hadoop's evolution with JDK upgrades
     – Hadoop 2.6.x – JDK 6, 7, 8 or later
     – Hadoop 2.7.x/2.8.x/2.9.x – JDK 7, 8 or later
     – Hadoop 3.0.x – JDK 8 or later

7. Change of Default Ports for Hadoop Services
   • Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000)
     – Can conflict with other apps running on the same node
     – Can cause problems during a rolling restart if another app takes the port
   • New ports:
     – NameNode ports: 50470 → 9871, 50070 → 9870, 8020 → 9820
     – Secondary NameNode ports: 50091 → 9869, 50090 → 9868
     – DataNode ports: 50020 → 9867, 50010 → 9866, 50475 → 9865, 50075 → 9864
     – KMS service port: 16000 → 9600

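The remapping above is easy to capture in a small lookup table when scripting firewall or client-config updates. A minimal sketch in Python; the protocol labels (http/https/rpc/ipc/data) are our shorthand reading of which default is which, not something the slide spells out:

```python
# Old-to-new default port mapping in Hadoop 3.0, restating the slide above.
# Handy when updating firewall rules or client configs. The protocol labels
# are informal shorthand, not taken from the slide.
PORT_CHANGES = {
    "namenode.https":     (50470, 9871),
    "namenode.http":      (50070, 9870),
    "namenode.rpc":       (8020, 9820),
    "secondary-nn.https": (50091, 9869),
    "secondary-nn.http":  (50090, 9868),
    "datanode.ipc":       (50020, 9867),
    "datanode.data":      (50010, 9866),
    "datanode.https":     (50475, 9865),
    "datanode.http":      (50075, 9864),
    "kms":                (16000, 9600),
}

def new_default_port(old_port: int) -> int:
    """Return the Hadoop 3 default port that replaces a Hadoop 2 default port."""
    for old, new in PORT_CHANGES.values():
        if old == old_port:
            return new
    raise KeyError(f"{old_port} is not one of the remapped Hadoop default ports")

if __name__ == "__main__":
    print(new_default_port(50070))  # -> 9870
```
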
8. Classpath Isolation (HADOOP-11656)
   • Hadoop leaks lots of dependencies onto the application's classpath
     – Known offenders: Guava, Protobuf, Jackson, Jetty, …
     – Potential conflicts with your app's dependencies (no shading)
   • No separate HDFS client jar means server jars are leaked
     – NN and DN libraries are pulled in even though they are not needed
   • HDFS-6200: split the HDFS client into a separate JAR
   • HADOOP-11804: shaded hadoop-client dependency
   • YARN-6466: shade the task umbilical for a clean YARN container environment (ongoing)

9. HDFS
   • Support for three NameNodes for HA
   • Intra-DataNode balancer
   • Cloud storage improvements (see afternoon talk)
   • Erasure coding

10. Current (2.x) HDFS Replication Strategy
   • Three replicas by default
     – 1st replica on the local node, local rack, or a random node
     – 2nd and 3rd replicas on the same remote rack
     – Reliability: tolerates 2 failures
   • Good data locality, local short-circuit reads
   • Multiple copies => parallel IO for parallel compute
   • Very fast block recovery and node recovery
     – Parallel recovery – the bigger the cluster, the faster
     – 10 TB node recovery: 30 sec to a few hours
   • 3x storage overhead vs 1.4-1.6x for erasure coding
     – Remember that Hadoop's JBOD is very cheap: 1/10-1/20 the cost of SANs, 1/10-1/5 the cost of NFS
   (Diagram: replicas r1, r2, r3 placed across DataNodes on Rack I and Rack II)

11. Erasure Coding
   • k data blocks + m parity blocks (k + m)
     – Example: Reed-Solomon 6+3
   • Reliability: tolerates m failures
   • Saves disk space
   • Saves I/O bandwidth on the write path
   • 1.5x storage overhead
   • Tolerates any 3 failures
   (Diagram: 6 data blocks b1-b6 plus 3 parity blocks P1-P3)
   Comparison (N bytes of data):
   • 3-replication: maximum fault tolerance 2, disk usage 3N
   • (6, 3) Reed-Solomon: maximum fault tolerance 3, disk usage 1.5N

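The comparison above comes straight from the overhead formula: a (k, m) code stores (k + m) / k bytes per logical byte, versus r bytes per byte for r-way replication. A small sketch reproducing the numbers in the slide:

```python
def ec_overhead(k: int, m: int) -> float:
    """Storage overhead factor of a (k, m) erasure code: stored bytes per logical byte."""
    return (k + m) / k

def replication_overhead(replicas: int) -> float:
    """Storage overhead factor of plain replication."""
    return float(replicas)

if __name__ == "__main__":
    print(replication_overhead(3))  # 3.0  -> "3N" disk usage for 3-replication
    print(ec_overhead(6, 3))        # 1.5  -> "1.5N" disk usage for (6, 3) Reed-Solomon
    print(ec_overhead(10, 4))       # 1.4  -> lower end of the 1.4-1.6x range quoted earlier
```
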
12. Block Reconstruction
   • Block reconstruction overhead
     – Higher network bandwidth cost
     – Extra CPU overhead
       • Local Reconstruction Codes (LRC), Hitchhiker
   • References:
     – Huang et al. Erasure Coding in Windows Azure Storage. USENIX ATC '12.
     – Sathiamoorthy et al. XORing Elephants: Novel Erasure Codes for Big Data. VLDB 2013.
     – Rashmi et al. A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers. SIGCOMM '14.
   (Diagram: data blocks b1-b6 and parity blocks P1-P3 spread across racks)

13. Erasure Coding on Contiguous/Striped Blocks – Two Approaches
   • EC on contiguous blocks
     – Pro: better for locality
     – Con: small files cannot be handled
   • EC on striped blocks
     – Pro: leverages multiple disks in parallel
     – Pro: works for small files
     – Con: no data locality for readers
   (Diagrams: striped layout with cells C1-C12 and parity cells PC1-PC6 across stripes 1…n, forming 6 data blocks b1-b6 and 3 parity blocks P1-P3; contiguous layout with files f1, f2, f3 keeping whole data blocks plus separate parity blocks)

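For the striped layout, the cell-to-block mapping is simple arithmetic: the file is cut into fixed-size cells that round-robin across the k data blocks of a block group. A minimal illustrative sketch, assuming RS(6,3) and the 1 MB stripe cell size mentioned on the EC write-path slide below; this is for intuition only, not HDFS's actual internals:

```python
CELL_SIZE = 1 * 1024 * 1024  # 1 MB stripe cell, as noted on the EC write-path slide
K, M = 6, 3                  # RS(6,3): 6 data blocks + 3 parity blocks per block group

def locate(offset: int) -> tuple[int, int, int]:
    """Map a logical byte offset in a striped file to (stripe index, data block index, offset in cell).

    Cells round-robin across the K data blocks: cell 0 -> block 0, cell 1 -> block 1, ...
    """
    cell = offset // CELL_SIZE
    stripe = cell // K                # which row of cells
    block_index = cell % K            # which of the 6 data blocks in the group
    offset_in_cell = offset % CELL_SIZE
    return stripe, block_index, offset_in_cell

if __name__ == "__main__":
    # The first 6 MB land one cell on each of the 6 data blocks (stripe 0);
    # the byte at 6 MB starts stripe 1 back on data block 0.
    for off in (0, CELL_SIZE, 6 * CELL_SIZE, 6 * CELL_SIZE + 123):
        print(off, "->", locate(off))
```
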
14. Erasure Coding Zone
   • Create a zone on an empty directory
     – Shell command: hdfs erasurecode -createZone [-s <schemaName>] <path>
   • All files under a zone directory are automatically erasure coded
     – Renames across zones with different EC schemas are disallowed

15. Write Pipeline for Replicated Files
   • Write pipeline to DataNodes
   • Durability
     – Uses 3 replicas to tolerate a maximum of 2 failures
   • Visibility
     – Reads are supported for files being written
     – Data can be made visible by hflush/hsync
   • Consistency
     – A client can start reading from any replica and fail over to any other replica to read the same data
   • Appendable
     – Files can be reopened for append
   (Diagram: Writer → DN1 → DN2 → DN3 data/ack pipeline; DN = DataNode)

16. Parallel Write for EC Files
   • Parallel write
     – The client writes to a group of 9 DataNodes at the same time
     – Parity bits are calculated on the client side, at write time
   • Durability
     – (6, 3) Reed-Solomon can tolerate a maximum of 3 failures
   • Visibility (same as replicated files)
     – Reads are supported for files being written
     – Data can be made visible by hflush/hsync
   • Consistency
     – The client can start reading from any 6 of the 9 replicas
     – When reading from a DataNode fails, the client can fail over to any other remaining replica to read the same data
   • Appendable (same as replicated files)
     – Files can be reopened for append
   (Diagram: Writer sends data and parity in parallel to DN1…DN9; stripe size 1 MB)

17. EC: Write Failure Handling
   • DataNode failure
     – The client ignores the failed DataNode and continues writing
     – Able to tolerate 3 failures
     – Requires at least 6 DataNodes
     – Missing blocks are reconstructed later
   (Diagram: Writer keeps streaming data and parity to the remaining DataNodes)

18. Replication: Slow Writers & Replace-Datanode-on-Failure
   • Write pipeline for replicated files
     – A DataNode can be replaced in case of failure
   • Slow writers
     – A write pipeline may last for a long time
     – The probability of DataNode failures increases over time
     – Hence the need to replace DataNodes on failure
   • EC files
     – Do not support replace-datanode-on-failure
     – The slow-writer case is improved
   (Diagram: Writer → DN1 → DN2 → DN3 pipeline, with DN4 available as a replacement)

19. Reading with Parity Blocks
   • Parallel read
     – Read from the 6 DataNodes holding data blocks
     – Supports both stateful read and pread
   • Block reconstruction
     – Read parity blocks to reconstruct missing blocks
   (Diagram: Reader fetches Block1-Block6 from DN1-DN6 and uses Parity1 from DN7 to reconstruct the missing Block3)

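HDFS erasure coding uses Reed-Solomon (with the ISA-L library, as noted on the next slide) rather than simple XOR, but the reconstruction idea, rebuilding a missing block from surviving blocks plus parity, can be shown with a toy single-parity code. A sketch for intuition only:

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Toy (3 data + 1 parity) single-parity code; real HDFS EC uses Reed-Solomon,
# e.g. RS(6,3), which tolerates multiple failures instead of just one.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)            # written alongside the data blocks

# Simulate losing data block 1, then reconstruct it from the survivors plus parity.
surviving = [data[0], data[2], parity]
reconstructed = xor_blocks(surviving)
assert reconstructed == data[1]
print(reconstructed)  # b'efgh'
```
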
20. EC Implications
   • File data is striped across multiple nodes and racks
   • Reads and writes are remote and cross-rack
   • Reconstruction is network-intensive: it reads k surviving blocks cross-rack
     – Needs a fast network
       • Requires high network bandwidth between client and servers
       • A dead DataNode implies high network traffic and reconstruction time
   • Important to use the optimized ISA-L library for performance
     – 1+ GB/s encode/decode speed, much faster than the Java implementation
     – CPU is no longer a bottleneck
   • Need to combine data into larger files to avoid an explosion in replica count
     – Bad: 1 x 1 GB file → RS(10,4) → 14 x 100 MB EC blocks (4.6x the number of replicas)
     – Good: 10 x 1 GB files → RS(10,4) → 14 x 1 GB EC blocks (0.46x the number of replicas)
   • Works best for archival / cold data use cases

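The "bad vs. good" arithmetic above is worth spelling out: a striped RS(10,4) block group always materializes 14 block objects, so small files inflate the object count relative to 3-way replication. A minimal sketch of that calculation, assuming the 1 GB block size the example implies:

```python
import math

BLOCK_SIZE = 1 * 1024**3  # assume a 1 GB HDFS block size, as the example's numbers imply

def replica_count(file_size: int, replication: int = 3) -> int:
    """Number of stored block replicas under plain replication."""
    return math.ceil(file_size / BLOCK_SIZE) * replication

def ec_block_count(file_size: int, k: int = 10, m: int = 4) -> int:
    """Number of stored block objects under striped RS(k, m).

    Each block group holds up to k * BLOCK_SIZE bytes of data and always
    materializes k + m block objects (partially filled for small files).
    """
    groups = math.ceil(file_size / (k * BLOCK_SIZE))
    return groups * (k + m)

if __name__ == "__main__":
    gb = 1024**3
    # Bad: a single 1 GB file -> 14 EC blocks vs 3 replicas (~4.6x more objects)
    print(ec_block_count(1 * gb), replica_count(1 * gb))    # 14 3
    # Good: a 10 GB file (ten 1 GB files combined) -> 14 EC blocks vs 30 replicas (~0.46x)
    print(ec_block_count(10 * gb), replica_count(10 * gb))  # 14 30
```
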
21. EC performance – write performance is faster with the right EC library (chart)

22. EC performance – TPC benchmark with no DataNode killed (chart)

23. EC performance – TPC benchmark with 2 DataNodes killed (chart)

24. Erasure Coding Status
   • Massive development effort by the Hadoop community
     – 20+ contributors from many companies (Hortonworks, Y! JP, Cloudera, Intel, Huawei, …)
     – 100s of commits over three years (started in 2014)
   • Erasure coding is feature complete!
   • Solidifying some user APIs in preparation for beta1
   • Current focus is on testing and integration efforts
     – Want the complete Hadoop stack to work with HDFS erasure coding enabled
     – Stress / endurance testing to ensure stability

25. Apache Hadoop 3.0 – YARN Enhancements
   • YARN scheduling enhancements
   • Support for long-running services
   • Re-architecture of the YARN Timeline Service – ATS v2
   • Better elasticity and resource utilization
   • Better resource isolation, and Docker!!
   • Better user experience
   • Other enhancements

26. Scheduling Enhancements
   • Application priorities within a queue (YARN-1963)
     – In Queue A, App1 > App2
   • Inter-queue priorities
     – Q1 > Q2 irrespective of demand / capacity
     – Previously based on unconsumed capacity
   • Affinity / anti-affinity (YARN-1042)
     – More constraints on placement
       • Affinity to a rack (where a sibling is running)
       • Anti-affinity (e.g. HBase region servers)
   • Global scheduling (YARN-5139)
     – Gets rid of scheduling triggered on node heartbeats; replaced with a global scheduler that has parallel threads
       • Globally optimal placement – expect evolution of the scheduler
       • Critical for long-running services – they stick to their allocation, so it had better be a good one
       • Enhanced container scheduling throughput (6x)

27. Scheduling Enhancements (Contd.)
   • CapacityScheduler improvements
     – Queue management improvements
       • More dynamic queue reconfiguration
       • REST API support for queue management
     – Absolute resource configuration support
     – Priority support for applications and queues
     – Preemption improvements
       • Inter-queue preemption support

28. Key Drivers for Long Running Services
   • Consolidation of infrastructure
     – Hadoop clusters have a lot of compute and storage resources (some unused)
       • Can't I use Hadoop's resources for non-Hadoop load?
       • OpenStack is hard to manage/operate – can I use YARN?
       • VMs are expensive – can I use YARN?
       • But does it support Docker? Yes, we heard you
   • Hadoop-related data services that run outside a Hadoop cluster
     – Why can't I run them in the Hadoop cluster?
   • Run Hadoop services (Hive, HBase) on YARN
     – Run multiple instances
     – Benefit from YARN's elasticity and resource management

29. Built-in Support for Long-Running Services in YARN
   • A native YARN framework (YARN-4692)
     – Abstracts a common framework (similar to Slider) to support long-running services
     – Simpler APIs to manage the service lifecycle
     – Better support for long-running services
   • Recognition of long-running services
     – Affects policies for preemption, container reservation, etc.
     – Auto-restart of containers
     – Containers for long-running services are restarted on the same node when they have local state
   • Service/application upgrade support (YARN-4726)
     – In general, services are expected to run long enough to cross versions
   • Dynamic container configuration
     – Ask only for the resources you need, and adjust them at runtime (memory is harder)

30. Service Discovery in YARN
   • Services can run on any YARN node – how do clients get a service's IP?
     – A service can also move due to node failure
   • YARN service discovery via DNS (YARN-4757)
     – Exposes existing service information in the YARN registry via DNS
       • YARN service registry records are converted into DNS entries
     – Discovery of container IPs and service ports via standard DNS lookups
       • Application: zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
       • Container: container 1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18

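With the registry exposed over DNS, clients need nothing Hadoop-specific to locate a service. A minimal sketch using a plain resolver call; the hostname is the hypothetical example from the slide, and it assumes the YARN registry DNS server is reachable from the client as a resolver or forwarder:

```python
import socket

# Hypothetical service name taken from the slide; real names follow the
# <app-name>.<user>.<cluster DNS domain> pattern published by the registry DNS.
SERVICE_HOST = "zkapp1.user1.yarncluster.com"
SERVICE_PORT = 8080  # the port is application-specific, not part of the DNS record

def resolve_service(host: str) -> str:
    """Resolve a YARN-registered service name with an ordinary DNS lookup."""
    return socket.gethostbyname(host)

if __name__ == "__main__":
    ip = resolve_service(SERVICE_HOST)
    print(f"{SERVICE_HOST} -> {ip}:{SERVICE_PORT}")
```
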
31. A More Powerful YARN
   • Elastic resource model
     – Dynamic resource configuration (YARN-291)
       • Allows a NodeManager's resources to be tuned down/up at runtime
       • E.g. helps when Hadoop cluster nodes are shared with other workloads
       • E.g. Hadoop-on-Hadoop allows flexible resource allocation
     – Graceful decommissioning of NodeManagers (YARN-914)
       • Drains a node that is being decommissioned, allowing running containers to finish
       • E.g. removing a node for maintenance, spot pricing in the cloud, …
   • Efficient resource utilization
     – Support for container resizing (YARN-1197)
       • Allows applications to change the size of an existing container
       • E.g. long-running services

32. More Powerful YARN (Contd.)
   • Resource isolation
     – Resource isolation support for disk and network
       • YARN-2619 (disk), YARN-2140 (network)
       • Containers get a fair share of disk and network resources using cgroups
     – Docker support in the LinuxContainerExecutor (YARN-3611)
       • Supports launching Docker containers alongside process containers
       • Packaging and resource isolation – packaging becomes easier, e.g. TensorFlow
       • Complements YARN's support for long-running services

33. Docker on YARN & YARN on YARN – YCloud
   (Diagram: Hadoop apps – MR, Tez, Spark, TensorFlow – running on YARN, with a nested YARN cluster running MR, Tez, and Spark inside containers)
   • Can use YARN to test Hadoop!!

34. YARN New UI (YARN-3368)

35. Other YARN Work Planned in Hadoop 3.x
   • Resource profiles (YARN-3926)
     – Users can specify a resource profile name instead of individual resources
     – Resource types are read from a config file
   • YARN federation (YARN-2915)
     – Allows YARN to scale out to tens of thousands of nodes
     – A cluster of clusters that appears as a single cluster to the end user

36. Compatibility & Testing

37. Compatibility
   • Preserves wire compatibility with Hadoop 2 clients
     – It is impossible to coordinate upgrading off-cluster Hadoop clients
   • Will support rolling upgrades from Hadoop 2 to Hadoop 3
     – You can't take downtime to upgrade a business-critical cluster
   • Does not fully preserve API compatibility!
     – Dependency version bumps
     – Removal of deprecated APIs and tools
     – Shell script rewrite, rework of Hadoop tools scripts
     – Incompatible bug fixes

38. Testing and Validation
   • Extended alpha → beta → GA plan designed for stabilization
   • EC already has some usage in production (700 nodes at Y! JP)
     – Hortonworks has worked closely with this very large customer
   • Hortonworks is integrating and testing HDP 3
     – Integrating with all components of the HDP stack
     – HDP 2++ integration tests
   • Cloudera is also testing Hadoop 3 as part of their stack
   • Plans for extensive HDFS EC testing by Hortonworks and Cloudera
   • Happy synergy between the 2.8.x and 3.0.x lines
     – They share much of the same code; fixes flow into both
     – Yahoo! deployments are based on 2.8.0

39. Summary: What's New in Apache Hadoop 3.0?
   • Storage optimization – HDFS: erasure codes
   • Improved utilization – YARN: long-running services; YARN: scheduling enhancements
   • Additional workloads – YARN: Docker & isolation
   • Easier to use – new user interface
   • Refactored base – lots of trunk content; JDK8 and newer dependent libraries

40. Thank you!
   Reminder: BoFs on Thursday
