Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
1
What‘s new in
Hadoop 3.0
Heiko Loewe
@loeweh
2
Tributes
• Zhe Zhang, Erasure Coding
• Akira Ajisaka, Script Rewrite
• Junping Du, Yarn Timeline Service v2
• And manny ...
3
Hadoop 3.0 Roadmap
• Hadoop 3.x Releases
• Planned for hadoop-3.0.0
Classpath isolation on by default HADOOP-11656
• had...
4
Release Feature
2.0NameNode High-Availability,
2.2Federation, Snapshots, NFS v3 Mount
2.3Heterogenous Storage (Phase 1),...
5
What's Apache Hadoop 3?
20142010 2011 201320122009 2015
2.2.0
Heterogeneous storage
HDFS in-memory caching
2.3.0 2.5.0
2...
6
Break compatibility
Major version up is to clean up the code
• Deprecated APIs can be removed only in changing major ver...
7
Erasure Coding
8
Traditional Hadoop
• HDFS inherits 3-way replication from Google File System
• Simple, scalable and robust
• 200% storag...
9
Erasure Coding Saves Storage
• Simplified Example: storing 2 bits
• Same data durability
– can lose any 1 bit
• Pro: Hal...
10
Erasure Coding with m Data, n Parity
Reed Solomon Coding
11
Durability and Efficiency
Data Durability = How many simultaneous failures can be tolerated?
Storage Efficiency = How m...
12
Continious Layout
13
Stripped Layout
14
Other Implementation
15
HDFS Roadmap
16
Erasure Coding: Current Status
 Phase 1: striping layout
 C = 64KB (default)
 Work for small files
 No data locality...
17
Name Server Changes
Mapping Logical and Storage Blocks
Too Many Storage Blocks?
Hierarchical Naming Protocol:
18
Erasure Coding: Write files using (6,3)-Reed-Solomon
・
・
・
・
・
・
・
・
・
・
・
・
 Write data to 9 DNs in parallel
6 Data B...
19
Erasure Coding: Read files
 Read data from 6 DNs in parallel
DN1
ECClient
DN6
DN9
・
・
・
・
・
・
・
・
・
20
Erasure Coding: Read files when DN fails
 Read data from arbitrary 6 DNs in parallel
DN1
ECClient
×DN6
DN7
DN9
・
・
・
・...
21
hdfs erasure command
[loewe@loewe hadoop-3.0.0-alpha1]$ bin/hdfs erasurecode -help
Usage: hdfs erasurecode [generic opt...
22
Acceleration with Intel ISA-L
• 1 legacy coder
– From Facebook’s HDFS-RAID project
• 2 new coders
– Pure Java — code im...
23
Benchmarks
24
Benchmarks
25
Benchmarks
26
Benchmarks
27
Yarn
Timeline Service v2
28
First, A bit of Vision…
• Evolution of Hadoop start with YARN
• YARN Evolution will continue to drive Hadoop forward
• ...
29
Several important trends in age of Hadoop 3.0 +
YARN and Other Platform Services
Storage
Resource
Management Security
S...
30
Yarn Architecture
Yarn
Resource
Database
Scheduler
Application
Timeline Service
31
YARN Process Flow - Walkthrough
NodeManager NodeManager NodeManager NodeManager
Container
1.1
Container
2.4
NodeManager...
32
Why Timeline Service v2
• Scalability and reliability challenges
– Single instance of Timeline Server
– Storage (single...
33
Highlights
v.1 v.2
Single writer/reader Timeline Server Distributed writer/collector architecture
Single local LevelDB ...
34
Architecture
• Separation of writers (“collectors”) and readers
• Distributed collectors: one collector for each app
• ...
35
Distributed Collectors and Readers
36
New Entity Model
• Flows and flow runs as parents of YARN applicaSon enSSes
• First-class configuraSon (key-value pairs...
37
What is a flow
• A flow is a group of YARNapplications
that are launched as parts of a logical
app
• Oozie, Scalding, P...
38
Metrics Aggregation
• Application level
– Rolls up sub-application metrics
– Performed in real time in the collectors
i...
39
More Cloud Friendly
• Elastic
– Dynamic Resource Configuration
• YARN-291
• Allow tune down/up on NM’s resource in runt...
40
More Cloud Friendly (Contd.)
• Isolation
– Embrace container technology to achieve better isolation
– Resource isolatio...
41
• Add a native implementation of the map output collector
– Sort, Spill and IFile serialization
• Prequisites
– Built w...
42
Benchmark
• Release Note in the issue:
– "For shuffle-intensive jobs this may provide speed-ups of 30% or more."
• Benchm...
43
Updated Web UI
44
Setup Timeline Service v2
• Set up the HBase cluster (1.1.x)
– Add the timeline service jar to HBase
– Install the flow...
45
Shell Script rewrite
46
Directory Structure
47
Bin/hadoop Command
48
bin/hdfs Command
49
Shell Script Rewrite (HADOOP-9902)
• Hadoop and Shell Script
– Launching daemons
– Hadoop CLI
• Difficult to understand
–...
50
After rewriting the scripts ...
• Easy to understand
• Because shell API doc is available
– Shelldoc maker generates do...
51
• Very similar to .bashrc
– Read the API doc
– Create your own ~/.hadoopXX
• hadoop-env : hadoop-env.sh for each user
•...
52
--debug option is available
CLASSPATH was overwritten!!
(before HADOOP-13045)
Useful for troubleshooting
$ hadoop --deb...
53
Many new features, bug fixes, improvements
• 'hadoop distch' to change the ownership and permissions on many
files via M...
54
Derive heap size or mapreduce.*.memory.mb
automatically (MAPREDUCE-5785)
• In Hadoop 2, two similar properties must be ...
55
Intra Data-Node Balancer (HDFS-1312)
• Due to activities like deletes
the volumes of a DataNode
may become imbalance fi...
56
Intra Data-Node Balancer (HDFS-1312)
• Offline scripts existed to reblance a DataNode
• HDFS-1312 introduces a online p...
57
Multiple Name Nodes
58
Support more than two NameNodes (HDFS-6440)
• Hadoop 2 now supports only 2 NameNodes
– 1 active and 1 standby
• Hadoop ...
59
Old Layout
ZK Failover
Controller
Active
NameNode
ZK Failover
Controller
Active
NameNode
Zookeeper Zookeeper Zookeeper
...
60
New Layout
ZK Failover
Controller
Active
NameNode
ZK Failover
Controller
Standby
NameNode
Zookeeper Zookeeper Zookeeper...
61
All Feature supported
• Checkpointing with NFS
• Checkpoint with Quorum Journal Daemon
• Manual Failover
• Automatic Fa...
62
Incompatible changes
• Many deprecated APIs will be removed
– hftp/hsftp/s3 -> webhdfs/s3{n,a}
– Metrics v1
– org.apach...
63
Bump up the versions of the libraries
• Drop JDK7 support (HADOOP-11858)
• Dependency Hell
– Tomcat
– Jetty
– Jersey
– ...
64
Classpath isolation (HADOOP-11656)
• Relaxing the "dependency hell"
– Separate client and server jars
– Client jar does...
65Follow me on Twitter: @loeweh
Upcoming SlideShare
Loading in …5
×

What's new in hadoop 3.0

1,404 views

Published on

my compilation of the changes and differences of the upcoming 3.0 version of Hadoop. Present during the Meetup of the group https://www.meetup.com/Big-Data-Hadoop-Spark-NRW/

Published in: Data & Analytics
  • DOWNLOAD FULL. BOOKS INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hello! High Quality And Affordable Essays For You. Starting at $4.99 per page - Check our website! https://vk.cc/82gJD2
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Perfect, Heiko. Many thanks!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Which looks like a lot of work but the result are really excellent slides! Thank you Heiko
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

What's new in hadoop 3.0

  1. 1. 1 What‘s new in Hadoop 3.0 Heiko Loewe @loeweh
  2. 2. 2 Tributes • Zhe Zhang, Erasure Coding • Akira Ajisaka, Script Rewrite • Junping Du, Yarn Timeline Service v2 • And manny Others
  3. 3. 3 Hadoop 3.0 Roadmap • Hadoop 3.x Releases • Planned for hadoop-3.0.0 Classpath isolation on by default HADOOP-11656 • hadoop-3.0.0-alpha1 – HADOOP • Move to JDK8+ • Shell script rewrite HADOOP-9902 • Move default ports out of ephemeral range HDFS-9427 – HDFS • Removal of hftp in favor of webhdfs HDFS-5570 • Support for more than two standby NameNodes HDFS-6440 • Support for Erasure Codes in HDFS HDFS-7285 • Intra-datanode balancer HDFS-1312 – YARN • YARN Timeline Service v.2 YARN-2928 – MAPREDUCE • Derive heap size or mapreduce.*.memory.mb automatically MAPREDUCE-5785
  4. 4. 4 Release Feature 2.0NameNode High-Availability, 2.2Federation, Snapshots, NFS v3 Mount 2.3Heterogenous Storage (Phase 1), In Memory Caching 2.4Rolling Upgrades, Posix ACL 2.5Extended Attributes 2.6Hot Swap Volumes, Heterogeneous Storage (Phase 2), transp. Encryption 2.7Files w/ variable Block Length, Inotify 2.8Docker Container in Linux, Yarn ATS 1.5, changing resources on alloc Yarn Container 2.9TBD Release / Features / 2.X
  5. 5. 5 What's Apache Hadoop 3? 20142010 2011 201320122009 2015 2.2.0 Heterogeneous storage HDFS in-memory caching 2.3.0 2.5.0 2.4.0 HDFS ACLs 2.0.0-alpha 2.1.0-beta branch-1 (branch-0.20) 1.0.0 1.1.0 1.2.1(stable)0.20.1 0.20.205 0.22.0 Security 0.23.11(final) NameNode Federation, YARN 0.21.0 New append 0.23.0 NameNode HA branch-2 HDFS Snapshots NFSv3 support Windows HDFS Rolling Upgrades Application History Server RM Automatic Failover 2.6.0 YARN Rolling Upgrades Transparent Encryption Archival Storage 2.7.0 Hadoop2 Drop JDK6 support Truncate API 2016 branch-0.23 trunk Hadoop3 Hadoop 3 and 2 were diverged in 2011 (5 years ago!) Hadoop1 (EOL) Source: Akira Ajisaka
  6. 6. 6 Break compatibility Major version up is to clean up the code • Deprecated APIs can be removed only in changing major version – @Public and @Stable Java API – REST API – Metrics/JMX – CLI – Environment variables • Wire-compatibility can be broken – 2.X client cannot talk to 3.X server and vice versa • Compatibility Guide: – Apache Hadoop 3.0.0-alpha1 – Apache Hadoop Compatibility
  7. 7. 7 Erasure Coding
  8. 8. 8 Traditional Hadoop • HDFS inherits 3-way replication from Google File System • Simple, scalable and robust • 200% storage overhead • Secondary replicas rarely accessed
  9. 9. 9 Erasure Coding Saves Storage • Simplified Example: storing 2 bits • Same data durability – can lose any 1 bit • Pro: Half the storage overhead • Cons: Slower recovery 1 01 0Replication: 2 extra bits XOR Coding: 1 0⊕ 1= 1 extra bit
  10. 10. 10 Erasure Coding with m Data, n Parity Reed Solomon Coding
  11. 11. 11 Durability and Efficiency Data Durability = How many simultaneous failures can be tolerated? Storage Efficiency = How much portion of storage is for useful data? Data Durability Storage Efficiency Single Replica 0 100% 3-way Replication 2 33% XOR with 6 data cells 1 86% RS (6,3) 3 67% RS (10,4) 4 71%
  12. 12. 12 Continious Layout
  13. 13. 13 Stripped Layout
  14. 14. 14 Other Implementation
  15. 15. 15 HDFS Roadmap
  16. 16. 16 Erasure Coding: Current Status  Phase 1: striping layout  C = 64KB (default)  Work for small files  No data locality  Available on trunk  Phase 2: contiguous layout  C = 128MB (= HDFS Block size)  Not work for small files  Data locality  Now in progress (HDFS-8030) Incoming Data DataNode 1 DataNode 2 DataNode 3 DataNode 4 DataNode 5 ・ ・ ・ Cell size (C)
  17. 17. 17 Name Server Changes Mapping Logical and Storage Blocks Too Many Storage Blocks? Hierarchical Naming Protocol:
  18. 18. 18 Erasure Coding: Write files using (6,3)-Reed-Solomon ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・  Write data to 9 DNs in parallel 6 Data Blocks DN1 DN6 DN7 Incoming Data DN9 3 Parity Blocks ECClient
  19. 19. 19 Erasure Coding: Read files  Read data from 6 DNs in parallel DN1 ECClient DN6 DN9 ・ ・ ・ ・ ・ ・ ・ ・ ・
  20. 20. 20 Erasure Coding: Read files when DN fails  Read data from arbitrary 6 DNs in parallel DN1 ECClient ×DN6 DN7 DN9 ・ ・ ・ ・ ・ ・ ・ ・ ・
  21. 21. 21 hdfs erasure command [loewe@loewe hadoop-3.0.0-alpha1]$ bin/hdfs erasurecode -help Usage: hdfs erasurecode [generic options] [-getPolicy <path>] [-help [cmd ...]] [-listPolicies] [-setPolicy [-p <policyName>] <path>] -getPolicy <path> : Get erasure coding policy information about at specified path -help [cmd ...] : Displays help for given command or all commands if none is specified. -listPolicies : Get the list of erasure coding policies supported -setPolicy [-p <policyName>] <path> : Set a specified erasure coding policy to a directory Options : -p <policyName> erasure coding policy name to encode files. If not passed the default policy will be used <path> Path to a directory. Under this directory files will be encoded using specified erasure coding policy
  22. 22. 22 Acceleration with Intel ISA-L • 1 legacy coder – From Facebook’s HDFS-RAID project • 2 new coders – Pure Java — code improvement over HDFS-RAID – Native coder with Intel’s Intelligent Storage Acceleration Library (ISA-L)
  23. 23. 23 Benchmarks
  24. 24. 24 Benchmarks
  25. 25. 25 Benchmarks
  26. 26. 26 Benchmarks
  27. 27. 27 Yarn Timeline Service v2
  28. 28. 28 First, A bit of Vision… • Evolution of Hadoop start with YARN • YARN Evolution will continue to drive Hadoop forward • Hadoop 3 will still use Yarn, but there are a lot of Improvements Hadoop 3
  29. 29. 29 Several important trends in age of Hadoop 3.0 + YARN and Other Platform Services Storage Resource Management Security Service Discovery Management Monitoring Alerts IOT Assembly Kafka Storm HBase Solr Governance MR Tez Spark … Innovating frameworks: Flink, DL(TensorFlow), etc. Various Environments On Premise Private Cloud Public Cloud
  30. 30. 30 Yarn Architecture Yarn Resource Database Scheduler Application Timeline Service
  31. 31. 31 YARN Process Flow - Walkthrough NodeManager NodeManager NodeManager NodeManager Container 1.1 Container 2.4 NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager Container 1.2 Container 1.3 AM 1 Container 2.2 Container 2.1 Container 2.3 AM2 Client2 Yarn Scheduler Yarn Timeline Service Client (Query)
  32. 32. 32 Why Timeline Service v2 • Scalability and reliability challenges – Single instance of Timeline Server – Storage (single local LevelDB instance) • Usability – Flow – Metrics and configuration as first-class citizens – Metrics aggregation up the entity hierarchy
  33. 33. 33 Highlights v.1 v.2 Single writer/reader Timeline Server Distributed writer/collector architecture Single local LevelDB storage* Scalable storage (HBase) v.1 entity model New v.2 entity model No aggregation Metrics aggregation REST API Richer query REST API
  34. 34. 34 Architecture • Separation of writers (“collectors”) and readers • Distributed collectors: one collector for each app • Dedicated RM collector for RM-generated data • Collector discovery via RM • Pluggable storage with HBase as default storage
  35. 35. 35 Distributed Collectors and Readers
  36. 36. 36 New Entity Model • Flows and flow runs as parents of YARN applicaSon enSSes • First-class configuraSon (key-value pairs) • First-class metrics (single-value or Sme series) • Designed to handle mulS-cluster environment out of the box
  37. 37. 37 What is a flow • A flow is a group of YARNapplications that are launched as parts of a logical app • Oozie, Scalding, Pig, etc. – name: – “frequent_visitor_stat” – run id: 1466097809000 – version: “b9b9068”
  38. 38. 38 Metrics Aggregation • Application level – Rolls up sub-application metrics – Performed in real time in the collectors in memory • Flow run level – Rolls up app level metrics – Performed in HBase region servers via coprocessors • Offline aggregation (TBD) – Rolls up on user, queue, and flow offline periodically – Phoenix tables
  39. 39. 39 More Cloud Friendly • Elastic – Dynamic Resource Configuration • YARN-291 • Allow tune down/up on NM’s resource in runtime – Graceful decommissioning of NodeManagers • YARN-914 • Drains a node that’s being decommissioned to allow running containers to finish • Efficient – Support for container resizing • YARN-1197 • Allows applications to change the size of an existing container
  40. 40. 40 More Cloud Friendly (Contd.) • Isolation – Embrace container technology to achieve better isolation – Resource isolation support for disk and network • YARN-2619 (disk), YARN-2140 (network) • Containers get a fair share of disk and network resources using Cgroups – Docker support in LinuxContainerExecutor • YARN-3611 • Support to launch Docker containers alongside process • Packaging and resource isolation • Operation – Container upgrades (YARN-4726) • ”Do an upgrade of my Spark / HBase apps with minimal impact to end-users” – AM Restart With Work Preserving • MAPREDUCE-6608
  41. 41. 41 • Add a native implementation of the map output collector – Sort, Spill and IFile serialization • Prequisites – Built with -Pnative option – Custom writable types and comparators are not supported • Setting <property name="mapreduce.job.map.output.collector.clas s" value="org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator"> Task level native optimization (MAPREDUE-2841)
  42. 42. 42 Benchmark • Release Note in the issue: – "For shuffle-intensive jobs this may provide speed-ups of 30% or more." • Benchmarked with 3 slaves (m3.xlarge) – CentOS 7.2 – 3.0.0-SNAPSHOT (revision 5865fe2b) • A very shuffle-intensive wordcount job – Input: 2.6GB (compressed) – Shuffle: 14GB – Output: 10GB
  43. 43. 43 Updated Web UI
  44. 44. 44 Setup Timeline Service v2 • Set up the HBase cluster (1.1.x) – Add the timeline service jar to HBase – Install the flow run coprocessor – Create tables via TimelineSchemaCreator utility • Configure the YARN cluster – Enable Timeline Service v.2 – Add hbase-site.xml for the timeline collector and readers – Start the timeline reader daemon
  45. 45. 45 Shell Script rewrite
  46. 46. 46 Directory Structure
  47. 47. 47 Bin/hadoop Command
  48. 48. 48 bin/hdfs Command
  49. 49. 49 Shell Script Rewrite (HADOOP-9902) • Hadoop and Shell Script – Launching daemons – Hadoop CLI • Difficult to understand – What is the correct env var to set a option • java classpath? • java.library.path? • GC options? – How to add the option to the env var – We have to read almost all the shell scripts! • New CLI is not completely downward compatible – Hadoop 3: bin/hdfs namenode ‐format – Hadoop 2: bin/hadoop namenode –format Apache Hadoop 3.0.0-alpha1 – Apache Hadoop Compatibility
  50. 50. 50 After rewriting the scripts ... • Easy to understand • Because shell API doc is available – Shelldoc maker generates docs from the scripts – Similar to JavaDoc Public API Documentation by Akira Ajisaka build (trunk): http://aajisaka.github.io/hadoop-project/hadoop-project-dist/hadoop-common/UnixShellAPI.html
  51. 51. 51 • Very similar to .bashrc – Read the API doc – Create your own ~/.hadoopXX • hadoop-env : hadoop-env.sh for each user • hadooprc : called after shell env vars are configured • And that's all :) • ex.) Set additional classpath (.hadooprc) hadoop_add_classpath /path/to/my/jar .hadoop-env and .hadooprc (HADOOP-11353, HADOOP-13045)
  52. 52. 52 --debug option is available CLASSPATH was overwritten!! (before HADOOP-13045) Useful for troubleshooting $ hadoop --debug version DEBUG: DEBUG: DEBUG: ( s n i p ) DEBUG: DEBUG: DEBUG: ( s n i p ) DEBUG: hadoop_parse_args: procesiong version hadoop_parse: asking c a l l e r to skip 1 HADOOP_CONF_DIR=/usr/local/hadoop/ e t c / hadoop Applying the u s e r ' s . hadooprc I n i t i a l CLASSPATH=/path/to/my/jar I n i t i a l i z e CLASSPATH I n i t i a l CLASSPATH=/usr/ l o c a l / hadoop/ s h a r e / hadoop/common/lib/*
  53. 53. 53 Many new features, bug fixes, improvements • 'hadoop distch' to change the ownership and permissions on many files via MapReduce job • 'hadoop jnipath' to print java.library.path • 'hadoop --daemon' instead of hadoop-daemon.sh – ex.) hdfs --daemon status namenode – The return code for status is LSB-compatible – hadoop-daemon(s).sh are now deprecated • .out files are now appended (not overwritten) – Allows external log rotation • and many more – see https://issues.apache.org/jira/browse/HADOOP-9902
  54. 54. 54 Derive heap size or mapreduce.*.memory.mb automatically (MAPREDUCE-5785) • In Hadoop 2, two similar properties must be set :( – mapreduce.{map,reduce}.memory.mb • The amount of memory to request from the scheduler for each task (ex. 2048) – mapreduce.{map,reduce}.java.opts • Java options for YARN containers (ex. -Xmx2G) • In Hadoop 3, either is enough – .java.opts is derived from .memory.mb and vice versa • .java.opts = .memory.mb * mapreduce.job.heap.memory-mb.ratio • .memory.mb = .java.opts / mapreduce.job.heap.memory-mb.ratio
  55. 55. 55 Intra Data-Node Balancer (HDFS-1312) • Due to activities like deletes the volumes of a DataNode may become imbalance filled DataNode Block Placing Policies
  56. 56. 56 Intra Data-Node Balancer (HDFS-1312) • Offline scripts existed to reblance a DataNode • HDFS-1312 introduces a online process that rebalances the Volume of a DataNode • „hdfs diskbalancer“ Command
  57. 57. 57 Multiple Name Nodes
  58. 58. 58 Support more than two NameNodes (HDFS-6440) • Hadoop 2 now supports only 2 NameNodes – 1 active and 1 standby • Hadoop 3 supports 2 or more standby NameNodes – provides additional fault-tolerance – avoids multiple standby NNs to checkpoint at the same time – # of standby should be small due to block report (typically 3 or 5 NameNodes)
  59. 59. 59 Old Layout ZK Failover Controller Active NameNode ZK Failover Controller Active NameNode Zookeeper Zookeeper Zookeeper Fencing Monitors the health of the NN Participating in election of the active NN Coordinates transition process Fence the other NN, if it win election The information which NN is active is kept her
  60. 60. 60 New Layout ZK Failover Controller Active NameNode ZK Failover Controller Standby NameNode Zookeeper Zookeeper Zookeeper Fencing Monitors the health of the NN Participating in election of the active NN Coordinates transition process Fence the other NN, if it win election The information which NN is active is kept her ZK Failover Controller Standby NameNode … 4 .. 5
  61. 61. 61 All Feature supported • Checkpointing with NFS • Checkpoint with Quorum Journal Daemon • Manual Failover • Automatic Failover
  62. 62. 62 Incompatible changes • Many deprecated APIs will be removed – hftp/hsftp/s3 -> webhdfs/s3{n,a} – Metrics v1 – org.apache.hadoop.Records – and more • Improved CLI output – 'mapred job -list' shows the job name as well – 'hadoop fs -du' shows the raw disk usage, and aligned more unix-like – and more • Search 'Incompatible change' flag – https://s.apache.org/sMO4
  63. 63. 63 Bump up the versions of the libraries • Drop JDK7 support (HADOOP-11858) • Dependency Hell – Tomcat – Jetty – Jersey – Guava – Log4J – Jackson – And many more common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop/share/ hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/ common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp- api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:!/usr/ local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop/ share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/nimbus-jose-jwt-3.9.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/ hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace- core4-4.0.1-incubating.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/ netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local! /hadoop/share/hadoop/ common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/json-smart-1.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/ java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.51.jar:/usr/local/hadoop/share/hadoop/common/lib/curator- framework-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jcip-annotations-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/ usr/local/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/local/hadoop/share/ hadoop/common/lib/hadoop-auth-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/ hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons- net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/commons- compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/ usr/local/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/ hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/ hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/ common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-3.0.0-SNAPSHOT-tests.jar:/usr/local/ hadoop/share/hadoop/common/hadoop-nfs-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop- hdfs-client-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.1.0.Beta5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hpack-0.11.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/ xercesImpl-2.9.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/okio-1.4.0.jar:/usr/local/ hadoop/share/hadoop/hdfs/lib/okhttp-2.4.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar:/usr/local/ hadoop/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop- mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-nativetask-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop- mapreduce-client-app-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce- examples-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client- common-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client- shuffle-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javassist-3.18.1-GA.jar:/usr/local/hadoop/share/hadoop/yarn/lib/metrics-core-3.0.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/ local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/ curator-test-2.7.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/fst-2.24.jar:/usr/local/hadoop/share/hadoop/yarn/lib/objenesis-2.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/ share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servle 0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-math-2.2.jar:/usr/local/hadoop/share/hadoop/yarn/ hadoop-yarn-server-applicationhistoryservice-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn- api-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/ share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0- SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/ yarn/hadoop-yarn-server-web-proxy-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.0.0- SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn- registry-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.0.0- SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.0.0- SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0-SNAPSHOT.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-3.0.0-SNAPSHOT.jar:/usr/ java/latest/lib/tools.jar
  64. 64. 64 Classpath isolation (HADOOP-11656) • Relaxing the "dependency hell" – Separate client and server jars – Client jar does not pull any third party dependencies • If the isolation is done ... – We can safely upgrade the libraries in server code – In branch-2, the upgrade is incompatible :(
  65. 65. 65Follow me on Twitter: @loeweh

×