Upgrading
from HDP2.1 to HDP2.2
2014/12/18
@tagomoris
HadoopSCR #hadoopreading
Satoshi Tagomori (@tagomoris)
LINE Corp.
Analysis2 (CDH4)
Hadoop Cluster Switching
Long running CDH4 cluster
Switching to new cluster
w/ fast network, large HDDs, many CPU cores
changing Hive table schema/file formats
No downtime!
MRv1/HDFS
Hive
Distribution Options
Options as of Oct 2014
CDH5
HDP2.1
Apache Hadoop Release
Hive 0.13, Tez -> HDP2.1 !
input data
fluent-plugin-webhdfs
Shib
executing queries
over hiveserver1/2
Analysis2 (CDH4)
MRv1/HDFS
Hive
double write
Shib
Analysis2 (CDH4)
MRv1/HDFS
Hive
Analysis3 (HDP2.1)
MRv2/HDFS
Hive
distcp
Nov-Dec 2014
HDP 2.1.5.0
Installed via Ansible, w/o Ambari
for configuration versioning
Hadoop 2.4.0
YARN RM-HA + Namenode HA
Hive 0.13
Tez?
Shib
Analysis2 (CDH4)
MRv1/HDFS
Hive
Analysis3 (HDP2.1)
MRv2/HDFS
Hive
Few days later (not yet)
HDP 2.2!
Hadoop 2.6.0
Datanode hot swap drive
YARN ResourceManager REST API
Hive 0.14.0 (...)
Latest Tez
diff HDP2.1 HDP2.2
hadoop-yarn-2.4.0.2.1.5.0-695.el6
-> hadoop-yarn-2.6.0.2.2.0.0-2041.el6
+ hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6
/usr/lib/hadoop/....
-> /usr/hdp/current/hadoop-*
diff HDP2.1 HDP2.2
Toooooooooooooo many diff lines
Companion files of HDP (2.1 -> 2.2)
in hive-site.xml: 353 -> 1207 lines
in tez-site.xml: 126 -> 261 lines
How to edit/control?
IDE? Editor? KIAI (sheer willpower)? Excel?
hadoop_xml_diff.rb
http://d.hatena.ne.jp/tagomoris/20141215/1418631988
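The linked hadoop_xml_diff.rb is the author's actual tool; as a rough illustration of the idea, here is a minimal shell approximation (an assumption, not the real script): flatten each *-site.xml into sorted `name=value` lines, then diff the two versions. It assumes Hadoop's usual one-tag-per-line layout for `<name>`/`<value>` and uses bash process substitution.

```shell
#!/usr/bin/env bash
# Flatten a Hadoop-style *-site.xml into sorted "name=value" lines.
# Assumes each <name> and <value> tag sits on its own line.
normalize_site_xml() {
  awk -F'[<>]' '/<name>/{n=$3} /<value>/{print n"="$3}' "$1" | sort
}

# Tiny self-contained demo with two fake config fragments (files are
# written into the current directory just for this example):
cat > old-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
</configuration>
EOF
cat > new-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
EOF

# Exit status 1 from diff just means "files differ", so swallow it.
diff <(normalize_site_xml old-site.xml) <(normalize_site_xml new-site.xml) || true
```

Flattening first means the diff survives reordering and whitespace changes between HDP's companion files, which a raw `diff hive-site.xml hive-site.xml` would not.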
Upgrade test in test cluster
Automated Upgrade by Ansible playbook
stop hiveserver2
stop cluster
-safemode enter, -saveNamespace
make backup (hdfs metadata, hive metastore)
-finalizeUpgrade
nm, rm, dn, nn, zkfc, jn, zk
check edits stopped
Upgrade yum repo/packages/configurations
execute hdp-select
Start cluster
zk, jn
“hdfs namenode -upgrade”
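The checklist above can be sketched as a dry-run shell runbook. The hdfs dfsadmin / hdp-select steps follow the slide, but the init-script names, backup path, and package glob are placeholders (assumptions, not taken from the deck), and DRY_RUN=1 (the default here) only prints each step instead of executing it:

```shell
#!/usr/bin/env bash
set -eu
: "${DRY_RUN:=1}"
# In dry-run mode, print each command instead of executing it.
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

upgrade_runbook() {
  run sudo service hive-server2 stop                          # stop hiveserver2 (placeholder service name)
  run sudo -u hdfs hdfs dfsadmin -safemode enter
  run sudo -u hdfs hdfs dfsadmin -saveNamespace
  run tar czf /backup/nn-metadata.tar.gz /var/hadoop/hdfs/nn  # backup path is an example
  run sudo -u hdfs hdfs dfsadmin -finalizeUpgrade             # finalize any previous upgrade first
  # stop daemons in order: nm, rm, dn, nn, zkfc, jn, zk (placeholder service names)
  for svc in hadoop-yarn-nodemanager hadoop-yarn-resourcemanager \
             hadoop-hdfs-datanode hadoop-hdfs-namenode hadoop-hdfs-zkfc \
             hadoop-hdfs-journalnode zookeeper-server; do
    run sudo service "$svc" stop
  done
  run sudo yum upgrade -y 'hadoop_2_2_0_0_2041*'              # upgrade repo/packages/configurations
  run sudo hdp-select set all 2.2.0.0-2041
  run sudo service zookeeper-server start                     # start zk, jn before the namenode
  run sudo service hadoop-hdfs-journalnode start
  run sudo -u hdfs hdfs namenode -upgrade
}

upgrade_runbook
```

Treat this as an outline of the order of operations, not a tested playbook; the real procedure in the talk was an Ansible playbook.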
Upgrade in test cluster
Automated Upgrade by Ansible playbook
stop hiveserver2
stop cluster
-safemode enter, -saveNamespace
make backup (hdfs metadata, hive metastore)
-finalizeUpgrade
nm, rm, dn, nn, zkfc, jn, zk
check edits stopped
Upgrade yum repo/packages/configurations
execute hdp-select
Start cluster
zk, jn
“hdfs namenode -upgrade” ... everlasting ...
“Ah, I might make some mistake...”
double write
Shib
Analysis2 (CDH4)
MRv1/HDFS
Hive
Analysis3 (HDP2.2)
MRv2/HDFS
Hive
Upgrade HDP 2.1->2.2
Dec 16 2014
Upgrade in analysis3
Manual Procedure!!!
stop hiveserver2
stop cluster
-safemode enter, -saveNamespace
make backup (hdfs metadata, hive metastore)
-finalizeUpgrade
nm, rm, dn, nn, zkfc, jn, zk
check edits stopped
Upgrade yum repo/packages/configurations
execute hdp-select
Start cluster
zk, jn
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -upgrade
2014-12-16 14:53:28,919 INFO namenode.NNUpgradeUtil (NNUpgradeUtil.java:doUpgrade(139)) - Performing upgrade of storage directory /var/hadoop/hdfs/nn
2014-12-16 14:53:28,939 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1029)) - Need to save fs image? false (staleImage=false, haEnabled=t
2014-12-16 14:53:28,941 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1173)) - Starting log segment at 262795139
2014-12-16 14:53:29,224 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 23408 entries 1740524 lookups
2014-12-16 14:53:29,227 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(748)) - Finished loading FSImage in 15695 msecs
2014-12-16 14:53:29,346 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(329)) - RPC server is binding to master1.local:8020
2014-12-16 14:53:29,348 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8020: starting
2014-12-16 14:53:29,393 INFO namenode.NameNode (NameNode.java:startCommonServices(646)) - NameNode RPC up at: master1.local/10.0.0.0:8020
2014-12-16 14:53:29,393 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1142)) - Starting services required for active state
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(160)) - Starting CacheReplicationMonitor with interva
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 13919829439 milliseconds
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 360 minutes, Empt
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(247)) - The configured checkpoint interval is 0 minutes. Using an interval of 3 deletion instead
2014-12-16 14:53:29,584 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 189 m
2014-12-16 14:53:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:53:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 mill
(... the same Rescanning / Scanned pair repeats every 30 seconds, from 14:54 through 14:57 and beyond ...)
 
(everlasting...)
https://gist.github.com/tagomoris/ed7aa8ccb3d6003a29f9
everlasting!!!!!!!!
${dfs.namenode.name.dir}/current and .../previous have not
been modified for 60 minutes...
rollback
stop all daemons
replace all packages w/ HDP2.1
replace configurations for HDP2.1
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config ... start namenode -rollback
$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode -rollback
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-4c3bf0834.livedoor.out
"rollBack" will remove the current state of the file system,
returning you to the state prior to initiating your recent
upgrade. This action is permanent and cannot be undone. If you
are performing a rollback in an HA environment, you should be
certain that no NameNode process is running on any host.
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
$
impossible
I cannot input any “Y”s...
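A plausible explanation (an assumption on my part, not stated in the slides): hadoop-daemon.sh detaches the process from the terminal, so the rollback prompt reads from a closed stdin, gets EOF immediately, and loops on "Invalid input". A tiny demo of that failure mode, plus the foreground workaround:

```shell
#!/usr/bin/env bash
# Demo: a Y/N prompt whose stdin is /dev/null sees EOF at once, so every
# read "fails" -- the same shape as the Invalid input loop above.
ask() {
  if read -r ans; then echo "answered: $ans"; else echo "Invalid input:"; fi
}

ask < /dev/null        # prints "Invalid input:"
echo Y | ask           # prints "answered: Y"

# Workaround sketch (assumption): run the namenode in the foreground so the
# prompt is attached to your terminal and can actually be answered, e.g.
#   sudo -u hdfs hdfs namenode -rollback
```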
Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
Replication numbers of all blocks are
ZERO!!!!!!!1!!!!1!
Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
replication numbers of all blocks are
ZERO!!!!!!!1!!!!1!
hadoop fsck / -> all blocks become fine!
Conclusion
I will wait for someone else to run HDP 2.2 first...

More Related Content

What's hot

Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
mundlapudi
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
Cloudera, Inc.
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
Edureka!
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
DataWorks Summit
 

What's hot (20)

Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFS
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learnedTom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
 
Hadoop
HadoopHadoop
Hadoop
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
ha_module5
ha_module5ha_module5
ha_module5
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
ELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log system
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 

Viewers also liked

Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
Sadayuki Furuhashi
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Takahiro Inoue
 

Viewers also liked (13)

Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
20120423 hbase勉強会
20120423 hbase勉強会20120423 hbase勉強会
20120423 hbase勉強会
 
HBase Meetup Tokyo Summer 2015 #hbasejp
HBase Meetup Tokyo Summer 2015 #hbasejpHBase Meetup Tokyo Summer 2015 #hbasejp
HBase Meetup Tokyo Summer 2015 #hbasejp
 
HBaseサポート最前線 #hbase_ca
HBaseサポート最前線 #hbase_caHBaseサポート最前線 #hbase_ca
HBaseサポート最前線 #hbase_ca
 
まだ間に合う HBaseCon2016
まだ間に合う HBaseCon2016まだ間に合う HBaseCon2016
まだ間に合う HBaseCon2016
 
Apache Kylinについて #hcj2016
Apache Kylinについて #hcj2016Apache Kylinについて #hcj2016
Apache Kylinについて #hcj2016
 
Kafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnight
Kafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnightKafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnight
Kafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnight
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
噛み砕いてKafka Streams #kafkajp
噛み砕いてKafka Streams #kafkajp噛み砕いてKafka Streams #kafkajp
噛み砕いてKafka Streams #kafkajp
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
 
AWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduce
AWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduceAWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduce
AWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduce
 
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
HBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015SummerHBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
 

Similar to Upgrading from HDP 2.1 to HDP 2.2

Setup and run hadoop distrubution file system example 2.2
Setup and run hadoop  distrubution file system example  2.2Setup and run hadoop  distrubution file system example  2.2
Setup and run hadoop distrubution file system example 2.2
Mounir Benhalla
 

Similar to Upgrading from HDP 2.1 to HDP 2.2 (20)

Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky HaryadiPGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
 
Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recovery
 
Oracle cluster installation with grid and iscsi
Oracle cluster  installation with grid and iscsiOracle cluster  installation with grid and iscsi
Oracle cluster installation with grid and iscsi
 
Oracle cluster installation with grid and nfs
Oracle cluster  installation with grid and nfsOracle cluster  installation with grid and nfs
Oracle cluster installation with grid and nfs
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
20140708hcj
20140708hcj20140708hcj
20140708hcj
 
Hadoop completereference
Hadoop completereferenceHadoop completereference
Hadoop completereference
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Configuring and manipulating HDFS files
Configuring and manipulating HDFS filesConfiguring and manipulating HDFS files
Configuring and manipulating HDFS files
 
Setup and run hadoop distrubution file system example 2.2
Setup and run hadoop  distrubution file system example  2.2Setup and run hadoop  distrubution file system example  2.2
Setup and run hadoop distrubution file system example 2.2
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop Install
 
Hadoop installation
Hadoop installationHadoop installation
Hadoop installation
 

More from SATOSHI TAGOMORI

More from SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Upgrading from HDP 2.1 to HDP 2.2

  • 1. Upgrading from HDP2.1 to HDP2.2 2014/12/18 @tagomoris HadoopSCR #hadoopreading
  • 3. Analysis2 (CDH4) Hadoop Cluster Switching Long running CDH4 cluster Switching to new cluster w/ Fast network, Large HDD, Many CPU core changing Hive table schema/file formats No downtime! MRv1/HDFS Hive
  • 4. Distribution Options Options at Oct 2014 CDH5 HDP2.1 Apache Hadoop Release Hive 0.13, Tez -> HDP2.1 !
  • 5. input data fluent-plugin-webhdfs Shib executing queries over hiveserver1/2 Analysis2 (CDH4) MRv1/HDFS Hive
  • 6. double write Shib Analysis2 (CDH4) MRv1/HDFS Hive Analysis3 (HDP2.1) MRv2/HDFS Hive distcp Nov-Dec 2014
  • 7. HDP 2.1.5.0 Install over Ansible, w/o Ambari for configuration versioning Hadoop 2.4.0 YARN RM-HA + Namenode HA Hive 0.13 Tez?
  • 9. HDP 2.2! Hadoop 2.6.0 Datanode hot swap drive YARN ResourceManager REST API Hive 0.14.0 (...) Latest Tez
  • 10. diff HDP2.1 HDP2.2 hadoop-yarn-2.4.0.2.1.5.0-695.el6 -> hadoop-yarn-2.6.0.2.2.0.0-2041.el6 + hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6 /usr/lib/hadoop/.... -> /usr/hdp/current/hadoop-*
  • 11. diff HDP2.1 HDP2.2 Toooooooooooooo many diff lines Companion files of HDP (2.1 -> 2.2) in hive-site.xml: 353 -> 1207 lines in tez-site.xml: 126 -> 261 lines How to edit/control? IDE? Editor? KIAI? Excel?
  • 13. Upgrade test in test cluster Automated Upgrade by Ansible playbook stop hiveserver2 stop cluster -safemode enter, -saveNamespace make backup (hdfs metadata, hive metastore) -finalizeUpgrade nm, rm, dn, nn, zkfc, jn, zk check edits stopped Upgrade yum repo/packages/configurations execute hdp-select Start cluster zk, jn “hdfs namenode -upgrade”
  • 14. Upgrade in test cluster Automated Upgrade by Ansible playbook stop hiveserver2 stop cluster -safemode enter, -saveNamespace make backup (hdfs metadata, hive metastore) -finalizeUpgrade nm, rm, dn, nn, zkfc, jn, zk check edits stopped Upgrade yum repo/packages/configurations execute hdp-select Start cluster zk, jn “hdfs namenode -upgrade” ... ever lasting ...
  • 15. “Ah, I might make any mistakes...”
  • 16. double write Shib Analysis2 (CDH4) MRv1/HDFS Hive Analysis3 (HDP2.2) MRv2/HDFS Hive Upgrade HDP 2.1->2.2 Dec 16 2014
  • 17. Upgrade in analysis3 Manual Procedure!!! stop hiveserver2 stop cluster -safemode enter, -saveNamespace make backup (hdfs metadata, hive metastore) -finalizeUpgrade nm, rm, dn, nn, zkfc, jn, zk check edits stopped Upgrade yum repo/packages/configurations execute hdp-select Start cluster zk, jn /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -upgrade
  • 18. NameNode log during "-upgrade":
2014-12-16 14:53:28,919 INFO namenode.NNUpgradeUtil (NNUpgradeUtil.java:doUpgrade(139)) - Performing upgrade of storage directory /var/hadoop/hdfs/nn
2014-12-16 14:53:28,939 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1029)) - Need to save fs image? false (staleImage=false, haEnabled=t
2014-12-16 14:53:28,941 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1173)) - Starting log segment at 262795139
2014-12-16 14:53:29,224 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 23408 entries 1740524 lookups
2014-12-16 14:53:29,227 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(748)) - Finished loading FSImage in 15695 msecs
2014-12-16 14:53:29,346 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(329)) - RPC server is binding to master1.local:8020
2014-12-16 14:53:29,348 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8020: starting
2014-12-16 14:53:29,393 INFO namenode.NameNode (NameNode.java:startCommonServices(646)) - NameNode RPC up at: master1.local/10.0.0.0:8020
2014-12-16 14:53:29,393 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1142)) - Starting services required for active state
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(160)) - Starting CacheReplicationMonitor with interva
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 13919829439 milliseconds
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 360 minutes, Empt
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(247)) - The configured checkpoint interval is 0 minutes. Using an interval of 3 deletion instead
2014-12-16 14:53:29,584 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 189 m
2014-12-16 14:53:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:53:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 mill
  (the same Rescanning / Scanned pair repeats every 30 seconds... ever lasting...)
  https://gist.github.com/tagomoris/ed7aa8ccb3d6003a29f9
  • 19. ever lasting!!!!!!!! ${dfs.namenode.name.dir}/current and .../previous have not been modified for 60 minutes... the upgrade is making no progress.
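The "nothing written for 60 minutes" observation can be turned into a quick check. A minimal sketch, assuming `NN_DIR` stands in for the site's `dfs.namenode.name.dir` value (adjust to your configuration):

```shell
#!/bin/sh
# Check whether a stuck "-upgrade" is still writing NameNode metadata.
NN_DIR="${NN_DIR:-/var/hadoop/hdfs/nn}"

check_dir() {
  # count files modified within the last 60 minutes (-mmin -60)
  recent=$(find "$1" -type f -mmin -60 2>/dev/null | wc -l | tr -d ' ')
  if [ "$recent" -eq 0 ]; then
    echo "$1: no writes in 60 minutes (upgrade looks stuck)"
  else
    echo "$1: $recent file(s) written recently"
  fi
}

check_dir "$NN_DIR/current"
check_dir "$NN_DIR/previous"
```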
  • 20. rollback
stop all daemons
replace all packages w/ HDP2.1
replace configurations for HDP2.1
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config ... start namenode -rollback

$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode -rollback
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-4c3bf0834.livedoor.out
"rollBack" will remove the current state of the file system, returning you to the state prior to initiating your recent upgrade. This action is permanent and cannot be undone. If you are performing a rollback in an HA environment, you should be certain that no NameNode process is running on any host.
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
  (repeats forever)
$
  • 21. impossible I cannot input any "Y"s... (most likely because hadoop-daemon.sh starts the JVM via nohup with stdin detached, so the interactive prompt can never be answered)
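Why the prompt loops forever can be reproduced without Hadoop at all. This is a sketch under the assumption that the daemonized NameNode's stdin is redirected to /dev/null: `read` then hits EOF immediately, the confirmation loop prints "Invalid input:" and retries, and no "Y" can ever arrive.

```shell
#!/bin/sh
# Minimal reproduction of the unanswerable Y/N confirmation loop.
confirm() {
  tries=0
  while [ "$tries" -lt 3 ]; do            # cap the loop for the demo
    printf 'Roll back file system state? (Y or N) '
    if ! read -r ans; then                # read fails at EOF (/dev/null)
      echo 'Invalid input:'
      tries=$((tries + 1))
      continue
    fi
    case "$ans" in Y|y) echo 'rolling back'; return 0 ;; esac
  done
  return 1
}

# run the prompt the way a daemonized process sees it: stdin detached
confirm < /dev/null || echo 'prompt never answered'
```

The practical consequence: an interactive confirmation inside a process launched by a daemon wrapper script is effectively unanswerable from that wrapper.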
  • 22. Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
  • 23. Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
replication numbers of all blocks are ZERO!!!!!!!1!!!!1!
  • 24. Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
replication numbers of all blocks are ZERO!!!!!!!1!!!!1!
hadoop fsck / -> all blocks become fine!
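The recovery path can be sketched the same dry-run way. The metadata paths and the backup archive name here are assumptions (they depend on how the backup in step 17 was taken), and HDP 2.1 packages/configs are assumed to be back in place already; `RUN=echo` prints instead of executing.

```shell
#!/bin/sh
# Sketch of recovering the cluster from the pre-upgrade backup.
RUN="${RUN:-echo}"

recover_steps() {
  $RUN rm -rf /var/hadoop/hdfs/nn
  $RUN tar xzf /backup/nn-metadata.tar.gz -C /   # restore NN metadata backup
  $RUN /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode
  # start DataNodes next; their block reports repopulate replica counts,
  # which is why the "replication ZERO" state heals on its own. Verify:
  $RUN hadoop fsck /
}

recover_steps
```

The "all blocks ZERO" scare makes sense in this light: right after a restored NameNode starts, it knows the namespace but has received no block reports yet, so fsck only looks healthy once the DataNodes have reported in.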
  • 25. Conclusion I will wait for anyone who uses HDP 2.2...