Upgrading
from HDP2.1 to HDP2.2
2014/12/18
@tagomoris
HadoopSCR #hadoopreading
Satoshi Tagomori (@tagomoris)
LINE Corp.
Analysis2 (CDH4)
Hadoop Cluster Switching
Long running CDH4 cluster
Switching to new cluster
w/ fast network, large HDDs, many CPU cores
changing Hive table schema/file formats
No downtime!
MRv1/HDFS
Hive
Distribution Options
Options as of Oct 2014
CDH5
HDP2.1
Apache Hadoop Release
Hive 0.13, Tez -> HDP2.1 !
input data
fluent-plugin-webhdfs
Shib
executing queries
over hiveserver1/2
Analysis2 (CDH4)
MRv1/HDFS
Hive
double write
Shib
Analysis2 (CDH4)
MRv1/HDFS
Hive
Analysis3 (HDP2.1)
MRv2/HDFS
Hive
distcp
Nov-Dec 2014
HDP 2.1.5.0
Installed via Ansible, w/o Ambari
for configuration versioning
Hadoop 2.4.0
YARN RM-HA + Namenode HA
Hive 0.13
Tez?
Shib
Analysis2 (CDH4)
MRv1/HDFS
Hive
Analysis3 (HDP2.1)
MRv2/HDFS
Hive
Few days later (not yet)
HDP 2.2!
Hadoop 2.6.0
Datanode hot swap drive
YARN ResourceManager REST API
Hive 0.14.0 (...)
Latest Tez
diff HDP2.1 HDP2.2
hadoop-yarn-2.4.0.2.1.5.0-695.el6
-> hadoop-yarn-2.6.0.2.2.0.0-2041.el6
+ hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6
/usr/lib/hadoop/....
-> /usr/hdp/current/hadoop-*
diff HDP2.1 HDP2.2
Toooooooooooooo many diff lines
Companion files of HDP (2.1 -> 2.2)
in hive-site.xml: 353 -> 1207 lines
in tez-site.xml: 126 -> 261 lines
How to edit/control?
IDE? Editor? KIAI (sheer willpower)? Excel?
hadoop_xml_diff.rb
http://d.hatena.ne.jp/tagomoris/20141215/1418631988
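The linked hadoop_xml_diff.rb is the author's actual tool; as a rough illustration of the idea, here is a minimal shell approximation (an assumption, not the real script): flatten each *-site.xml into sorted `name=value` lines, then diff the two versions. It assumes Hadoop's usual one-tag-per-line layout for `<name>`/`<value>` and uses bash process substitution.

```shell
#!/usr/bin/env bash
# Flatten a Hadoop-style *-site.xml into sorted "name=value" lines.
# Assumes each <name> and <value> tag sits on its own line.
normalize_site_xml() {
  awk -F'[<>]' '/<name>/{n=$3} /<value>/{print n"="$3}' "$1" | sort
}

# Tiny self-contained demo with two fake config fragments (files are
# written into the current directory just for this example):
cat > old-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
</configuration>
EOF
cat > new-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
EOF

# Exit status 1 from diff just means "files differ", so swallow it.
diff <(normalize_site_xml old-site.xml) <(normalize_site_xml new-site.xml) || true
```

Flattening first means the diff survives reordering and whitespace changes between HDP's companion files, which a raw `diff hive-site.xml hive-site.xml` would not.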
Upgrade test in test cluster
Automated Upgrade by Ansible playbook
stop hiveserver2
stop cluster
-safemode enter, -saveNamespace
make backup (hdfs metadata, hive metastore)
-finalizeUpgrade
nm, rm, dn, nn, zkfc, jn, zk
check edits stopped
Upgrade yum repo/packages/configurations
execute hdp-select
Start cluster
zk, jn
“hdfs namenode -upgrade”
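The checklist above can be sketched as a dry-run shell runbook. The hdfs dfsadmin / hdp-select steps follow the slide, but the init-script names, backup path, and package glob are placeholders (assumptions, not taken from the deck), and DRY_RUN=1 (the default here) only prints each step instead of executing it:

```shell
#!/usr/bin/env bash
set -eu
: "${DRY_RUN:=1}"
# In dry-run mode, print each command instead of executing it.
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

upgrade_runbook() {
  run sudo service hive-server2 stop                          # stop hiveserver2 (placeholder service name)
  run sudo -u hdfs hdfs dfsadmin -safemode enter
  run sudo -u hdfs hdfs dfsadmin -saveNamespace
  run tar czf /backup/nn-metadata.tar.gz /var/hadoop/hdfs/nn  # backup path is an example
  run sudo -u hdfs hdfs dfsadmin -finalizeUpgrade             # finalize any previous upgrade first
  # stop daemons in order: nm, rm, dn, nn, zkfc, jn, zk (placeholder service names)
  for svc in hadoop-yarn-nodemanager hadoop-yarn-resourcemanager \
             hadoop-hdfs-datanode hadoop-hdfs-namenode hadoop-hdfs-zkfc \
             hadoop-hdfs-journalnode zookeeper-server; do
    run sudo service "$svc" stop
  done
  run sudo yum upgrade -y 'hadoop_2_2_0_0_2041*'              # upgrade repo/packages/configurations
  run sudo hdp-select set all 2.2.0.0-2041
  run sudo service zookeeper-server start                     # start zk, jn before the namenode
  run sudo service hadoop-hdfs-journalnode start
  run sudo -u hdfs hdfs namenode -upgrade
}

upgrade_runbook
```

Treat this as an outline of the order of operations, not a tested playbook; the real procedure in the talk was an Ansible playbook.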
Upgrade in test cluster
Automated Upgrade by Ansible playbook
stop hiveserver2
stop cluster
-safemode enter, -saveNamespace
make backup (hdfs metadata, hive metastore)
-finalizeUpgrade
nm, rm, dn, nn, zkfc, jn, zk
check edits stopped
Upgrade yum repo/packages/configurations
execute hdp-select
Start cluster
zk, jn
“hdfs namenode -upgrade” ... everlasting ...
“Ah, I might make some mistake...”
double write
Shib
Analysis2 (CDH4)
MRv1/HDFS
Hive
Analysis3 (HDP2.2)
MRv2/HDFS
Hive
Upgrade HDP 2.1->2.2
Dec 16 2014
Upgrade in analysis3
Manual Procedure!!!
stop hiveserver2
stop cluster
-safemode enter, -saveNamespace
make backup (hdfs metadata, hive metastore)
-finalizeUpgrade
nm, rm, dn, nn, zkfc, jn, zk
check edits stopped
Upgrade yum repo/packages/configurations
execute hdp-select
Start cluster
zk, jn
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -upgrade
2014-12-16 14:53:28,919 INFO namenode.NNUpgradeUtil (NNUpgradeUtil.java:doUpgrade(139)) - Performing upgrade of storage directory /var/hadoop/hdfs/nn
2014-12-16 14:53:28,939 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1029)) - Need to save fs image? false (staleImage=false, haEnabled=t
2014-12-16 14:53:28,941 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1173)) - Starting log segment at 262795139
2014-12-16 14:53:29,224 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 23408 entries 1740524 lookups
2014-12-16 14:53:29,227 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(748)) - Finished loading FSImage in 15695 msecs
2014-12-16 14:53:29,346 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(329)) - RPC server is binding to master1.local:8020
2014-12-16 14:53:29,348 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8020: starting
2014-12-16 14:53:29,393 INFO namenode.NameNode (NameNode.java:startCommonServices(646)) - NameNode RPC up at: master1.local/10.0.0.0:8020
2014-12-16 14:53:29,393 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1142)) - Starting services required for active state
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(160)) - Starting CacheReplicationMonitor with interva
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 13919829439 milliseconds
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 360 minutes, Empt
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(247)) - The configured checkpoint interval is 0 minutes. Using an interval of 3 deletion instead
2014-12-16 14:53:29,584 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 189 m
2014-12-16 14:53:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:53:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 mill
(... the same Rescanning / Scanned pair repeats every 30 seconds, from 14:54 through 14:57 and beyond ...)
 
(everlasting...)
https://gist.github.com/tagomoris/ed7aa8ccb3d6003a29f9
everlasting!!!!!!!!
${dfs.namenode.name.dir}/current and .../previous have not
been modified for 60 minutes...
rollback
stop all daemons
replace all packages w/ HDP2.1
replace configurations for HDP2.1
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config ... start namenode -rollback
$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode -rollback
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-4c3bf0834.livedoor.out
"rollBack" will remove the current state of the file system,
returning you to the state prior to initiating your recent
upgrade. This action is permanent and cannot be undone. If you
are performing a rollback in an HA environment, you should be
certain that no NameNode process is running on any host.
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
$
impossible
I cannot input any “Y”s...
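A plausible explanation (an assumption on my part, not stated in the slides): hadoop-daemon.sh detaches the process from the terminal, so the rollback prompt reads from a closed stdin, gets EOF immediately, and loops on "Invalid input". A tiny demo of that failure mode, plus the foreground workaround:

```shell
#!/usr/bin/env bash
# Demo: a Y/N prompt whose stdin is /dev/null sees EOF at once, so every
# read "fails" -- the same shape as the Invalid input loop above.
ask() {
  if read -r ans; then echo "answered: $ans"; else echo "Invalid input:"; fi
}

ask < /dev/null        # prints "Invalid input:"
echo Y | ask           # prints "answered: Y"

# Workaround sketch (assumption): run the namenode in the foreground so the
# prompt is attached to your terminal and can actually be answered, e.g.
#   sudo -u hdfs hdfs namenode -rollback
```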
Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
Replication numbers of all blocks are
ZERO!!!!!!!1!!!!1!
Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
replication numbers of all blocks are
ZERO!!!!!!!1!!!!1!
hadoop fsck / -> all blocks become fine!
Conclusion
I will wait for someone else to run HDP 2.2 first...

More Related Content

What's hot

Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
mundlapudi
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
Cloudera, Inc.
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
Edureka!
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
DataWorks Summit
 

What's hot (20)

Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
Optimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for HadoopOptimizing your Infrastrucure and Operating System for Hadoop
Optimizing your Infrastrucure and Operating System for Hadoop
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFS
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learnedTom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
 
Hadoop
HadoopHadoop
Hadoop
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
ha_module5
ha_module5ha_module5
ha_module5
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
ELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log system
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 

Viewers also liked

Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
Sadayuki Furuhashi
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Takahiro Inoue
 

Viewers also liked (13)

Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
20120423 hbase勉強会
20120423 hbase勉強会20120423 hbase勉強会
20120423 hbase勉強会
 
HBase Meetup Tokyo Summer 2015 #hbasejp
HBase Meetup Tokyo Summer 2015 #hbasejpHBase Meetup Tokyo Summer 2015 #hbasejp
HBase Meetup Tokyo Summer 2015 #hbasejp
 
HBaseサポート最前線 #hbase_ca
HBaseサポート最前線 #hbase_caHBaseサポート最前線 #hbase_ca
HBaseサポート最前線 #hbase_ca
 
まだ間に合う HBaseCon2016
まだ間に合う HBaseCon2016まだ間に合う HBaseCon2016
まだ間に合う HBaseCon2016
 
Apache Kylinについて #hcj2016
Apache Kylinについて #hcj2016Apache Kylinについて #hcj2016
Apache Kylinについて #hcj2016
 
Kafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnight
Kafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnightKafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnight
Kafka 0.10.0 アップデート、プロダクション100ノードでやってみた #yjdsnight
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
噛み砕いてKafka Streams #kafkajp
噛み砕いてKafka Streams #kafkajp噛み砕いてKafka Streams #kafkajp
噛み砕いてKafka Streams #kafkajp
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
 
AWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduce
AWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduceAWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduce
AWS Black Belt Tech シリーズ 2015 - Amazon Elastic MapReduce
 
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
HBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015SummerHBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
 

Similar to Upgrading from HDP 2.1 to HDP 2.2

Setup and run hadoop distrubution file system example 2.2
Setup and run hadoop  distrubution file system example  2.2Setup and run hadoop  distrubution file system example  2.2
Setup and run hadoop distrubution file system example 2.2
Mounir Benhalla
 

Similar to Upgrading from HDP 2.1 to HDP 2.2 (20)

Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky HaryadiPGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
 
Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recovery
 
Oracle cluster installation with grid and iscsi
Oracle cluster  installation with grid and iscsiOracle cluster  installation with grid and iscsi
Oracle cluster installation with grid and iscsi
 
Oracle cluster installation with grid and nfs
Oracle cluster  installation with grid and nfsOracle cluster  installation with grid and nfs
Oracle cluster installation with grid and nfs
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
20140708hcj
20140708hcj20140708hcj
20140708hcj
 
Hadoop completereference
Hadoop completereferenceHadoop completereference
Hadoop completereference
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Configuring and manipulating HDFS files
Configuring and manipulating HDFS filesConfiguring and manipulating HDFS files
Configuring and manipulating HDFS files
 
Setup and run hadoop distrubution file system example 2.2
Setup and run hadoop  distrubution file system example  2.2Setup and run hadoop  distrubution file system example  2.2
Setup and run hadoop distrubution file system example 2.2
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop Install
 
Hadoop installation
Hadoop installationHadoop installation
Hadoop installation
 

More from SATOSHI TAGOMORI

More from SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Upgrading from HDP 2.1 to HDP 2.2

  • 1. Upgrading from HDP2.1 to HDP2.2 2014/12/18 @tagomoris HadoopSCR #hadoopreading
  • 3. Analysis2 (CDH4) Hadoop Cluster Switching Long running CDH4 cluster Switching to new cluster w/ Fast network, Large HDD, Many CPU core changing Hive table schema/file formats No downtime! MRv1/HDFS Hive
  • 4. Distribution Options Options at Oct 2014 CDH5 HDP2.1 Apache Hadoop Release Hive 0.13, Tez -> HDP2.1 !
  • 5. input data fluent-plugin-webhdfs Shib executing queries over hiveserver1/2 Analysis2 (CDH4) MRv1/HDFS Hive
  • 6. double write Shib Analysis2 (CDH4) MRv1/HDFS Hive Analysis3 (HDP2.1) MRv2/HDFS Hive distcp Nov-Dec 2014
  • 7. HDP 2.1.5.0 Install over Ansible, w/o Ambari for configuration versioning Hadoop 2.4.0 YARN RM-HA + Namenode HA Hive 0.13 Tez?
  • 9. HDP 2.2! Hadoop 2.6.0 Datanode hot swap drive YARN ResourceManager REST API Hive 0.14.0 (...) Latest Tez
  • 10. diff HDP2.1 HDP2.2 hadoop-yarn-2.4.0.2.1.5.0-695.el6 -> hadoop-yarn-2.6.0.2.2.0.0-2041.el6 + hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6 /usr/lib/hadoop/.... -> /usr/hdp/current/hadoop-*
  • 11. diff HDP2.1 HDP2.2 Toooooooooooooo many diff lines Companion files of HDP (2.1 -> 2.2) in hive-site.xml: 353 -> 1207 lines in tez-site.xml: 126 -> 261 lines How to edit/control? IDE? Editor? KIAI? Excel?
  • 13. Upgrade test in test cluster Automated Upgrade by Ansible playbook stop hiveserver2 stop cluster -safemode enter, -saveNamespace make backup (hdfs metadata, hive metastore) -finalizeUpgrade nm, rm, dn, nn, zkfc, jn, zk check edits stopped Upgrade yum repo/packages/configurations execute hdp-select Start cluster zk, jn “hdfs namenode -upgrade”
  • 14. Upgrade in test cluster Automated Upgrade by Ansible playbook stop hiveserver2 stop cluster -safemode enter, -saveNamespace make backup (hdfs metadata, hive metastore) -finalizeUpgrade nm, rm, dn, nn, zkfc, jn, zk check edits stopped Upgrade yum repo/packages/configurations execute hdp-select Start cluster zk, jn “hdfs namenode -upgrade” ... ever lasting ...
  • 15. “Ah, I might make any mistakes...”
  • 16. double write Shib Analysis2 (CDH4) MRv1/HDFS Hive Analysis3 (HDP2.2) MRv2/HDFS Hive Upgrade HDP 2.1->2.2 Dec 16 2014
  • 17. Upgrade in analysis3 Manual Procedure!!! stop hiveserver2 stop cluster -safemode enter, -saveNamespace make backup (hdfs metadata, hive metastore) -finalizeUpgrade nm, rm, dn, nn, zkfc, jn, zk check edits stopped Upgrade yum repo/packages/configurations execute hdp-select Start cluster zk, jn /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -upgrade
  • 18. NameNode log during "-upgrade":
2014-12-16 14:53:28,919 INFO namenode.NNUpgradeUtil (NNUpgradeUtil.java:doUpgrade(139)) - Performing upgrade of storage directory /var/hadoop/hdfs/nn
2014-12-16 14:53:28,939 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1029)) - Need to save fs image? false (staleImage=false, haEnabled=t
2014-12-16 14:53:28,941 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1173)) - Starting log segment at 262795139
2014-12-16 14:53:29,224 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 23408 entries 1740524 lookups
2014-12-16 14:53:29,227 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(748)) - Finished loading FSImage in 15695 msecs
2014-12-16 14:53:29,346 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(329)) - RPC server is binding to master1.local:8020
2014-12-16 14:53:29,348 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8020: starting
2014-12-16 14:53:29,393 INFO namenode.NameNode (NameNode.java:startCommonServices(646)) - NameNode RPC up at: master1.local/10.0.0.0:8020
2014-12-16 14:53:29,393 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1142)) - Starting services required for active state
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(160)) - Starting CacheReplicationMonitor with interva
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 13919829439 milliseconds
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 360 minutes, Empt
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(247)) - The configured checkpoint interval is 0 minutes. Using an interval of 3 deletion instead
2014-12-16 14:53:29,584 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 189 m
2014-12-16 14:53:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:53:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 mill
  (the same Rescanning / Scanned pair repeats every 30 seconds... ever lasting...)
  https://gist.github.com/tagomoris/ed7aa8ccb3d6003a29f9
  • 19. ever lasting!!!!!!!! ${dfs.namenode.name.dir}/current and .../previous have not been modified for 60 minutes... the upgrade is making no progress.
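The "nothing written for 60 minutes" observation can be turned into a quick check. A minimal sketch, assuming `NN_DIR` stands in for the site's `dfs.namenode.name.dir` value (adjust to your configuration):

```shell
#!/bin/sh
# Check whether a stuck "-upgrade" is still writing NameNode metadata.
NN_DIR="${NN_DIR:-/var/hadoop/hdfs/nn}"

check_dir() {
  # count files modified within the last 60 minutes (-mmin -60)
  recent=$(find "$1" -type f -mmin -60 2>/dev/null | wc -l | tr -d ' ')
  if [ "$recent" -eq 0 ]; then
    echo "$1: no writes in 60 minutes (upgrade looks stuck)"
  else
    echo "$1: $recent file(s) written recently"
  fi
}

check_dir "$NN_DIR/current"
check_dir "$NN_DIR/previous"
```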
  • 20. rollback
stop all daemons
replace all packages w/ HDP2.1
replace configurations for HDP2.1
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config ... start namenode -rollback

$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode -rollback
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-4c3bf0834.livedoor.out
"rollBack" will remove the current state of the file system, returning you to the state prior to initiating your recent upgrade. This action is permanent and cannot be undone. If you are performing a rollback in an HA environment, you should be certain that no NameNode process is running on any host.
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
  (repeats forever)
$
  • 21. impossible I cannot input any "Y"s... (most likely because hadoop-daemon.sh starts the JVM via nohup with stdin detached, so the interactive prompt can never be answered)
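Why the prompt loops forever can be reproduced without Hadoop at all. This is a sketch under the assumption that the daemonized NameNode's stdin is redirected to /dev/null: `read` then hits EOF immediately, the confirmation loop prints "Invalid input:" and retries, and no "Y" can ever arrive.

```shell
#!/bin/sh
# Minimal reproduction of the unanswerable Y/N confirmation loop.
confirm() {
  tries=0
  while [ "$tries" -lt 3 ]; do            # cap the loop for the demo
    printf 'Roll back file system state? (Y or N) '
    if ! read -r ans; then                # read fails at EOF (/dev/null)
      echo 'Invalid input:'
      tries=$((tries + 1))
      continue
    fi
    case "$ans" in Y|y) echo 'rolling back'; return 0 ;; esac
  done
  return 1
}

# run the prompt the way a daemonized process sees it: stdin detached
confirm < /dev/null || echo 'prompt never answered'
```

The practical consequence: an interactive confirmation inside a process launched by a daemon wrapper script is effectively unanswerable from that wrapper.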
  • 22. Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
  • 23. Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
replication numbers of all blocks are ZERO!!!!!!!1!!!!1!
  • 24. Recovery
replace namenode metadata w/ backup
execute NameNode (HDP 2.1) & DataNode
cluster recovered!!!!
replication numbers of all blocks are ZERO!!!!!!!1!!!!1!
hadoop fsck / -> all blocks become fine!
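The recovery path can be sketched the same dry-run way. The metadata paths and the backup archive name here are assumptions (they depend on how the backup in step 17 was taken), and HDP 2.1 packages/configs are assumed to be back in place already; `RUN=echo` prints instead of executing.

```shell
#!/bin/sh
# Sketch of recovering the cluster from the pre-upgrade backup.
RUN="${RUN:-echo}"

recover_steps() {
  $RUN rm -rf /var/hadoop/hdfs/nn
  $RUN tar xzf /backup/nn-metadata.tar.gz -C /   # restore NN metadata backup
  $RUN /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode
  # start DataNodes next; their block reports repopulate replica counts,
  # which is why the "replication ZERO" state heals on its own. Verify:
  $RUN hadoop fsck /
}

recover_steps
```

The "all blocks ZERO" scare makes sense in this light: right after a restored NameNode starts, it knows the namespace but has received no block reports yet, so fsck only looks healthy once the DataNodes have reported in.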
  • 25. Conclusion I will wait for anyone who uses HDP 2.2...