HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop

In an enterprise on-premises data center, we may have multiple secured Hadoop clusters for different purposes. Sometimes these Hadoop clusters run different Hadoop distributions or Hadoop versions, or are even located in different data centers. To fulfill business requirements, synchronizing data between these clusters can be an important mechanism. However, the story becomes more complicated in a real-world secured multi-cluster environment, compared to running distcp between two non-secured Hadoop clusters of the same version.

We would like to go through our experience enabling live data synchronization across multiple Kerberos-enabled Hadoop clusters, including the functionality verification, multi-cluster configuration, and automated setup process, etc. After that, we share the use cases among those Kerberos-federated Hadoop clusters. Finally, we provide our common practices for multi-cluster data synchronization.

Published in: Technology


  1. Multi-Cluster Live Synchronization with Kerberos Federated Hadoop 張雅芳 Mammi Chang @ 2015 Taiwan HadoopCon
  2. Who am I? • Mammi Chang 張雅芳 • Sr. Engineer, SPN, Trend Micro • SPN Hadoop Cluster Administrator • DevOps on Hadoop ecosystem and AWS • 2014 HadoopCon Speaker
  3. [Diagram: a service moving from TMH6 in the original data center to TMH7 in the new data center, with data sync between them] This is a story of a move …
  4. [Diagram: Production and Staging TMH6/TMH7 clusters in the original and new data centers, with data sync links between them]
  5. Data Synchronization: "Data synchronization is the process of establishing consistency among data from a source to a target data storage and vice versa and the continuous harmonization of the data over time." - From Wikipedia, "Data synchronization"
  6. One-way file synchronization:  Updated files are copied from source to destination. Two-way file synchronization:  Updated files are copied in both directions.  Dropbox, SafeSync, etc.
  7. Linux One-Way File Synchronization $ cp fileA fileB $ scp ./directory/my_file mammi@198.167.0.3:/home/mammi/ $ rsync -avP /source/data /destination/
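The one-way direction behind those commands can be sketched in a few lines of portable shell (directory and file names are illustrative): copy a file only when the source copy is newer than, or missing from, the destination.

```shell
# One-way sync sketch: copy regular files from src to dst only when the
# source copy is newer than (or absent from) the destination copy.
one_way_sync() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    # copy when the destination file is missing or older than the source
    if [ ! -e "$dst/$name" ] || [ "$f" -nt "$dst/$name" ]; then
      cp -p "$f" "$dst/$name"
    fi
  done
}
```

In practice `rsync -avP` already does this (plus recursion and delta transfer); the sketch only illustrates what "one-way" means.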
  8. Hadoop One-Way File Synchronization $ hadoop fs -cp /user/mammi/file1 /user/mammi/dir/ $ hadoop distcp hdfs://cluster1/file hdfs://cluster2/file
  9. #TrendInsight Hadoop Data Synchronization
  10. DistCp with the same Hadoop version is trivial.
  11. Hadoop 2.6 Cluster1, Hadoop 2.6 Cluster2: $ hadoop distcp hdfs://cluster1_nn:8020/test hdfs://cluster2_nn:8020/test
  12. Hadoop 2.6 Cluster1, Hadoop 2.6 Cluster2: $ hadoop distcp hdfs://cluster2_nn:8020/test hdfs://cluster1_nn:8020/test
  13. DistCp with a different Hadoop version is a little bit tricky.
  14. Oops … [root@tw-spnhadoop1 hadooppet]# hadoop distcp hdfs://cluster1/test hdfs://krb-1.spn.lab.trendnet.org:8020/test
      15/01/22 15:11:44 INFO tools.DistCp: srcPaths=[hdfs://cluster1/test]
      15/01/22 15:11:44 INFO tools.DistCp: destPath=hdfs://krb-1.spn.lab.trendnet.org:8020/test
      15/01/22 15:11:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 381 for hdfs on ha-hdfs:cluster1
      15/01/22 15:11:45 INFO security.TokenCache: Got dt for hdfs://cluster1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster1, Ident: (HDFS_DELEGATION_TOKEN token 381 for hdfs)
      15/01/22 15:11:46 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
      15/01/22 15:11:46 INFO security.UserGroupInformation: Initiating logout for hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM
      15/01/22 15:11:46 INFO security.UserGroupInformation: Initiating re-login for hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM
      15/01/22 15:11:50 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
      15/01/22 15:11:50 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
      15/01/22 15:11:53 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
  15. Apache Hadoop 2.0 Cluster1, Apache Hadoop 2.6 Cluster2: $ hadoop distcp hftp://cluster1_nn:50070/test hdfs://cluster2_nn:8020/test HftpFileSystem is a read-only FileSystem, so DistCp must be run on the destination cluster
  16. TMH6 Cluster1 (CDH based), TMH7 Cluster2 (Apache based): $ hadoop distcp ????://TMH6_NN:????/test hdfs://TMH7_NN:8020/test
  17. TMH6 Cluster1 (CDH based), TMH7 Cluster2 (Apache based): $ hadoop distcp hftp://TMH6_NN:50070/test hdfs://TMH7_NN:8020/test Only supports data sync from TMH6 to TMH7
  18. DistCp with different Hadoop versions is a little bit tricky; plus Kerberos security, it is annoying!!
  19. TMH6 Cluster1, TMH7 Cluster2: $ hadoop distcp ????://TMH6_NN:XXXX/test ????://TMH7_NN:XXXX/test
  20. DistCp Data Copy Matrix: HDP1/HDP2 to HDP2 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_system-admin-guide/content/distcp-table.html WebHDFS is an HTTP REST API that supports the complete FileSystem interface for HDFS
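To make the "HTTP REST API" point concrete, here is a hedged sketch of how a WebHDFS URL is assembled from an HDFS path; the host and port are placeholders (not values from the slides), while the `/webhdfs/v1` prefix and `op=` query parameter come from the WebHDFS REST API.

```shell
# Build the WebHDFS REST URL for an HDFS path.
# nn_host / nn_http_port are placeholders for the NameNode's HTTP endpoint.
webhdfs_url() {
  nn_host="$1"; nn_http_port="$2"; path="$3"; op="$4"
  echo "http://${nn_host}:${nn_http_port}/webhdfs/v1${path}?op=${op}"
}
```

For example, `webhdfs_url nn1 50070 /test LISTSTATUS` yields a URL that could then be fetched with `curl`.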
  21. DistCp Data Copy Matrix: TMH6/TMH7 to TMH6/TMH7 [Matrix comparing hdfs, hftp, and webhdfs between insecure and secure TMH6/TMH7 clusters]
  22. TMH6 Cluster1, TMH7 Cluster2: $ hadoop distcp webhdfs://TMH6_NN:8020/test webhdfs://TMH7_NN:8020/test
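The matrix and the commands above can be collapsed into a small decision helper. This is only a sketch of the rule of thumb the slides arrive at (same version: hdfs; older insecure source: the read-only hftp; secure and/or mixed versions: webhdfs), not an official DistCp feature.

```shell
# Pick a DistCp source scheme following the copy-matrix rule of thumb:
#   same version          -> hdfs
#   secure / mixed setups -> webhdfs
#   older insecure source -> hftp (read-only, so run DistCp on the target)
pick_scheme() {
  same_version="$1"; secure="$2"
  if [ "$same_version" = "yes" ]; then
    echo "hdfs"
  elif [ "$secure" = "yes" ]; then
    echo "webhdfs"
  else
    echo "hftp"
  fi
}
```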
  23. Hadoop Security with Kerberos Kerberos is a computer network authentication protocol which works on the basis of 'tickets' to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner - From Wikipedia, "Kerberos_(Protocol)"
  24. REALM – CLUSTER.DOMAIN.COM Kerberos Negotiation KDC (Key Distribution Center) TGT (Ticket-Granting Ticket) KDC Client Hadoop Servers Msg3 : Authenticator, TGT Msg4 : client/server ticket Msg1 : client login KDC Msg2 : client TGT Msg5 : Authenticator, ticket Msg6 : time auth
  25. REALM – CLUSTER2.DOMAIN.COM Kerberos Cross-Realm Authentication REALM – CLUSTER1.DOMAIN.COM KDC Client Hadoop Servers Msg3 : Authenticator, TGT Msg4 : client/server ticket Msg1 : client login KDC Msg2 : client TGT Msg5 : Authenticator, ticket Msg6 : time auth KDC
  26. Kerberos Federation for Hadoop
      Kerberos Setting
      • Set different REALM in each cluster's KDC
      • Add both clusters' kerberos information to configs
      • Add federated kerberos principal to both KDC DBs
      • Restart kerberos services
      Hadoop Setting
      • Add Hadoop configurations
      • Make sure both clusters' nodes can recognize each other
      • Restart necessary Hadoop services
  27. Multi-Cluster Kerberos Federation: every cluster (Cluster1, Cluster2, …, Cluster N) repeats the same steps: set a different REALM in each cluster's KDC, add all other clusters' kerberos information to its configuration, add all federated kerberos principals to its KDC DB, add the Hadoop configurations, make sure all clusters' nodes can recognize each other, and restart the necessary services.
  28. DistCp with different Hadoop versions plus Kerberos federation in a cross-DC multi-cluster setup is not easy. Done!!
  29. DistCp with different Hadoop versions plus Kerberos federation in cross-DC multi-clusters is not easy at all.
  30. Production TMH6 TMH7 TMH6 TMH7 Original Data Center New Data Center Production Staging Staging Two-way Kerberos federation link Data Sync Data Sync Data Sync
  31. #TrendInsight More Than Functionality …
  32. Issues • Computing resource • Zero-downtime • Schedule limitation • Network bandwidth
  33. Computing Resource • Principle – Do not impact production services when many DistCp jobs are running • Strategy – Run distcp on the Staging Env. instead of the Production Env.
  34. Production TMH6 TMH7 TMH6 TMH7 Original Data Center New Data Center Production Staging Staging Two-way Kerberos federation link $ hadoop distcp webhdfs://TMH6_PROD_NN:8020/test webhdfs://TMH7_PROD_NN:8020/test Data Sync Data flow
  35. Production TMH6 TMH7 TMH6 TMH7 Original Data Center New Data Center Production Staging Staging Two-way Kerberos federation link $ hadoop distcp webhdfs://TMH6_PROD_NN:8020/test webhdfs://TMH7_PROD_NN:8020/test Data Sync Data flow
  36. Zero-downtime • Principle – Do not have Production Env. downtime • Strategy – Change KDC REALM in Staging only – Rolling restart services
  37. Schedule Limitation • Principle – Provide the minimum dataset that fulfills production services' requirements • Strategy – Divide the dataset into cold data and hot data – All necessary hot data needs to be ready before the service moves to the new DC
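The hot/cold split above could be scripted along these lines; the mtime-based definition of "hot" and the day threshold are assumptions for illustration, since the real datasets live on HDFS and would be classified by business rules.

```shell
# Classify a local file as "hot" (modified within the last N days) or
# "cold"; hot data would be synced before the service moves to the new DC.
# NOTE: mtime-based "hot" is an illustrative assumption, not the talk's rule.
classify_file() {
  f="$1"; hot_days="$2"
  # find prints the path only when the file's mtime is within hot_days
  if [ -n "$(find "$f" -mtime -"$hot_days" 2>/dev/null)" ]; then
    echo "hot"
  else
    echo "cold"
  fi
}
```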
  38. #TrendInsight Lessons Learned
  39. Automation is vital !!! • Automated CI tests on such complex and repeated tasks – save your time – prevent plenty of human errors
  40. Customization is necessary • Home-made distcp running script with error handling • Setting permissions by real case
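A home-made distcp running script like the one mentioned might look like this sketch; `"$@"` stands in for the real `hadoop distcp …` invocation, and the retry count and error message are assumptions, not details from the talk.

```shell
# Run a command with retries and basic error reporting; in this use case
# "$@" would be the hadoop distcp invocation.
run_with_retry() {
  max="$1"; shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "failed after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
  done
  return 0
}
```

It would be invoked as, e.g., `run_with_retry 3 hadoop distcp webhdfs://… webhdfs://…`, with the exit status checked by the calling automation.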
  41. Just try it • Survey is important but sometimes it cannot totally solve your problem
  42. #TrendInsight Thank you QUESTION?
  43. #TrendInsight Backups
  44. Kerberos Cross-Realm Federation • Set different REALM in each cluster's KDC • Add both clusters' kerberos information to configs • Add federated kerberos principal to both KDC DBs • Add Hadoop configurations • Make sure both clusters' nodes can recognize each other • Restart necessary services
  45. Set different REALM in each cluster's KDC
      Cluster1 krb5.conf:
      [realms]
      CLUSTER1.DOMAIN.COM = {
        kdc = cluster1_kdc_master:88
        kdc = cluster1_kdc_slave:88
        admin_server = cluster1_kdc_master:749
      }
      [domain_realm]
      cluster1.domain.com = CLUSTER1.DOMAIN.COM
      .cluster1.domain.com = CLUSTER1.DOMAIN.COM
      Cluster2 krb5.conf:
      [realms]
      CLUSTER2.DOMAIN.COM = {
        kdc = cluster2_kdc_master:88
        kdc = cluster2_kdc_slave:88
        admin_server = cluster2_kdc_master:749
      }
      [domain_realm]
      cluster2.domain.com = CLUSTER2.DOMAIN.COM
      .cluster2.domain.com = CLUSTER2.DOMAIN.COM
  46. Add both clusters' kerberos information to krb5.conf
      Both Cluster1 and Cluster2 krb5.conf:
      [realms]
      CLUSTER1.DOMAIN.COM = {
        kdc = cluster1_kdc_master:88
        kdc = cluster1_kdc_slave:88
        admin_server = cluster1_kdc_master:749
      }
      CLUSTER2.DOMAIN.COM = {
        kdc = cluster2_kdc_master:88
        kdc = cluster2_kdc_slave:88
        admin_server = cluster2_kdc_master:749
      }
      [domain_realm]
      cluster1.domain.com = CLUSTER1.DOMAIN.COM
      .cluster1.domain.com = CLUSTER1.DOMAIN.COM
      cluster2.domain.com = CLUSTER2.DOMAIN.COM
      .cluster2.domain.com = CLUSTER2.DOMAIN.COM
  47. Add federated kerberos principals to both KDC DBs
      $ kadmin.local: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM
      WARNING: no policy specified for krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM; defaulting to no policy
      Enter password for principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM": // 123456
      Re-enter password for principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM": // 123456
      Principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM" created.
      $ kadmin.local: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM
      WARNING: no policy specified for krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM; defaulting to no policy
      Enter password for principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM": // 654321
      Re-enter password for principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM": // 654321
      Principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM" created.
      Use the same password for a given principal in both KDC DBs to make sure the encryption key is the same.
  48. Add Hadoop Configuration
      core-site.xml:
      <property>
        <name>hadoop.security.auth_to_local</name>
        <value>
          RULE:[1:$1@$0](^.*@CLUSTER.DOMAIN.COM$)s/^(.*)@CLUSTER.DOMAIN.COM$/$1/g
          RULE:[2:$1@$0](^.*@CLUSTER.DOMAIN.COM$)s/^(.*)@CLUSTER.DOMAIN.COM$/$1/g
          DEFAULT
        </value>
      </property>
      hdfs-site.xml:
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
      Verify the setting of the rules:
      $ hadoop org.apache.hadoop.security.HadoopKerberosName mapred/machine.cluster.domain.com@CLUSTER.DOMAIN.COM
      Name: mapred/machine.cluster.domain.com@CLUSTER.DOMAIN.COM to mapred
  49. Make sure both clusters' nodes can recognize each other
      • /etc/hosts for both cluster1 and cluster2 nodes:
      10.1.145.1 machine1.cluster1.domain.com
      10.1.145.2 machine2.cluster1.domain.com
      10.1.145.3 machine3.cluster1.domain.com
      10.1.144.1 machine1.cluster2.domain.com
      10.1.144.2 machine2.cluster2.domain.com
      10.1.144.3 machine3.cluster2.domain.com
  50. Restart necessary services
      • KDC server
      – service krb5kdc restart
      – service kadmin restart
      • Namenodes, Datanodes
      – service hadoop-hdfs-namenode restart
      – service hadoop-hdfs-datanode restart
