In an enterprise on-premises data center, we may run multiple secured Hadoop clusters for different purposes. These clusters may use different Hadoop distributions or versions, or may even be located in different data centers. To meet business requirements, data synchronization between these clusters can be an important mechanism. However, in a real-world secured multi-cluster environment the story is considerably more complicated than running distcp between two non-secured Hadoop clusters of the same version.
We will walk through our experience enabling live data synchronization across multiple Kerberos-enabled Hadoop clusters, covering functionality verification, multi-cluster configuration, and the automated setup process, among other topics. We will then share use cases among these Kerberos-federated Hadoop clusters. Finally, we will present our common practices for multi-cluster data synchronization.