This talk describes the top ten things that make it easier to run and manage your Hadoop system in production. We start with configuration: best practices for planning and setting up Hadoop clusters for reliability and efficiency. We include typical machine sizing and the tradeoffs of big versus small servers relative to cluster size. We cover how to implement a cluster for multi-tenancy, with an eye on isolating and sharing cluster resources. Next we describe the tools available for managing the cluster, such as decommissioning, the balancer, and metrics. We include best practices for monitoring a cluster and dealing with different kinds of failures. In particular, we emphasise differences from traditional data center server management, especially when dealing with disk and node failures. We go over how to use the available tools for backup, disaster recovery, and archiving. We conclude with how to cope with the storage and computation growth that production Hadoop clusters typically see. These lessons and tips are derived from our extensive experience running production Hadoop clusters and supporting customers over the last six years. We share anecdotes and real-life incidents throughout the talk.