Hadoop Operations – Best Practices from the Field
June 11, 2015
Chris Nauroth
email: cnauroth@hortonworks.com
twitter: @cnauroth
Suresh Srinivas
email: suresh@hortonworks.com
twitter: @suresh_m_s
© Hortonworks Inc. 2011
About Me
Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Major contributor to HDFS ACLs, Windows compatibility, and operability improvements
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters
Architecting the Future of Big Data
Agenda
• Analysis of Hadoop Support Cases
– Support case trends
– Configuration
– Software Improvements
• Key Learnings and Best Practices
– HDFS ACLs
– HDFS Snapshots
– Reporting DataNode Volume Failures
Support Case Trends – Proportional Cases per Month
[Chart: proportion of support cases per month (months 1 to 43, y-axis 0 to 1) for HDFS, MapReduce, YARN and Other (37 components)]
Support Case Trends – Root Cause
[Chart: number of support cases by root cause (y-axis 0 to 1200) for YARN, MapReduce and HDFS. Root-cause categories: Customer Environment (Non HDP), Documentation Defect, Documentation Gap, Documentation Not Utilized, Education - Configuration, Needs Training, Product Defect]
Support Case Trends
• Core Hadoop components (HDFS, YARN and MapReduce) are used in every deployment, and therefore receive proportionally more support cases than other ecosystem components.
• Misconfiguration is the dominant root cause.
• Documentation issues are a close second.
• We are constantly improving the code to eliminate operational issues, help with diagnosis and provide increased visibility.
• Best practices get incorporated into Apache Ambari for improved defaults, simplified configuration and deeper monitoring.
Configuration
Configuration - Hardware and Cluster Sizing
• Considerations
– Larger clusters heal faster after node or disk failure
– Machines with huge storage take longer to recover
– More racks give more failure domains
• Recommendations
– Get good-quality commodity hardware
– Buy at the pricing sweet spot: 3 TB disks, 96 GB RAM, 8-12 cores
– More memory is better: real-time workloads are memory hungry!
– Before considering fatter machines (1U with 6 disks vs. 2U with 12 disks), get to 30-40 machines or 3-4 racks
– Use a pilot cluster to learn about load patterns
– Balance hardware for I/O-, compute- or memory-bound workloads
– More details: http://tinyurl.com/hwx-hadoop-hw
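The first two considerations follow from simple arithmetic: when a node fails, the surviving nodes re-replicate its data in parallel. The sketch below is a back-of-envelope model, not a measurement. It assumes (hypothetically) that lost replicas are spread evenly, that every surviving node re-replicates at a fixed rate, and it ignores network and rack limits; the function name, 70% utilization and 50 MB/s rate are illustrative.

```python
def recovery_hours(nodes: int, disks_per_node: int, tb_per_disk: float,
                   utilization: float = 0.7,
                   mb_per_sec_per_node: float = 50.0) -> float:
    """Rough hours needed to re-replicate the data held by one failed node.

    Illustrative assumptions: lost replicas are spread evenly across the
    cluster and each surviving node re-replicates at mb_per_sec_per_node.
    """
    if nodes < 2:
        raise ValueError("need at least one surviving node")
    lost_mb = disks_per_node * tb_per_disk * utilization * 1_000_000  # TB -> MB
    aggregate_mb_per_sec = (nodes - 1) * mb_per_sec_per_node
    return lost_mb / aggregate_mb_per_sec / 3600


if __name__ == "__main__":
    # Same node type (12 x 3 TB) on clusters of different sizes.
    for n in (10, 40, 200):
        print(f"{n:>4} nodes, 12 x 3 TB: {recovery_hours(n, 12, 3):6.2f} h")
    # Same cluster size with less storage per node.
    print(f"  40 nodes,  6 x 3 TB: {recovery_hours(40, 6, 3):6.2f} h")
```

The model only captures the trend: recovery time shrinks roughly in proportion to the number of surviving nodes and grows with the storage per node.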
Configuration – JVM Tuning
• Avoid JVM issues
– Use a 64-bit JVM for all daemons
– Compressed OOPs are enabled by default (Java 6u23 and later)
– Java heap size
– Set the same maximum and initial heap size (-Xmx == -Xms)
– Avoid Java defaults: configure NewSize and MaxNewSize
– Use 1/8 to 1/6 of the maximum heap size for JVMs larger than 4 GB
– Configure -XX:PermSize=128m and -XX:MaxPermSize=256m (Java 7 and earlier; PermGen was removed in Java 8)
– Use a low-latency garbage collector
– -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N>
– Use a higher <N> on the NameNode and the JobTracker or ResourceManager
– Important JVM options to help debugging
– -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails
– -XX:ErrorFile=<file>
– -XX:+HeapDumpOnOutOfMemoryError
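The JVM guidance above can be captured in a small helper that assembles a daemon's options, for example to paste into HADOOP_NAMENODE_OPTS in hadoop-env.sh. This is a sketch under assumptions: the helper name and log paths are hypothetical, the young generation defaults to the 1/8 end of the recommended range, and the PermGen flags are emitted only for Java 7 and earlier.

```python
def jvm_opts(heap_mb: int, gc_threads: int, log_dir: str,
             young_fraction: float = 1 / 8, java7: bool = True) -> list[str]:
    """Assemble daemon JVM options following the tuning guidance (sketch).

    The 1/8-1/6 young-generation fraction is the guidance for heaps larger
    than 4 GB; heap_mb, gc_threads and log_dir are deployment-specific.
    """
    if not 1 / 8 <= young_fraction <= 1 / 6:
        raise ValueError("young_fraction should be between 1/8 and 1/6")
    young_mb = int(heap_mb * young_fraction)
    opts = [
        f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m",                 # initial == max heap
        f"-XX:NewSize={young_mb}m", f"-XX:MaxNewSize={young_mb}m",
        "-XX:+UseConcMarkSweepGC", f"-XX:ParallelGCThreads={gc_threads}",
        "-verbose:gc", f"-Xloggc:{log_dir}/gc.log", "-XX:+PrintGCDetails",
        f"-XX:ErrorFile={log_dir}/hs_err_pid%p.log",          # %p = JVM process id
        "-XX:+HeapDumpOnOutOfMemoryError",
    ]
    if java7:  # PermGen sizing only exists on Java 7 and earlier
        opts += ["-XX:PermSize=128m", "-XX:MaxPermSize=256m"]
    return opts


if __name__ == "__main__":
    # Hypothetical 8 GB NameNode heap with 8 parallel GC threads.
    print(" ".join(jvm_opts(heap_mb=8192, gc_threads=8, log_dir="/var/log/hadoop/hdfs")))
```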
Configuration
• Deploy with QuorumJournalManager for high availability
• Raise the open file descriptor (nofile) ulimit
– The default of 1024 is too low
– 16K for DataNodes, 64K for master nodes
• Use version control for configuration!
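A quick way to check the file descriptor recommendation is to compare the soft limit against it. This is a minimal, Unix-only sketch; it reads "16K" and "64K" as 16×1024 and 64×1024 (an assumption), and it checks the limit of the process running the script, so run it as the daemon's user. For an already running daemon on Linux, /proc/<pid>/limits shows the limit actually in effect.

```python
import resource

# Minimum soft limits recommended above; "16K"/"64K" read here as 16*1024/64*1024.
RECOMMENDED_NOFILE = {"datanode": 16 * 1024, "master": 64 * 1024}


def nofile_ok(soft_limit: int, role: str) -> bool:
    """True if a soft open-file limit meets the recommendation for `role`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= RECOMMENDED_NOFILE[role]


if __name__ == "__main__":
    # Checks the limit of *this* process; run it as the daemon's user (e.g. hdfs).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for role in RECOMMENDED_NOFILE:
        verdict = "OK" if nofile_ok(soft, role) else "too low"
        print(f"{role}: soft nofile limit {soft} is {verdict}")
```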
Configuration
• Let DataNode disks fail in place: set dfs.datanode.failed.volumes.tolerated
– A disk failure no longer means a DataNode failure
– Especially important for high-density nodes
• Set dfs.namenode.name.dir.restore to true
– Lets the NN attempt to restore a previously failed storage directory during checkpointing
• Take periodic backups of NameNode metadata
– Make copies of the entire storage directory
• Set aside a lot of disk space for NN logs
– NN logging is verbose: set aside multiple GBs
– Many installs configure this too small
– NN logs then roll within minutes, making issues hard to debug
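Generating these settings into a version-controlled file is one way to apply them consistently. The sketch below renders them as a Hadoop-style <configuration> document. The helper name is hypothetical, and the value 2 is only an example: dfs.datanode.failed.volumes.tolerated must stay below the number of data directories on each DataNode. In an Ambari-managed cluster you would set these properties through Ambari instead.

```python
import xml.etree.ElementTree as ET


def hdfs_site(props: dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


settings = {
    # Example value: keep the DataNode running after up to 2 failed volumes.
    "dfs.datanode.failed.volumes.tolerated": "2",
    # Let the NameNode try to restore failed storage dirs at checkpoint time.
    "dfs.namenode.name.dir.restore": "true",
}

if __name__ == "__main__":
    print(hdfs_site(settings))
```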
Configuration – Monitoring Usage
• As cluster storage, nodes, files and blocks grow
– Increase NN heap, handler count and number of DN xceivers
– Revisit other related configuration periodically
• Monitor hardware usage for your workload
– Disk I/O, network I/O, CPU and memory usage
– Use this information when expanding cluster capacity
• Monitor usage with Hadoop metrics
– JVM metrics: GC times, memory used, thread status
– RPC metrics: especially latency, to track slowdowns
– HDFS metrics
– Used storage, number of files and blocks, total load on the cluster
– File system operations
– MapReduce metrics
– Slot utilization and job status
• Tweak configuration on an ongoing basis during upgrades and maintenance
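As an example of pulling these metrics programmatically, the sketch below reads GC and RPC latency figures from a NameNode's /jmx JSON. The bean and attribute names (JvmMetrics with GcTimeMillis and MemHeapUsedM; RpcActivityForPort8020 with RpcQueueTimeAvgTime and RpcProcessingTimeAvgTime) are assumptions based on Hadoop 2.x metrics, so verify them against your own /jmx output. The RPC bean name depends on the NameNode's RPC port (8020 here), 50070 is the Hadoop 2.x default NameNode HTTP port, and the sample values are made up.

```python
import json
import urllib.request

# Assumed Hadoop 2.x bean names; check them against your own /jmx output.
JVM_BEAN = "Hadoop:service=NameNode,name=JvmMetrics"
RPC_BEAN = "Hadoop:service=NameNode,name=RpcActivityForPort8020"


def beans_by_name(jmx_json: str) -> dict:
    """Index the 'beans' array of a /jmx response by bean name."""
    return {b["name"]: b for b in json.loads(jmx_json).get("beans", [])}


def health_summary(jmx_json: str) -> dict:
    """Pick out GC and RPC latency figures (None when a value is missing)."""
    beans = beans_by_name(jmx_json)
    jvm, rpc = beans.get(JVM_BEAN, {}), beans.get(RPC_BEAN, {})
    return {
        "gc_time_ms": jvm.get("GcTimeMillis"),
        "heap_used_mb": jvm.get("MemHeapUsedM"),
        "rpc_queue_avg_ms": rpc.get("RpcQueueTimeAvgTime"),
        "rpc_processing_avg_ms": rpc.get("RpcProcessingTimeAvgTime"),
    }


def fetch_jmx(host_port: str) -> str:
    """Fetch the full /jmx document, e.g. fetch_jmx('namenode.example.com:50070')."""
    with urllib.request.urlopen(f"http://{host_port}/jmx", timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    sample = json.dumps({"beans": [  # hypothetical values
        {"name": JVM_BEAN, "GcTimeMillis": 1234, "MemHeapUsedM": 2048.5},
        {"name": RPC_BEAN, "RpcQueueTimeAvgTime": 0.4, "RpcProcessingTimeAvgTime": 1.2},
    ]})
    print(health_summary(sample))
```

Tracking these values over time, rather than as one-off readings, is what reveals the gradual slowdowns that call for retuning.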
HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Install & Configure: Ambari Guided Configuration
Guides configuration and provides recommendations for the most common settings (HBase example shown here).
Software Improvements
Real Incidents and Software Improvements to Address Them
Don’t edit the metadata files!
• Editing can corrupt the cluster state
– Might result in loss of data
• Real incident
– NN misconfigured to point to another NN’s metadata
– DNs can’t register due to namespace ID mismatch
– System detected the problem correctly
– Safety net ignored by the admin!
– Admin edits the NameNode VERSION file to match IDs
– Outcome: mass deletion of blocks that do not exist in that namespace
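When IDs disagree, the safe move is to find out why rather than to make them match. A read-only comparison like the sketch below is enough to confirm that a NameNode is pointed at the wrong metadata. It assumes the Hadoop 2.x on-disk layout as we understand it: the NameNode's VERSION file sits under <dfs.namenode.name.dir>/current/, and on a DataNode the namespaceID is recorded in the block pool's VERSION file under current/<block pool ID>/current/. Parsing is simplified (no properties-file escapes), and the sample contents are hypothetical.

```python
def parse_version(text: str) -> dict[str, str]:
    """Parse a VERSION file: Java-properties style key=value lines, '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def id_mismatches(nn_text: str, dn_text: str,
                  keys: tuple[str, ...] = ("clusterID", "namespaceID")) -> dict[str, tuple[str, str]]:
    """IDs present in both files whose values differ: {key: (nn_value, dn_value)}."""
    nn, dn = parse_version(nn_text), parse_version(dn_text)
    return {k: (nn[k], dn[k]) for k in keys if k in nn and k in dn and nn[k] != dn[k]}


if __name__ == "__main__":
    # Hypothetical contents; on a real cluster, read the files without modifying them.
    nn_version = "#Thu Jun 11 09:00:00 PDT 2015\nnamespaceID=111\nclusterID=CID-a\n"
    dn_version = "namespaceID=222\nclusterID=CID-a\n"
    for key, (nn_val, dn_val) in id_mismatches(nn_version, dn_version).items():
        print(f"{key}: NameNode has {nn_val}, DataNode has {dn_val}; fix the NN config, do not edit")
```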
Improvement
• Pause deletion of blocks when the namenode starts up
– https://issues.apache.org/jira/browse/HDFS-6186
– Supports configurable delay of block deletions after NameNode startup
– Gives an admin extra time to diagnose before deletions begin
• Show in the web UI when block deletion will start after NameNode startup
– https://issues.apache.org/jira/browse/HDFS-6385
– The web UI already displayed the number of pending block deletions
– This enhanced the display to indicate when actual deletion will begin
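The web UI addition from HDFS-6385 boils down to simple arithmetic: deletions begin at NameNode startup time plus the configured delay. To our knowledge, the HDFS-6186 delay is configured as dfs.namenode.startup.delay.block.deletion.sec (default 0, meaning no delay); verify the name against hdfs-default.xml for your release. The sketch below computes the diagnosis window an admin has; the start time and one-hour delay are hypothetical.

```python
from datetime import datetime, timedelta


def deletion_start(nn_start: datetime, delay_sec: int) -> datetime:
    """When block deletions begin, given NameNode start time and configured delay."""
    if delay_sec < 0:
        raise ValueError("delay must be non-negative")
    return nn_start + timedelta(seconds=delay_sec)


def seconds_remaining(nn_start: datetime, delay_sec: int, now: datetime) -> float:
    """Diagnosis window left before deletions start (0 once it has passed)."""
    return max(0.0, (deletion_start(nn_start, delay_sec) - now).total_seconds())


if __name__ == "__main__":
    start = datetime(2015, 6, 11, 9, 0, 0)   # hypothetical NameNode start time
    delay = 3600                              # hypothetical one-hour delay
    print("deletions start at", deletion_start(start, delay))
    print("seconds left at 09:45:", seconds_remaining(start, delay, datetime(2015, 6, 11, 9, 45)))
```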
Block Deletion Start Time
[Screenshot: NameNode web UI showing the new block deletion start time]
Guard Against Accidental Deletion
• rm -r deletes data at the speed of Hadoop!
– Pressing Ctrl-C does not stop the deletion!
– Undeleting files on DataNodes is hard and time consuming:
– Immediately shut down the NN and unmount disks on DataNodes
– Recover the deleted files
– Start the NameNode without the delete operation in the edit log
• Enable Trash
• Real incident
– Customer is running a Hadoop distribution with trash not enabled
– Deletes a large directory (100 TB) and shuts down the NN immediately
– Support person asks for the NN to be restarted to check whether trash is enabled!
– Outcome: blocks start deleting
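The time to check whether trash is on is before an incident, not during one, and the check does not require restarting anything. A minimal sketch, assuming you can read the cluster's core-site.xml: fs.trash.interval is the number of minutes deleted files are kept in the user's .Trash directory, and 0 (the default) disables trash. Keep in mind that trash only protects deletions made through the shell (hdfs dfs -rm without -skipTrash); programmatic FileSystem.delete calls bypass it.

```python
import xml.etree.ElementTree as ET


def trash_interval_minutes(core_site_xml: str) -> float:
    """Return fs.trash.interval from core-site.xml text; 0 (the default) means trash is off."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "fs.trash.interval":
            return float((prop.findtext("value") or "0").strip() or "0")
    return 0.0


if __name__ == "__main__":
    # Hypothetical core-site.xml content keeping deleted files for one day.
    sample = """<configuration>
  <property><name>fs.trash.interval</name><value>1440</value></property>
</configuration>"""
    minutes = trash_interval_minutes(sample)
    print("trash enabled" if minutes > 0 else "WARNING: trash disabled",
          f"(fs.trash.interval={minutes:g} minutes)")
```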
Improvement
• HDFS Snapshots
– https://issues.apache.org/jira/browse/HDFS-2802
– A snapshot is a read-only point-in-time image of part of the file system
– A snapshot created before a deletion can be used to restore deleted data
– More coverage of snapshots later in the presentation
• HDFS ACLs
– https://issues.apache.org/jira/browse/HDFS-4685
– Finer-grained control of file permissions can help prevent an accidental deletion
– More coverage of ACLs later in the presentation
Unexpected error during HA HDFS upgrade
• Background: HDFS HA Architecture
– http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
• Real Incident
– During upgrade, the NameNode calls every JournalNode to request a backup of its metadata directory, which renames the “current” directory to “previous.tmp”.
– Permissions are incorrect on the metadata directory of 1 out of 3 JournalNodes.
– The hdfs user is not authorized to rename it. The backup fails for that JournalNode, so the upgrade process aborts with an error.
– Outcome: root cause not easily identifiable; long time to recover
Improvement
• Improve diagnostics on storage directory rename operations by using native code.
– https://issues.apache.org/jira/browse/HDFS-7118
– Logs additional root cause information for rename failures, for example EACCES
• Split error checks into separate conditions to improve diagnostics.
– https://issues.apache.org/jira/browse/HDFS-7119
– Splits a log message about failure to delete or rename into separate log messages to clarify which specific action failed
• When aborting the NameNode or JournalNode, write the contents of the metadata directories and their permissions to the logs.
– https://issues.apache.org/jira/browse/HDFS-7120
– This is usually the first information requested from the user, so it makes sense to automate it
• For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed.
– https://issues.apache.org/jira/browse/HDFS-7121
– Avoids the need for manual cleanup on the 2 out of 3 JournalNodes where the backup succeeded
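Operators can apply the same all-or-nothing idea by hand before an upgrade. The sketch below is not the HDFS-7121 implementation; it assumes you run it on each JournalNode, as the user that owns the JournalNode process (for example hdfs), against that node's storage directory (deployment-specific, typically dfs.journalnode.edits.dir plus the journal ID). It checks that current exists, that no stale previous.tmp is in the way, and that the directory permits a rename (write and execute permission on the parent). It does not cover every case, such as sticky-bit directories, and the permission check always passes when run as root.

```python
import os
import tempfile


def can_rename_current(storage_dir: str) -> list[str]:
    """Return problems that would stop renaming current -> previous.tmp.

    Renaming an entry needs write and execute permission on its parent
    directory (storage_dir). An empty list means the pre-check passed.
    """
    problems = []
    current = os.path.join(storage_dir, "current")
    if not os.path.isdir(current):
        problems.append(f"{current} does not exist")
    if os.path.exists(os.path.join(storage_dir, "previous.tmp")):
        problems.append(f"{storage_dir}/previous.tmp already exists")
    if not os.access(storage_dir, os.W_OK | os.X_OK):
        problems.append(f"no write/execute permission on {storage_dir}")
    return problems


def precheck_all(storage_dirs: list[str]) -> dict[str, list[str]]:
    """All-or-nothing: collect problems for every directory before touching any."""
    failures = {}
    for d in storage_dirs:
        problems = can_rename_current(d)
        if problems:
            failures[d] = problems
    return failures


if __name__ == "__main__":
    # Demo with throwaway directories; real paths come from your JournalNode config.
    with tempfile.TemporaryDirectory() as ok_dir, tempfile.TemporaryDirectory() as bad_dir:
        os.mkdir(os.path.join(ok_dir, "current"))
        # bad_dir has no "current" subdirectory, so its check fails.
        print(precheck_all([ok_dir, bad_dir]))
```

Only when precheck_all returns an empty dict for every JournalNode would you proceed, which avoids the partial-success cleanup described above.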
Key Learnings and Best Practices
Features that Help Improve Production Operations
HDFS ACLs
• Existing HDFS POSIX permissions are good, but not flexible enough
– Permission requirements may differ from the natural organizational hierarchy of users and groups.
• HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX ACL model.
– An ACL (Access Control List) provides a way to set different permissions for specific named users or named groups, not only the file’s owner and the file’s group.
HDFS File Permissions Example
• Authorization requirements:
– In the sales department, a single user, Maya (department manager), should control all modifications to sales data
– Other members of the sales department need to view the data, but can’t modify it.
– Everyone else in the company must not be allowed to view the data.
• Can be implemented via the following permissions on the file with sales data:
– User (owner) maya: read/write
– Group sales: read
– Others: no access
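The traditional model behind this example fits in a few lines: exactly one class (owner, group or others) is selected for the caller, and only that class's bits are checked. This is a simplified sketch assuming mode rw-r----- (owner maya, group sales) and hypothetical users bob and eve; it ignores the HDFS superuser, which bypasses permission checks.

```python
def classic_allows(user: str, user_groups: set[str], owner: str, group: str,
                   mode: str, op: str) -> bool:
    """POSIX-style check: pick exactly one class, then test its bit.

    mode is a 9-character string such as 'rw-r-----'; op is 'r', 'w' or 'x'.
    """
    i = "rwx".index(op)
    if user == owner:
        bits = mode[0:3]          # owner class
    elif group in user_groups:
        bits = mode[3:6]          # group class
    else:
        bits = mode[6:9]          # others
    return bits[i] == op


if __name__ == "__main__":
    mode = "rw-r-----"
    print(classic_allows("maya", {"sales"}, "maya", "sales", mode, "w"))   # Maya may modify
    print(classic_allows("bob", {"sales"}, "maya", "sales", mode, "w"))    # sales may not
```

Because only one owner and one group slot exist, there is no way to express "Maya, Diane and Clark may write" or "executives may read" without an ACL.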
HDFS ACLs
• Problem
– It is no longer feasible for Maya to control all modifications to the file
– New requirement: Maya, Diane and Clark are allowed to make modifications
– New requirement: a new group called executives should be able to read the sales data
– The current permission model allows permissions for only 1 group and 1 user
• Solution: HDFS ACLs
– Assign different permissions to different users and groups
[Diagram: an HDFS directory whose Owner, Group and Others entries (rwx) are supplemented by additional ACL entries for Group D, Group F and User Y]
HDFS ACLs
New tools for ACL management (setfacl, getfacl)
– hdfs dfs -setfacl -m group:execs:r-- /sales-data
– hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: maya
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
– How do you know if a file or directory has ACLs set? Look for the + after the permission bits:
– hdfs dfs -ls /sales-data
Found 1 items
-rw-r-----+ 3 maya sales 0 2014-03-04 16:31 /sales-data
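The mask entry in the getfacl output matters: under the POSIX ACL model, it caps the permissions granted by named user entries, named group entries and the owning group entry. The sketch below is a simplified model of that evaluation order (owner, then named users, then any matching groups, then other), not the NameNode's actual code. Users carol, diane and eve are hypothetical, and default ACL entries are ignored.

```python
def parse_acl(getfacl_output: str) -> dict[tuple[str, str], str]:
    """Map (type, name) -> perms from `hdfs dfs -getfacl` output ('' = unnamed entry)."""
    entries = {}
    for raw in getfacl_output.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop header comments and '#effective:' notes
        if not line or line.startswith("default:"):
            continue  # default entries only affect newly created children
        kind, name, perms = line.split(":")
        entries[(kind, name)] = perms
    return entries


def masked(perms: str, mask: str) -> str:
    """Bitwise AND of two 'rwx'-style strings."""
    return "".join(p if p == m else "-" for p, m in zip(perms, mask))


def allowed(entries: dict, user: str, user_groups: set[str],
            owner: str, group: str, op: str) -> bool:
    """Simplified POSIX ACL check for a single operation 'r', 'w' or 'x'."""
    i = "rwx".index(op)
    mask = entries.get(("mask", ""), "rwx")  # no mask entry: nothing extra is capped
    if user == owner:                          # owner entry is never masked
        return entries[("user", "")][i] == op
    if ("user", user) in entries:              # named user entry, capped by mask
        return masked(entries[("user", user)], mask)[i] == op
    groups = [p for (k, n), p in entries.items()
              if k == "group" and (n in user_groups if n else group in user_groups)]
    if groups:                                 # any matching group entry may grant it
        return any(masked(p, mask)[i] == op for p in groups)
    return entries[("other", "")][i] == op


SAMPLE = """# file: /sales-data
# owner: maya
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
"""

if __name__ == "__main__":
    acl = parse_acl(SAMPLE)
    print(allowed(acl, "carol", {"execs"}, "maya", "sales", "r"))  # executive can read
```

This is also why granting Diane and Clark write access requires the mask to allow w; to our understanding, setfacl recalculates the mask automatically when entries are modified unless a mask is specified explicitly.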
HDFS ACLs Best Practices
• Start with traditional HDFS permissions to implement most permission requirements.
• Define a smaller number of ACLs to handle exceptional cases.
• A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that has only traditional permissions.
HDFS Snapshots
• HDFS Snapshots
– A snapshot is a read-only point-in-time image of part of the file system
– Performance: snapshot creation is instantaneous, regardless of data size or subtree depth
– Reliability: snapshot creation is atomic
– Scalability: snapshots do not create extra copies of data blocks
– Useful for protecting against accidental deletion of data
• Example: Daily Feeds
hdfs dfs -ls /daily-feeds
Found 5 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-16
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17
HDFS Snapshots
• Create a snapshot after each daily load
hdfs dfsadmin -allowSnapshot /daily-feeds
Allowing snaphot on /daily-feeds succeeded
hdfs dfs -createSnapshot /daily-feeds snapshot-to-2014-10-17
Created snapshot /daily-feeds/.snapshot/snapshot-to-2014-10-17
• User accidentally deletes data for 2014-10-16
hdfs dfs -ls /daily-feeds
Found 4 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17
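The daily routine above lends itself to automation at the end of each load job. A minimal sketch, assuming the hdfs client is on the PATH, hdfs dfsadmin -allowSnapshot has already been run once on the root, and the caller supplies the existing snapshot names (for example from hdfs dfs -ls /daily-feeds/.snapshot). The after_daily_load hook and the 30-snapshot retention are hypothetical; some retention policy is needed because snapshots keep deleted blocks from being freed.

```python
import subprocess
from datetime import date


def snapshot_name(load_date: date) -> str:
    """Naming convention from the example above: snapshot-to-YYYY-MM-DD."""
    return f"snapshot-to-{load_date.isoformat()}"


def create_snapshot_cmd(root: str, load_date: date) -> list[str]:
    return ["hdfs", "dfs", "-createSnapshot", root, snapshot_name(load_date)]


def delete_snapshot_cmd(root: str, name: str) -> list[str]:
    return ["hdfs", "dfs", "-deleteSnapshot", root, name]


def expired(names: list[str], keep: int) -> list[str]:
    """Snapshot names beyond the newest `keep`, oldest first.

    ISO dates sort lexically, so a plain string sort orders them by date.
    """
    ordered = sorted(n for n in names if n.startswith("snapshot-to-"))
    return ordered[:-keep] if keep > 0 else ordered


def after_daily_load(root: str, load_date: date, existing: list[str], keep: int = 30) -> None:
    """Hypothetical hook for an ingest job: snapshot, then prune old snapshots."""
    subprocess.run(create_snapshot_cmd(root, load_date), check=True)
    for name in expired(existing + [snapshot_name(load_date)], keep):
        subprocess.run(delete_snapshot_cmd(root, name), check=True)
```

For example, after_daily_load("/daily-feeds", date(2014, 10, 17), existing_names) would run the same createSnapshot command shown above.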
HDFS Snapshots
• Snapshots to the rescue: the data is still in the snapshot
hdfs dfs -ls /daily-feeds/.snapshot/snapshot-to-2014-10-17
Found 5 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-13
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-14
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-15
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-17
• Restore data from 2014-10-16
hdfs dfs -cp /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 /daily-feeds
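Finding what to restore can be automated by comparing the live listing with the snapshot listing. The sketch below does that from hdfs dfs -ls output and builds the corresponding hdfs dfs -cp commands; it assumes paths without spaces and a single snapshot to restore from, and the function names are hypothetical. HDFS also provides an hdfs snapshotDiff command for comparing snapshots, which may be a better fit for larger trees.

```python
import posixpath


def names_from_ls(ls_output: str) -> list[str]:
    """Last path component of each entry in `hdfs dfs -ls` output (no spaces in paths)."""
    names = []
    for line in ls_output.splitlines():
        if not line.strip() or line.startswith("Found "):
            continue
        names.append(posixpath.basename(line.split()[-1]))
    return names


def missing_entries(current: list[str], snapshot: list[str]) -> list[str]:
    """Names present in the snapshot listing but no longer in the live directory."""
    live = set(current)
    return [name for name in snapshot if name not in live]


def restore_cmds(root: str, snap: str, names: list[str]) -> list[list[str]]:
    """Build `hdfs dfs -cp` commands copying each missing entry back from the snapshot."""
    src_dir = posixpath.join(root, ".snapshot", snap)
    return [["hdfs", "dfs", "-cp", posixpath.join(src_dir, n), root] for n in names]
```

Applied to the listings above, the only missing entry is 2014-10-16, and the generated command matches the hdfs dfs -cp shown on the slide.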
Reporting DataNode Volume Failures
• Configuring dfs.datanode.failed.volumes.tolerated > 0 enables a DataNode to keep running after volume failures
• The DataNode keeps running, but with degraded capacity
• HDFS already provided a count of failed volumes for each DataNode, but no further details
• Apache Hadoop 2.7.0 provides more information: failed path, estimated lost capacity and failure date/time
• An administrator can use this information to prioritize cluster maintenance work
Reporting DataNode Volume Failures
[Screenshots: NameNode web UI showing the new DataNode volume failure information]
Reporting DataNode Volume Failures
• Everything in the web UI is sourced from standardized Hadoop metrics
– Each DataNode publishes its own metrics
– NameNode publishes aggregate information from every DataNode
• Metrics are accessible through JMX or the HTTP /jmx endpoint
• Integrated into Ambari
• Can be integrated into your preferred management tools and ops dashboards
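As a sketch of such an integration, the snippet below reads cluster-wide totals from the NameNode's /jmx output. The bean name Hadoop:service=NameNode,name=FSNamesystemState and the attributes VolumeFailuresTotal and EstimatedCapacityLostTotal (in bytes) are our understanding of the metrics added in Apache Hadoop 2.7.0; check them against your own /jmx output. The sample response is hypothetical.

```python
import json

# Assumed bean name for the Hadoop 2.7 volume failure totals; verify locally.
FSN_STATE = "Hadoop:service=NameNode,name=FSNamesystemState"


def volume_failure_totals(jmx_json: str) -> tuple[int, int]:
    """Return (failed volumes, estimated capacity lost in bytes) across all DataNodes."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == FSN_STATE:
            return (int(bean.get("VolumeFailuresTotal", 0)),
                    int(bean.get("EstimatedCapacityLostTotal", 0)))
    raise KeyError(f"{FSN_STATE} not found in /jmx output")


if __name__ == "__main__":
    # Hypothetical response from http://<namenode>:50070/jmx?qry=<FSN_STATE>
    sample = json.dumps({"beans": [{"name": FSN_STATE, "VolumeFailuresTotal": 2,
                                    "EstimatedCapacityLostTotal": 6 * 10**12}]})
    failures, lost = volume_failure_totals(sample)
    if failures:
        print(f"{failures} failed volume(s), about {lost / 1e12:.1f} TB of capacity lost")
```

A monitoring job could alert when either total rises, leaving the per-DataNode details in the web UI for deciding which node to service first.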
New System to Manage the Health of Hadoop Clusters
• Ambari Alerts are installed and configured by default
• Health alerts and metrics are managed via Ambari Web
Summary
• Configuration
– Prevent garbage collection issues
– Configure for redundancy
– Retune configuration in response to metrics
• HDFS ACLs
– Implement fine-grained authorization rules on files
– Can protect against accidental file manipulations
• HDFS Snapshots
– Point-in-time image of part of the filesystem
– Useful for restoring to a prior state after accidental file manipulation
• Reporting DataNode Volume Failures
– Metrics and web UI exposing information about volume failures on DataNodes
– Useful for planning cluster maintenance work
• Use Ambari
– Helps install, configure, monitor and manage Hadoop clusters
– Incorporates the latest best practices
Thank you, Q&A
Resources
– Hardware Recommendations for Apache Hadoop: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
– HDFS operational and debuggability improvements: https://issues.apache.org/jira/browse/HDFS-6185
– HDFS ACLs blog post: http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
– HDFS Snapshots blog post: http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/
Learn more
Contact me with your operations questions and suggestions
Chris Nauroth – cnauroth@hortonworks.com

The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Hadoop Operations - Best Practices from the Field

  • 1. Hadoop Operations – Best Practices from the Field June 11, 2015 Chris Nauroth email: cnauroth@hortonworks.com twitter: @cnauroth Suresh Srinivas email: suresh@hortonworks.com twitter: @suresh_m_s
  • 2. About Me Chris Nauroth • Member of Technical Staff, Hortonworks – Apache Hadoop committer, PMC member, and Apache Software Foundation member – Major contributor to HDFS ACLs, Windows compatibility, and operability improvements • Hadoop user since 2010 – Prior employment experience deploying, maintaining and using Hadoop clusters
  • 3. Agenda • Analysis of Hadoop Support Cases – Support case trends – Configuration – Software Improvements • Key Learnings and Best Practices – HDFS ACLs – HDFS Snapshots – Reporting DataNode Volume Failures
  • 4. Support Case Trends – Proportional Cases per Month [Chart: proportion of support cases (0 to 1) per month over months 1 to 43, with series for HDFS, MapReduce, YARN and Other (37 components)]
  • 5. Support Case Trends – Root Cause [Chart: number of support cases for HDFS, MapReduce and YARN by root cause: Customer Environment (Non HDP), Documentation Defect, Documentation Gap, Documentation Not Utilized, Education - Configuration, Needs Training, Product Defect]
  • 6. Support Case Trends • Core Hadoop components (HDFS, YARN and MapReduce) are used across all deployments, and therefore receive proportionally more support cases than other ecosystem components. • Misconfiguration is the dominant root cause. • Documentation is a close second. • We are constantly improving the code to eliminate operational issues, help with diagnosis and provide increased visibility. • Best practices get incorporated into Apache Ambari for improved defaults, simplified configuration and deeper monitoring.
  • 8. Configuration – Hardware and Cluster Sizing • Considerations – Larger clusters heal faster after node or disk failures – Machines with huge storage take longer to recover – More racks give more failure domains • Recommendations – Get good-quality commodity hardware – Buy the pricing sweet spot: 3 TB disks, 96 GB RAM, 8-12 cores – More memory is better: real-time workloads are memory hungry! – Before considering fatter machines (1U with 6 disks vs. 2U with 12 disks), get to 30-40 machines or 3-4 racks – Use a pilot cluster to learn about load patterns – Balance hardware for I/O-bound, compute-bound or memory-bound workloads – More details: http://tinyurl.com/hwx-hadoop-hw
  • 9. Configuration – JVM Tuning • Avoid JVM issues – Use a 64-bit JVM for all daemons – Compressed OOPS are enabled by default (Java 6u23 and later) – Java heap size – Set the same maximum and starting heap size: -Xmx == -Xms – Avoid Java defaults: configure NewSize and MaxNewSize – Use 1/8 to 1/6 of the max heap size for JVMs larger than 4 GB – Configure -XX:PermSize=128m and -XX:MaxPermSize=256m – Use a low-latency garbage collector – -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N> – Use a high <N> on the NameNode and the JobTracker or ResourceManager – Important JVM settings to help debugging – -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails – -XX:ErrorFile=<file> – -XX:+HeapDumpOnOutOfMemoryError
  • 10. Configuration • Deploy with QuorumJournalManager for high availability • Configure the open file descriptor ulimit – The default of 1024 is too low – 16K for DataNodes, 64K for master nodes • Use version control for configuration!
  • 11. Configuration • Use disk-fail-in-place for DataNodes: dfs.datanode.failed.volumes.tolerated – Disk failure no longer means DataNode failure – Especially important for high-density nodes • Set dfs.namenode.name.dir.restore to true – Restores failed NameNode storage directories during checkpointing • Take periodic backups of NameNode metadata – Make copies of the entire storage directory • Set aside a lot of disk space for NameNode logs – They are verbose: set aside multiple GBs – Many installs configure this too small – NameNode logs then roll within minutes, which makes issues hard to debug
  • 12. Configuration – Monitoring Usage • As cluster storage, node count, files and blocks grow – Update NameNode heap, handler count and number of DataNode xceivers – Tweak other related configuration periodically • Monitor hardware usage for your workload – Disk I/O, network I/O, CPU and memory usage – Use this information when expanding cluster capacity • Monitor usage with Hadoop metrics – JVM metrics: GC times, memory used, thread status – RPC metrics: especially latency, to track slowdowns – HDFS metrics: used storage, number of files and blocks, total load on the cluster – File system operations – MapReduce metrics: slot utilization and job status • Tweak configurations during upgrades and maintenance on an ongoing basis
  • 13. Install & Configure: Ambari Guided Configuration • Guides configuration and provides recommendations for the most common settings (HBase example shown here)
  • 14. Software Improvements: Real Incidents and Software Improvements to Address Them
  • 15. Don’t edit the metadata files! • Editing can corrupt the cluster state – Might result in loss of data • Real incident – NameNode misconfigured to point to another NameNode’s metadata – DataNodes can’t register due to namespace ID mismatch – System detected the problem correctly – Safety net ignored by the admin! – Admin edits the NameNode VERSION file to match IDs • Result: mass deletion of unknown blocks that do not exist in that namespace
  • 16. Improvement • Pause deletion of blocks when the NameNode starts up – https://issues.apache.org/jira/browse/HDFS-6186 – Supports a configurable delay of block deletions after NameNode startup – Gives an admin extra time to diagnose before deletions begin • Show in the web UI when block deletion will start after NameNode startup – https://issues.apache.org/jira/browse/HDFS-6385 – The web UI already displayed the number of pending block deletions – This enhancement updates the display to indicate when actual deletion will begin
  • 17. Block Deletion Start Time [Screenshot: HDFS web UI, highlighting the new field that shows when block deletions will start]
  • 18. Guard Against Accidental Deletion • rm -r deletes the data at the speed of Hadoop! – Ctrl-C on the command does not stop deletion! – Undeleting files on DataNodes is hard and time-consuming: immediately shut down the NameNode, unmount disks on DataNodes, recover the deleted files, then start the NameNode without the delete operation in the edit log • Enable Trash • Real incident – Customer is running a Hadoop distro with trash not enabled – Deletes a large directory (100 TB) and shuts down the NameNode immediately – Support person asks for the NameNode to be restarted to see if trash is enabled! • Result: blocks start deleting
  • 19. Improvement • HDFS Snapshots – https://issues.apache.org/jira/browse/HDFS-2802 – A snapshot is a read-only point-in-time image of part of the file system – A snapshot created before a deletion can be used to restore deleted data – More coverage of snapshots later in the presentation • HDFS ACLs – https://issues.apache.org/jira/browse/HDFS-4685 – Finer-grained control of file permissions can help prevent an accidental deletion – More coverage of ACLs later in the presentation
  • 20. Unexpected error during HA HDFS upgrade • Background: HDFS HA Architecture – http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html • Real incident – During upgrade, the NameNode calls every JournalNode to request a backup of the metadata directory, which renames the “current” directory to “previous.tmp”. – Permissions were incorrect on the metadata directory for 1 out of 3 JournalNodes. – The hdfs user is not authorized to rename it, so the backup fails for that JournalNode and the upgrade process aborts with an error. • Result: root cause not easily identifiable, long time to recover
  • 21. Improvement • Improve diagnostics on storage directory rename operations by using native code. – https://issues.apache.org/jira/browse/HDFS-7118 – Logs additional root cause information for a rename failure, for example EACCES • Split error checks into separate conditions to improve diagnostics. – https://issues.apache.org/jira/browse/HDFS-7119 – Splits a log message about failure to delete or rename into separate log messages to clarify which specific action failed • When aborting the NameNode or JournalNode, write the contents of the metadata directories and their permissions to the logs. – https://issues.apache.org/jira/browse/HDFS-7120 – This is usually the first information we ask the user for, so we can automate it • For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed. – https://issues.apache.org/jira/browse/HDFS-7121 – Prevents the need for manual cleanup on the 2 out of 3 JournalNodes where the backup succeeded
  • 22. Key Learnings and Best Practices: Features that Help Improve Production Operations
  • 23. HDFS ACLs • Existing HDFS POSIX permissions are good, but not flexible enough – Permission requirements may differ from the natural organizational hierarchy of users and groups. • HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX ACL model. – An ACL (Access Control List) provides a way to set different permissions for specific named users or named groups, not only the file’s owner and the file’s group.
  • 24. HDFS File Permissions Example • Authorization requirements: – In a sales department, they would like a single user, Maya (Department Manager), to control all modifications to sales data – Other members of the sales department need to view the data, but can’t modify it. – Everyone else in the company must not be allowed to view the data. • Can be implemented via the following: [Diagram: file with sales data; user maya (owner) has read/write permission; group sales has read permission]
  • 25. HDFS ACLs • Problem – It is no longer feasible for Maya to control all modifications to the file – New requirement: Maya, Diane and Clark are allowed to make modifications – New requirement: a new group called executives should be able to read the sales data – The current permissions model only allows permissions for 1 user and 1 group • Solution: HDFS ACLs – Now assign different permissions to different users and groups [Diagram: an HDFS directory with rwx permissions for owner, group and others, plus additional ACL entries with rwx permissions for Group D, Group F and User Y]
  • 26. HDFS ACLs: New Tools for ACL Management (setfacl, getfacl)
    – hdfs dfs -setfacl -m group:execs:r-- /sales-data
    – hdfs dfs -getfacl /sales-data
        # file: /sales-data
        # owner: maya
        # group: sales
        user::rw-
        group::r--
        group:execs:r--
        mask::r--
        other::---
    – How do you know if a file has ACLs set? A + follows the permission bits in the listing:
    – hdfs dfs -ls /sales-data
        Found 1 items
        -rw-r-----+ 3 maya sales 0 2014-03-04 16:31 /sales-data
  • 27. HDFS ACLs Best Practices • Start with traditional HDFS permissions to implement most permission requirements. • Define a smaller number of ACLs to handle exceptional cases. • A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that has only traditional permissions.
  • 28. HDFS Snapshots • HDFS Snapshots – A snapshot is a read-only point-in-time image of part of the file system – Performance: snapshot creation is instantaneous, regardless of data size or subtree depth – Reliability: snapshot creation is atomic – Scalability: snapshots do not create extra copies of data blocks – Useful for protecting against accidental deletion of data
    • Example: Daily Feeds
        hdfs dfs -ls /daily-feeds
        Found 5 items
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-16
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17
  • 29. HDFS Snapshots
    • Create a snapshot after each daily load
        hdfs dfsadmin -allowSnapshot /daily-feeds
        Allowing snaphot on /daily-feeds succeeded
        hdfs dfs -createSnapshot /daily-feeds snapshot-to-2014-10-17
        Created snapshot /daily-feeds/.snapshot/snapshot-to-2014-10-17
    • User accidentally deletes data for 2014-10-16
        hdfs dfs -ls /daily-feeds
        Found 4 items
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17
  • 30. HDFS Snapshots
    • Snapshots to the rescue: the data is still in the snapshot
        hdfs dfs -ls /daily-feeds/.snapshot/snapshot-to-2014-10-17
        Found 5 items
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-13
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-14
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-15
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16
        drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-17
    • Restore data from 2014-10-16
        hdfs dfs -cp /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 /daily-feeds
  • 31. Reporting DataNode Volume Failures • Configuring dfs.datanode.failed.volumes.tolerated > 0 enables a DataNode to keep running after volume failures • DataNode is still running, but capacity is degraded • HDFS already provided a count of failed volumes for each DataNode, but no further details • Apache Hadoop 2.7.0 provides more information: failed path, estimated lost capacity and failure date/time • An administrator can use this information to prioritize cluster maintenance work
  • 32. Reporting DataNode Volume Failures [Screenshot: HDFS web UI, where Total Datanode Volume Failures is now a hyperlink]
  • 33. Reporting DataNode Volume Failures [Screenshot: new web UI page listing each volume failure, with the failed storage location path and estimated lost capacity]
  • 34. Reporting DataNode Volume Failures [Screenshot: the same page when there are no volume failures]
  • 35. Reporting DataNode Volume Failures • Everything in the web UI is sourced from standardized Hadoop metrics – Each DataNode publishes its own metrics – NameNode publishes aggregate information from every DataNode • Metrics accessible through JMX or the HTTP /jmx URI • Integrated in Ambari • Can be integrated into your preferred management tools and ops dashboards
  • 36. New System to Manage the Health of Hadoop Clusters • Ambari Alerts are installed and configured by default • Health Alerts and Metrics managed via Ambari Web
  • 37. Summary • Configuration – Prevent garbage collection issues – Configure for redundancy – Retune configuration in response to metrics • HDFS ACLs – Implement fine-grained authorization rules on files – Can protect against accidental file manipulations • HDFS Snapshots – Point-in-time image of part of the filesystem – Useful for restoring to a prior state after accidental file manipulation • Reporting DataNode Volume Failures – Metrics and web UI exposing information about volume failures on DataNodes – Useful for planning cluster maintenance work • Use Ambari – Helps install, configure, monitor and manage Hadoop clusters – Incorporates the latest best practices
  • 38. Thank you, Q&A • Resources
    – Hardware Recommendations for Apache Hadoop: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
    – HDFS operational and debuggability improvements: https://issues.apache.org/jira/browse/HDFS-6185
    – HDFS ACLs Blog Post: http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
    – HDFS Snapshots Blog Post: http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/
    • Learn more – Contact me with your operations questions and suggestions – Chris Nauroth, cnauroth@hortonworks.com
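The JVM tuning recommendations on slide 9 are easier to apply consistently when they are generated in one place. Below is a minimal Python sketch, assuming you build daemon options from a script rather than editing them by hand. The helper name `build_daemon_jvm_opts`, the choice of 1/8 (the low end of the slide's 1/8 to 1/6 range) and the log file names are illustrative assumptions, not part of the original deck. These flags target the Java 6/7-era JVMs the slide addresses; several of them (PermGen sizing, CMS, -Xloggc) were removed or replaced in later JDKs.

```python
from typing import List

MB_PER_GB = 1024


def build_daemon_jvm_opts(heap_mb: int, gc_threads: int, log_dir: str) -> List[str]:
    """Assemble JVM flags for one Hadoop daemon (hypothetical helper)."""
    # Same starting and maximum heap, so the heap never has to grow at runtime.
    opts = [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]
    # Size the young generation explicitly instead of using JVM defaults:
    # 1/8 of the max heap for heaps above 4 GB (slide range is 1/8 to 1/6).
    if heap_mb > 4 * MB_PER_GB:
        new_mb = heap_mb // 8
        opts += [f"-XX:NewSize={new_mb}m", f"-XX:MaxNewSize={new_mb}m"]
    opts += [
        # PermGen sizing; only meaningful on Java 7 and earlier.
        "-XX:PermSize=128m",
        "-XX:MaxPermSize=256m",
        # Low-latency CMS collector with an explicit GC thread count.
        "-XX:+UseConcMarkSweepGC",
        f"-XX:ParallelGCThreads={gc_threads}",
        # Debugging aids: GC log, fatal error log (%p = pid), heap dump on OOM.
        "-verbose:gc",
        f"-Xloggc:{log_dir}/gc.log",
        "-XX:+PrintGCDetails",
        f"-XX:ErrorFile={log_dir}/hs_err_pid%p.log",
        "-XX:+HeapDumpOnOutOfMemoryError",
    ]
    return opts


if __name__ == "__main__":
    # Example: a NameNode with a hypothetical 32 GB heap and 8 GC threads.
    print(" ".join(build_daemon_jvm_opts(32 * MB_PER_GB, 8, "/var/log/hadoop/hdfs")))
```

The value 8 for the GC thread count comes from the speaker notes ("N=8 typically"). In a Hadoop 2.x layout, a string like the one printed above would typically be assigned to HADOOP_NAMENODE_OPTS in hadoop-env.sh.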
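Slide 10's open file descriptor advice can be checked quickly with Python's standard `resource` module (Unix only). This sketch assumes "16K" and "64K" mean 16384 and 65536; the role names and the `nofile_ok` helper are hypothetical. Run it as the user that starts the daemon, because limits apply per process and are usually configured per user.

```python
import resource

# Minimum soft limits from slide 10, read as 16K = 16384 and 64K = 65536
# (an assumption; "K" could also mean thousands).
RECOMMENDED_NOFILE = {"datanode": 16 * 1024, "master": 64 * 1024}


def nofile_ok(role: str, soft_limit: int) -> bool:
    """True if `soft_limit` meets the recommendation for `role`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # an unlimited soft limit always satisfies the recommendation
    return soft_limit >= RECOMMENDED_NOFILE[role]


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for role in RECOMMENDED_NOFILE:
        status = "OK" if nofile_ok(role, soft) else "too low"
        print(f"{role}: soft limit {soft} (hard {hard}) is {status}")
```

The infinity check comes first because some platforms represent "unlimited" as a negative number, which would otherwise fail the comparison.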
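Slide 10 recommends keeping configuration under version control, and slides 11, 16 and 18 name several settings worth reviewing. The sketch below renders a Python dict as Hadoop-style `<configuration>` XML, so generated hdfs-site.xml and core-site.xml fragments can be diffed and committed. All values are hypothetical examples: 2 tolerated volume failures, a one-hour block-deletion delay and a one-day trash interval (fs.trash.interval is in minutes, and 0 disables trash). The name `dfs.namenode.startup.delay.block.deletion.sec` is our reading of the setting introduced by HDFS-6186; confirm it against hdfs-default.xml for your Hadoop version.

```python
import xml.etree.ElementTree as ET

# Illustrative values only; tune them for your cluster.
HDFS_SITE = {
    # Keep a DataNode running after up to 2 failed volumes (disk-fail-in-place).
    "dfs.datanode.failed.volumes.tolerated": "2",
    # Let the NameNode restore failed name directories during checkpointing.
    "dfs.namenode.name.dir.restore": "true",
    # Delay block deletions after NameNode startup (HDFS-6186); verify the name.
    "dfs.namenode.startup.delay.block.deletion.sec": "3600",
}

CORE_SITE = {
    # Trash retention in minutes; 0 disables trash.
    "fs.trash.interval": "1440",
}


def hadoop_site_xml(props: dict) -> str:
    """Render a dict of property names and values as a <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print for readable diffs; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hadoop_site_xml(HDFS_SITE))
    print(hadoop_site_xml(CORE_SITE))
```

Generating the files from one reviewed source also makes it easier to retune settings over time, as slide 12 suggests.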

Editor's Notes

  1. First, a quick introduction. My name is Chris Nauroth. I’m a software engineer on the HDFS team at Hortonworks. I’m an Apache Hadoop committer and PMC member. I’m also an Apache Software Foundation member. Some of my major contributions include HDFS ACLs, Windows compatibility and various operability improvements. Prior to Hortonworks, I worked for Disney and did an initial deployment of Hadoop there. As part of that job, I worked very closely with the systems engineering team responsible for maintaining those Hadoop clusters, so I tend to think back to that team and get excited about things I can do now as a software engineer to help make that team’s job easier. I’m also here with Suresh Srinivas, one of the founders of Hortonworks, and a long-time Hadoop committer and PMC member. He has a lot of experience supporting some of the world’s largest clusters at Yahoo and elsewhere. Together with Suresh, we have experience supporting Hadoop clusters since 2008.
  2. For today’s agenda, I’d like to start by sharing some analysis that we’ve done of support case trends. In that analysis, we’re going to see that some common patterns emerge, and that’s going to lead into a discussion of configuration best practices and software improvements. In the second half of the talk, we’ll move into a discussion of key learnings and best practices around how recent HDFS features can help prevent problems or manage day-to-day maintenance.
  3. Let’s dive into the support case analysis. The data source for this chart is the entire history of support cases at Hortonworks. The x-axis is month and the y-axis is the proportion of support cases reported against a specific component. The chart focuses on 3 components that we define as the core of Hadoop: HDFS, YARN and MapReduce. All other components in the ecosystem are collapsed into a single line. Here we see a trend stabilizing around 30% of support cases driven by those core components. It also makes sense intuitively that a large proportion of support cases are driven by those core components, because every deployment uses them. As you rise up the stack, deployments start to vary in the components they choose to deploy. For example, a deployment may or may not deploy HBase depending on its use cases.
  4. The second chart shows an analysis of root cause category in each of those 3 core components. The source data contains many additional root cause categories. I’ve chosen to prune this down to the most significant ones to simplify the chart. The pattern that we see here is that a lot of support cases are driven by configuration issues or documentation problems. On an interesting side note, I gave a version of this presentation last year at Strata, and since then I’ve refreshed these charts with current data. Something I noticed is that documentation, configuration and software defects are proportionally a little bit smaller than last time. We’ve been investing a lot of energy in these areas, so it was satisfying to see the data showing that those efforts have been somewhat successful.
  5. Investment in operations at the core helps the most users.
  6. With that, let’s move into a discussion of common configuration issues that we continue to see.
  7. A cluster with fewer nodes is less resilient than one with many nodes. Failure of a DataNode that’s heavier on storage causes more re-replication activity, and MapReduce jobs may need to rerun more tasks. Commodity does not mean poor quality.
  8. Compressed ordinary object pointers are a technique used in the JVM to represent managed pointers as 32-bit offsets from a 64-bit base heap address. This saves on the space taken by 64-bit native pointers. We used to recommend passing a JVM argument to turn this on; recent JVM versions just use it by default. Setting Xmx different from Xms means the heap may need large, expensive allocations as it grows, which can produce surprising results when you run out of memory late in the process lifetime, such as the process being killed by the Linux OOM killer. For ParallelGCThreads, N=8 is typical.
  9. NameNode high availability was a very hot topic a few years ago. At this point, the recommended HA architecture is to use QuorumJournalManager, which sets up an active-standby pair of NameNodes and offloads edit logging to a separate set of daemons called the JournalNodes. On a side note, version control for configuration is a good thing. It can be helpful to look back on the history of changes or restore to a last known good state.
  10. The DataNode has a feature called disk-fail-in-place that allows it to keep running even if individual volumes have failed. This is off by default, but you can turn it on by editing hdfs-site.xml and setting property dfs.datanode.failed.volumes.tolerated to the number of volumes that you tolerate failing before shutting down the entire DataNode. This is useful for large-density nodes, meaning nodes that have a lot of disks. If you have a node with 16 disks, and 2 disks fail, you’d probably prefer to keep that DataNode running with 14 disks available to serve clients instead of shutting down the whole thing. dfs.namenode.name.dir.restore is a property that controls whether or not the NameNode should attempt to bring back into service metadata storage directories that previously failed. By turning this on, you have the ability to repair a failed directory online and bring it back into service without restarting the NameNode process. We recommend taking periodic backups of the NameNode metadata. Copy the entire storage directory. Also plan on reserving a lot of disk for NameNode logs. A common pitfall is choosing too little space for logs, which then forces you to configure Log4J to roll logs very rapidly, and this can make debugging harder.
  11. Something to keep in mind is that usage patterns on a cluster tend to change over time as use cases change. Configuration may need to change in reaction to changing usage patterns. If you have a major upgrade or maintenance planned, then that’s a good opportunity to review configurations and see if anything else needs to change.
  12. Increasingly, we’re pushing configuration best practices into the implementation of Ambari. This takes the burden off of administrators to remember these best practices during deployments. For those who don’t know, Apache Ambari is an open source cluster deployment and management tool. For a little variety, I chose to pull a screenshot related to HBase. Here we can see that Ambari starts by recommending some good defaults, but still gives administrators the option to tune settings to match their specific needs.
  13. Next, I’d like to discuss a few software improvements that were prompted by our experiences in support cases. We’ve found that often very small code changes can have a big impact on preventing problems or recovering from them. I’m going to discuss some real incidents that we’ve seen and how they led us to make those code changes.
  14. First, a public service announcement: don’t edit the metadata files. The NameNode metadata files are crucial for maintaining the state of the file system, so editing them can corrupt cluster state and result in loss of data. Don’t edit them. Now that I’ve said that, let’s talk about editing the metadata files. This is a real incident. A NameNode was misconfigured to point to the metadata from a different NameNode. An important note here is that part of the NameNode metadata is a namespace ID, which uniquely identifies that file system namespace. When DataNodes register with a NameNode for the first time, they also acquire that namespace ID and persist it locally. On subsequent DataNode restarts, the NameNode has a check that the DataNode attempting to register with it is presenting the same namespace ID. After NameNode restart, the DataNodes could not register with the NameNode because of the namespace ID mismatch. The system detected the problem correctly, and so far everything is working as designed. However, the admin thought an appropriate fix would be to manually edit the VERSION file, which is the part of the metadata containing the namespace ID, and change it to match what the DataNodes were reporting. “What happens next?” The problem is that the NameNode’s fsimage also persists the block IDs that are known for each file. When these DataNodes from a different cluster started sending their block reports, the NameNode replied by saying these blocks do not exist in my namespace, and therefore they should be deleted.
  15. This is the HDFS web UI, now with a small enhancement to show the time when block deletions will start.
  16. HDFS is known for being a scalable system. One of the things it’s really awesome at is scaling deletes! This can be a scary situation if someone deletes the wrong thing, because attempting to recover by undeleting block files is error-prone and time-consuming work across all DataNodes. We recommend enabling the HDFS trash feature as a safety net, which essentially changes deletes into renames, and the NameNode can then reap the trash files at a later time. However, I’m going to talk about a real incident in which trash was not enabled. There was a large directory deleted, and the admin realized this was a mistake and chose to shut down the NameNode immediately. The support engineer taking the case naturally figured we could restore from trash, so advised restarting the NameNode. “What happens next?”
  17. This incident really points out the importance of protecting data against accidental deletion. HDFS snapshots and HDFS ACLs are two features that I think help with this. I’ll have more coverage of these features later in the presentation.
  18. “What happens next?”
  19. If you’ve used POSIX ACLs on a Linux file system, then you already know how it works in HDFS too.
  20. By convention, snapshots can be referenced as a file system path under sub-directory “.snapshot”.
  21. Here is a screenshot pointing out a change in the HDFS web UI: Total Datanode Volume Failures is a hyperlink. Clicking that jumps to…
  22. …this new screen listing the volume failures in detail. We can see the path of each failed storage location, and an estimate of the capacity that was lost. I think of this screen being used by a system engineer as a to-do list as part of regular cluster maintenance.
  23. Here is what it looks like when there are no volume failures. I included this picture, because this is what we all want it to look like. Of course, it won’t always be that way.
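Editor's note 19 points out that HDFS ACLs follow the POSIX ACL model. To make the mask entry in slide 26's getfacl output concrete, here is a simplified Python model of the POSIX ACL access check: the owner entry first, then named users, then any matching group entries, then others, with the mask capping named users and all group entries. It is a sketch for reasoning about permissions, not HDFS code; it ignores the superuser, default ACLs and execute checks on parent directories, and the `Acl` class and `check_access` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

READ, WRITE, EXECUTE = 4, 2, 1


def perm_bits(s: str) -> int:
    """Convert a symbolic triple such as 'rw-' into permission bits."""
    return ((READ if s[0] == "r" else 0)
            | (WRITE if s[1] == "w" else 0)
            | (EXECUTE if s[2] == "x" else 0))


@dataclass
class Acl:
    owner: str
    group: str
    owner_perm: int  # user:: entry
    group_perm: int  # group:: entry (owning group)
    other_perm: int  # other:: entry
    named_users: Dict[str, int] = field(default_factory=dict)   # user:NAME: entries
    named_groups: Dict[str, int] = field(default_factory=dict)  # group:NAME: entries
    mask: Optional[int] = None  # mask:: entry; None for a minimal ACL


def check_access(acl: Acl, user: str, groups: Set[str], wanted: int) -> bool:
    """Return True if `user`, a member of `groups`, is granted all `wanted` bits."""
    def grants(perm: int) -> bool:
        return (perm & wanted) == wanted

    # 1. The owner is checked against the user:: entry, which is never masked.
    if user == acl.owner:
        return grants(acl.owner_perm)
    # Without a mask entry, nothing is filtered.
    mask = acl.mask if acl.mask is not None else READ | WRITE | EXECUTE
    # 2. A matching named user entry decides, filtered by the mask.
    if user in acl.named_users:
        return grants(acl.named_users[user] & mask)
    # 3. Any matching group entry (owning or named) may grant access.
    matching = [acl.group_perm] if acl.group in groups else []
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        return any(grants(p & mask) for p in matching)
    # 4. Otherwise the other:: entry applies.
    return grants(acl.other_perm)
```

Modelling slide 26 as `Acl("maya", "sales", perm_bits("rw-"), perm_bits("r--"), perm_bits("---"), named_groups={"execs": perm_bits("r--")}, mask=perm_bits("r--"))` grants maya read and write, members of sales or execs read only, and everyone else nothing. Because the mask caps every named entry, a hypothetical user:diane:rw- entry would still only allow reading while the mask stays at r--.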
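Editor's note 20 describes the .snapshot path convention used on slides 29 and 30. A small Python sketch that builds those same hdfs commands, assuming a nightly job names snapshots snapshot-to-YYYY-MM-DD as in the example. The helper names are hypothetical, and `run` defaults to a dry run that only prints the command line.

```python
import datetime
import shlex
import subprocess
from typing import List


def snapshot_name(day: datetime.date) -> str:
    """Name in the style of the slides' example, e.g. snapshot-to-2014-10-17."""
    return f"snapshot-to-{day.isoformat()}"


def allow_snapshot_cmd(root: str) -> List[str]:
    # One-time admin step: make the directory snapshottable.
    return ["hdfs", "dfsadmin", "-allowSnapshot", root]


def create_snapshot_cmd(root: str, day: datetime.date) -> List[str]:
    # Run after each daily load completes.
    return ["hdfs", "dfs", "-createSnapshot", root, snapshot_name(day)]


def restore_cmd(root: str, snapshot: str, entry: str) -> List[str]:
    # Snapshots are read-only, so restoring means copying out of .snapshot.
    return ["hdfs", "dfs", "-cp", f"{root}/.snapshot/{snapshot}/{entry}", root]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it, failing on a non-zero exit."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    day = datetime.date(2014, 10, 17)
    run(create_snapshot_cmd("/daily-feeds", day))
    run(restore_cmd("/daily-feeds", snapshot_name(day), "2014-10-16"))
```

Passing commands as argument lists rather than shell strings avoids quoting problems if a path ever contains spaces.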
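Slide 35 notes that the volume failure information is published as metrics and is reachable through the HTTP /jmx URI, which returns JSON with a top-level "beans" array and accepts a qry parameter to select a bean. A Python sketch that pulls selected attributes from one bean. The NameNode address is hypothetical (50070 is the Hadoop 2.x default HTTP port), and the bean name `Hadoop:service=NameNode,name=FSNamesystemState` and the attributes `VolumeFailuresTotal` and `EstimatedCapacityLostTotal` are assumptions to check against your own cluster's /jmx output.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable

# Hypothetical NameNode web address (Hadoop 2.x default HTTP port is 50070).
NAMENODE_HTTP = "http://namenode.example.com:50070"
# Assumed bean and attribute names; confirm them in your /jmx output.
BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"
ATTRS = ("VolumeFailuresTotal", "EstimatedCapacityLostTotal")


def fetch_jmx(base_url: str, bean: str, timeout: float = 10.0) -> str:
    """Fetch the JSON for a single bean via the qry parameter."""
    url = f"{base_url}/jmx?qry={urllib.parse.quote(bean, safe=':,=')}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def extract_attrs(jmx_json: str, bean: str, attrs: Iterable[str]) -> Dict[str, object]:
    """Return the requested attributes of `bean` (None for absent attributes)."""
    for entry in json.loads(jmx_json).get("beans", []):
        if entry.get("name") == bean:
            return {a: entry.get(a) for a in attrs}
    raise KeyError(f"bean not found: {bean}")
```

A monitoring script could call `extract_attrs(fetch_jmx(NAMENODE_HTTP, BEAN), BEAN, ATTRS)` on a schedule and raise an alert when the failure total rises above zero, feeding the maintenance to-do list described in editor's note 22.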