An Introduction to Cloudera’s Administrator Training for Apache Hadoop
Ian Wrigley
Sr. Curriculum Manager
ian@cloudera.com
© Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
Topics
• Why Take Cloudera Training?
• Administrator Course Contents
• A Deeper Dive: An overview of HDFS High Availability
• A Deeper Dive: Some of Hadoop’s advanced configuration options
• Question time
Why Cloudera Training?
1. Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science
2. Most Experienced Instructors: Over 15,000 students trained since 2009
3. Leader in Certification: Over 5,000 accredited Cloudera professionals
4. State of the Art Curriculum: Classes updated regularly as Hadoop evolves
5. Widest Geographic Coverage: Most classes offered: 50 cities worldwide plus online
6. Most Relevant Platform & Community: CDH deployed more than all other distributions combined
7. Depth of Training Material: Hands-on exercises and VMs support live instruction
8. Ongoing Learning: Video tutorials and e-learning complement training
Learning Path: System Administrators
• Administrator Training
– Configure, install, and monitor clusters for optimal performance
– Implement security measures and multi-user functionality
• Enterprise Training
– Use Cloudera Manager to speed deployment and scale the cluster
– Learn which tools and techniques improve cluster performance
• HBase Training
– Implement massively distributed, columnar storage at scale
– Enable random, real-time read/write access to all data
• Data Analyst Training
– Vertically integrate basic analytics into data management
– Transform and manipulate data to drive high-value utilization
Administrator Course Objectives
During the Administrator course, you learn:
• The core technologies of Hadoop
• How to populate HDFS from external sources
• How to plan your Hadoop cluster hardware and software
• How to deploy a Hadoop cluster
• What issues to consider when installing Pig, Hive, and Impala
• What issues to consider when deploying Hadoop clients
• How Cloudera Manager can simplify Hadoop administration
• How to configure HDFS for high availability
• What issues to consider when implementing Hadoop security
• How to schedule jobs on the cluster
• How to maintain your cluster
• How to monitor, troubleshoot, and optimize the cluster
Hands-On Exercises
• The course features many hands-on exercises, including:
– Deploying Hadoop in pseudo-distributed mode
– Deploying a complete, multi-node Hadoop cluster
– Importing data into HDFS using Sqoop and Flume
– Installing Hive and Impala
– Using Hue to control user access
– Configuring HDFS High Availability
– Configuring the FairScheduler
– Troubleshooting problems on the cluster
– … and more
Course Chapters
Course Introduction
• Introduction
Introduction to Apache Hadoop
• The Case for Apache Hadoop
• HDFS
• Getting Data Into HDFS
• MapReduce
Planning, Installing, and Configuring a Hadoop Cluster
• Planning Your Hadoop Cluster
• Hadoop Installation and Initial Configuration
• Installing and Configuring Hive, Impala, and Pig
• Hadoop Clients
• Cloudera Manager
• Advanced Cluster Configuration
• Hadoop Security
Cluster Operations and Maintenance
• Managing and Scheduling Jobs
• Cluster Maintenance
• Cluster Monitoring and Troubleshooting
Course Conclusion and Appendices
• Conclusion
• Kerberos Configuration
• Configuring HDFS Federation
HDFS High Availability Overview
• A single NameNode is a single point of failure (SPOF)
• A NameNode can cause HDFS downtime in two ways
– Unexpected NameNode crash (rare)
– Planned maintenance of the NameNode (more common)
• HDFS High Availability (HA) eliminates this SPOF
– Available in CDH4 (and in the related Apache Hadoop 0.23.x and 2.x releases)
HDFS High Availability Architecture
• HDFS High Availability uses a pair of NameNodes
– One Active and one Standby
– Clients only contact the Active NameNode
– DataNodes heartbeat in to both NameNodes
– Active NameNode writes its metadata to a quorum of JournalNodes
– Standby NameNode reads from the JournalNodes to remain in sync with the Active NameNode
[Diagram: an Active NameNode and a Standby NameNode, each with a Quorum Journal Manager; three JournalNodes; four DataNodes]
HDFS High Availability Architecture (cont’d)
• Active NameNode writes edits to the JournalNodes
– Software to do this is the Quorum Journal Manager (QJM)
– Built in to the NameNode
– Waits for a success acknowledgment from the majority of JournalNodes
– Majority commit means a single crashed or lagging JournalNode will not impact NameNode latency
– Uses the Paxos algorithm to ensure reliability even if edits are being written as a JournalNode fails
• Note that there is no Secondary NameNode when implementing HDFS High Availability
– The Standby NameNode periodically performs checkpointing
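The majority-commit rule described for the Quorum Journal Manager can be illustrated with a little arithmetic. The following Python sketch is not Hadoop code, and the function names are made up for illustration. It assumes only that an edit counts as committed once more than half of the JournalNodes have acknowledged it.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of acknowledgments that forms a majority."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """JournalNodes that can be down or lagging while edits still commit."""
    return journal_nodes - majority(journal_nodes)


def edit_committed(acks: int, journal_nodes: int) -> bool:
    """True once a majority of JournalNodes has acknowledged the edit."""
    return acks >= majority(journal_nodes)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "JournalNodes: majority", majority(n),
              "- tolerates", tolerated_failures(n), "failure(s)")
```

With three JournalNodes, two acknowledgments are enough, so one crashed or lagging JournalNode does not hold up the Active NameNode.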
Failover
• Only one NameNode can be active at any given time
– The other is in standby mode
• The standby maintains a copy of the active NameNode’s state
– So it can take over when the active NameNode goes down
• Two types of failover
– Manual (detected and initiated by a user)
– Automatic (detected and initiated by HDFS itself)
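As a rough mental model of failover, the Active and Standby roles can be pictured as two slots that swap. The Python sketch below is a toy, not Hadoop code: the class name and the host names nn1 and nn2 are hypothetical, and it ignores fencing and state synchronization entirely. It shows only that a failover promotes the standby while leaving exactly one NameNode active.

```python
class NameNodePair:
    """Toy model of an Active/Standby NameNode pair (illustrative only)."""

    def __init__(self, active: str, standby: str) -> None:
        self.active = active
        self.standby = standby

    def failover(self) -> None:
        """Promote the standby and demote the former active NameNode."""
        self.active, self.standby = self.standby, self.active


if __name__ == "__main__":
    pair = NameNodePair(active="nn1", standby="nn2")  # hypothetical host names
    pair.failover()  # manual or automatic: the role swap is the same
    print("active:", pair.active, "standby:", pair.standby)
```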
Automatic Failover
• Automatic failover is based on Apache ZooKeeper
– A coordination service, also used by HBase
– An open source Apache project
– One of the components in CDH
• A daemon called the ZooKeeper Failover Controller (ZKFC) runs on each NameNode machine
• ZooKeeper needs a quorum of nodes
– Typical installations use three or five nodes
– Low resource usage
– Can install alongside existing master daemons
HDFS HA With Automatic Failover – Deployment
[Diagram, summarized:]
• Active NameNode with its Quorum Journal Manager, plus a ZooKeeper Failover Controller that must reside on the same host
• Standby NameNode with its Quorum Journal Manager, plus a ZooKeeper Failover Controller that must reside on the same host
• Three JournalNodes, which typically reside on master nodes
• ZooKeeper ensemble of three instances, which typically reside on master nodes
• Four DataNodes
hdfs-site.xml
• dfs.namenode.handler.count
– The number of threads the NameNode uses to handle RPC requests from DataNodes
– Default: 10
– Recommended: ln(number of cluster nodes) * 20
– Symptoms of this being set too low: ‘connection refused’ messages in DataNode logs as they try to transmit block reports to the NameNode
– Used by the NameNode
• dfs.datanode.failed.volumes.tolerated
– The number of volumes allowed to fail before the DataNode takes itself offline, ultimately resulting in all of its blocks being re-replicated
– Default: 0, but often increased on machines with several disks
– Used by DataNodes
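To make the recommendation ln(number of cluster nodes) * 20 for dfs.namenode.handler.count concrete, here is a minimal Python sketch. The function name is made up for illustration. Two choices in it are assumptions rather than anything stated on the slide: the result is rounded down to a whole number of threads, and it never drops below the default of 10.

```python
import math

DEFAULT_HANDLER_COUNT = 10  # default value stated on the slide


def recommended_handler_count(cluster_nodes: int) -> int:
    """Return ln(cluster_nodes) * 20, rounded down.

    Assumption (not from the slide): never go below the default of 10,
    because the formula yields very small values for tiny clusters.
    """
    if cluster_nodes < 1:
        raise ValueError("cluster must have at least one node")
    return max(DEFAULT_HANDLER_COUNT, int(math.log(cluster_nodes) * 20))


if __name__ == "__main__":
    for nodes in (10, 100, 1000):
        print(nodes, "nodes ->", recommended_handler_count(nodes), "handler threads")
```

Because the formula is logarithmic, growing the cluster from 100 to 1,000 nodes raises the suggested thread count only from 92 to 138.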
core-site.xml
• fs.trash.interval
– When a file is deleted, it is placed in a .Trash directory in the user’s home directory, rather than being immediately deleted
– It is purged from HDFS after the number of minutes specified
– Default: 0 (disabled)
– Recommended: 1440 (one day)
– Used by clients and the NameNode
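Hadoop configuration files such as core-site.xml hold each setting as a `<property>` element containing a `<name>` and a `<value>`, inside a `<configuration>` root. As a minimal sketch, the Python below renders the recommended fs.trash.interval setting in that form. The helper name `hadoop_property` is made up for illustration, and the output is a fragment to place inside `<configuration>`, not a complete file.

```python
import xml.etree.ElementTree as ET

MINUTES_PER_DAY = 24 * 60  # 1440, the recommended one-day trash interval


def hadoop_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(hadoop_property("fs.trash.interval", str(MINUTES_PER_DAY)))
```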
mapred-site.xml
• mapred.job.tracker.handler.count
– Number of threads used by the JobTracker to respond to heartbeats from the TaskTrackers
– Default: 10
– Recommended: ln(number of cluster nodes) * 20
– Used by the JobTracker
• mapred.reduce.parallel.copies
– Number of TaskTrackers a Reducer can connect to in parallel to transfer its data
– Default: 5
– Recommended: ln(number of cluster nodes) * 4 with a floor of 10
– Used by TaskTrackers
• tasktracker.http.threads
– The number of HTTP threads in the TaskTracker which the Reducers use to retrieve data
– Default: 40
– Recommended: 80
– Used by TaskTrackers
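The three mapred-site.xml recommendations can be gathered into one small calculation. This is a Python sketch with made-up function names, not a Cloudera tool. Rounding down to whole numbers is an assumption, as is the floor of 10 applied to mapred.job.tracker.handler.count (its default); the floor of 10 for mapred.reduce.parallel.copies comes from the slide.

```python
import math


def scaled_by_cluster_size(cluster_nodes: int, factor: int, floor: int) -> int:
    """Return ln(cluster_nodes) * factor, rounded down, but never below floor."""
    if cluster_nodes < 1:
        raise ValueError("cluster must have at least one node")
    return max(floor, int(math.log(cluster_nodes) * factor))


def recommended_mapred_settings(cluster_nodes: int) -> dict:
    """Collect the slide's mapred-site.xml recommendations for a cluster size."""
    return {
        # Floor of 10 is an assumption: it matches the stated default.
        "mapred.job.tracker.handler.count": scaled_by_cluster_size(cluster_nodes, 20, floor=10),
        # Floor of 10 is stated on the slide.
        "mapred.reduce.parallel.copies": scaled_by_cluster_size(cluster_nodes, 4, floor=10),
        # Fixed recommendation, independent of cluster size.
        "tasktracker.http.threads": 80,
    }


if __name__ == "__main__":
    for name, value in recommended_mapred_settings(100).items():
        print(name, "=", value)
```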
mapred-site.xml (cont’d)
• mapred.reduce.slowstart.completed.maps
– The proportion of Map tasks which must be completed before the JobTracker will schedule Reducers on the cluster
– Default: 0.05 (5 percent)
– Recommended: 0.8 (80 percent)
– Used by the JobTracker
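To see what the slowstart setting means for a single job, the Python sketch below converts the configured proportion into a number of Map tasks. The function name is made up for illustration, and rounding up to a whole task is an assumption about how the threshold is applied, not something the slide states.

```python
import math
from decimal import Decimal


def maps_required_before_reducers(total_maps: int, slowstart: float = 0.05) -> int:
    """Map tasks that must finish before Reducers are scheduled.

    Assumption: the threshold is rounded up to a whole task.
    Decimal avoids binary floating-point surprises in the multiplication.
    """
    if total_maps < 0:
        raise ValueError("total_maps must not be negative")
    if not 0 <= slowstart <= 1:
        raise ValueError("slowstart must be between 0 and 1")
    return math.ceil(Decimal(str(slowstart)) * total_maps)


if __name__ == "__main__":
    for setting in (0.05, 0.8):
        print("slowstart", setting, "->",
              maps_required_before_reducers(200, setting), "of 200 maps")
```

For a job with 200 Map tasks, the default of 0.05 lets Reducers start after 10 maps, while the recommended 0.8 holds them back until 160 maps have completed.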
• Submit questions in the Q&A panel
• Watch on-demand video of this webinar and many more at http://cloudera.com
• Follow Ian on Twitter @iwrigley
• Follow Cloudera University @ClouderaU
• Learn more at Strata + Hadoop World: http://tinyurl.com/hadoopworld
• Thank you for attending!
Register now for Cloudera training at http://university.cloudera.com
Use discount code Admin_10 to save 10% on new enrollments in Administrator Training classes delivered by Cloudera until December 1, 2013*
Use discount code 15off2 to save 15% on enrollments in two or more training classes delivered by Cloudera until December 1, 2013*
* Excludes classes sold or delivered by Cloudera partners

More Related Content

What's hot

Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityEdureka!
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterEdureka!
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase TrainingCloudera, Inc.
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!Edureka!
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceUwe Printz
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 

What's hot (20)

Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High Availability
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase Training
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduce
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 

Similar to Cloudera Admin Training HDFS HA

Hadoop HDFS and Oracle
Hadoop HDFS and OracleHadoop HDFS and Oracle
Hadoop HDFS and OracleJohan Louwers
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Hortonworks
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Hadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons LearnedHadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons LearnedCloudera, Inc.
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Охота на уязвимости Hadoop
Охота на уязвимости HadoopОхота на уязвимости Hadoop
Охота на уязвимости HadoopPositive Hack Days
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaData Con LA
 
AEM (CQ) Dispatcher Security and CDN+Browser Caching
AEM (CQ) Dispatcher Security and CDN+Browser CachingAEM (CQ) Dispatcher Security and CDN+Browser Caching
AEM (CQ) Dispatcher Security and CDN+Browser CachingAndrew Khoury
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for WindowsTerry Padgett
 
Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1Pramod Gosavi
 
Introduction to Hadoop part1
Introduction to Hadoop part1Introduction to Hadoop part1
Introduction to Hadoop part1Giovanna Roda
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsDataWorks Summit
 
250hadoopinterviewquestions
250hadoopinterviewquestions250hadoopinterviewquestions
250hadoopinterviewquestionsRamana Swamy
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Mandakini Kumari
 

Similar to Cloudera Admin Training HDFS HA (20)

Hadoop HDFS and Oracle
Hadoop HDFS and OracleHadoop HDFS and Oracle
Hadoop HDFS and Oracle
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Hadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons LearnedHadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons Learned
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Охота на уязвимости Hadoop
Охота на уязвимости HadoopОхота на уязвимости Hadoop
Охота на уязвимости Hadoop
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
AEM (CQ) Dispatcher Security and CDN+Browser Caching
AEM (CQ) Dispatcher Security and CDN+Browser CachingAEM (CQ) Dispatcher Security and CDN+Browser Caching
AEM (CQ) Dispatcher Security and CDN+Browser Caching
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1Sam fineberg big_data_hadoop_storage_options_3v9-1
Sam fineberg big_data_hadoop_storage_options_3v9-1
 
Introduction to Hadoop part1
Introduction to Hadoop part1Introduction to Hadoop part1
Introduction to Hadoop part1
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2Hadoop_content_by_sasidhar2
Hadoop_content_by_sasidhar2
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
250hadoopinterviewquestions
250hadoopinterviewquestions250hadoopinterviewquestions
250hadoopinterviewquestions
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19

Cloudera Admin Training HDFS HA

  • 1. An Introduction to Cloudera’s Administrator Training for Apache Hadoop
      Ian Wrigley, Sr. Curriculum Manager, ian@cloudera.com
  • 2. Topics
      – Why Take Cloudera Training?
      – Administrator Course Contents
      – A Deeper Dive: An overview of HDFS High Availability
      – A Deeper Dive: Some of Hadoop’s advanced configuration options
      – Question time
      © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
  • 3. Why Cloudera Training?
      1. Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science
      2. Most Experienced Instructors: Over 15,000 students trained since 2009
      3. Leader in Certification: Over 5,000 accredited Cloudera professionals
      4. State of the Art Curriculum: Classes updated regularly as Hadoop evolves
      5. Widest Geographic Coverage: Most classes offered: 50 cities worldwide plus online
      6. Most Relevant Platform & Community: CDH deployed more than all other distributions combined
      7. Depth of Training Material: Hands-on exercises and VMs support live instruction
      8. Ongoing Learning: Video tutorials and e-learning complement training
  • 4. Learning Path: System Administrators
      – Administrator Training: Configure, install, and monitor clusters for optimal performance. Implement security measures and multi-user functionality.
      – Enterprise Training: Use Cloudera Manager to speed deployment and scale the cluster. Learn which tools and techniques improve cluster performance.
      – HBase Training: Implement massively distributed, columnar storage at scale. Enable random, real-time read/write access to all data.
      – Data Analyst Training: Vertically integrate basic analytics into data management. Transform and manipulate data to drive high-value utilization.
  • 5. Topics: Why Take Training?; Administrator Course Contents; A Deeper Dive: An overview of HDFS High Availability; A Deeper Dive: Some of Hadoop’s advanced configuration options; Question time
  • 6. Administrator Course Objectives
      During the Administrator course, you learn:
      – The core technologies of Hadoop
      – How to populate HDFS from external sources
      – How to plan your Hadoop cluster hardware and software
      – How to deploy a Hadoop cluster
      – What issues to consider when installing Pig, Hive, and Impala
      – What issues to consider when deploying Hadoop clients
      – How Cloudera Manager can simplify Hadoop administration
      – How to configure HDFS for high availability
      – What issues to consider when implementing Hadoop security
  • 7. Administrator Course Objectives (cont’d)
      – How to schedule jobs on the cluster
      – How to maintain your cluster
      – How to monitor, troubleshoot, and optimize the cluster
  • 8. Hands-On Exercises
      The course features many Hands-On Exercises, including:
      – Deploying Hadoop in pseudo-distributed mode
      – Deploying a complete, multi-node Hadoop cluster
      – Importing data into HDFS using Sqoop and Flume
      – Installing Hive and Impala
      – Using Hue to control user access
      – Configuring HDFS High Availability
      – Configuring the FairScheduler
      – Troubleshooting problems on the cluster
      – … and more
  • 9. Course Chapters
      – Course Introduction: Introduction
      – Introduction to Apache Hadoop: The Case for Apache Hadoop; HDFS; Getting Data Into HDFS; MapReduce
      – Planning, Installing, and Configuring a Hadoop Cluster: Planning Your Hadoop Cluster; Hadoop Installation and Initial Configuration; Installing and Configuring Hive, Impala, and Pig; Hadoop Clients; Cloudera Manager; Advanced Cluster Configuration; Hadoop Security
      – Cluster Operations and Maintenance: Managing and Scheduling Jobs; Cluster Maintenance; Cluster Monitoring and Troubleshooting
      – Course Conclusion and Appendices: Conclusion; Kerberos Configuration; Configuring HDFS Federation
  • 10. Topics: Why Take Training?; Administrator Course Contents; A Deeper Dive: An overview of HDFS High Availability; A Deeper Dive: Some of Hadoop’s advanced configuration options; Question time
  • 11. HDFS High Availability Overview
      – A single NameNode is a single point of failure
      – Two ways a NameNode can result in HDFS downtime
        – Unexpected NameNode crash (rare)
        – Planned maintenance of NameNode (more common)
      – HDFS High Availability (HA) eliminates this SPOF
        – Available in CDH4 (or related Apache Hadoop 0.23.x, and 2.x)
  • 12. HDFS High Availability Architecture
      – HDFS High Availability uses a pair of NameNodes
        – One Active and one Standby
        – Clients only contact the Active NameNode
        – DataNodes heartbeat in to both NameNodes
        – Active NameNode writes its metadata to a quorum of JournalNodes
        – Standby NameNode reads from the JournalNodes to remain in sync with the Active NameNode
      [Diagram: Active and Standby NameNodes, each with a Quorum Journal Manager; three JournalNodes; four DataNodes]
  • 13. HDFS High Availability Architecture (cont’d)
      – Active NameNode writes edits to the JournalNodes
        – Software to do this is the Quorum Journal Manager (QJM)
        – Built in to the NameNode
        – Waits for a success acknowledgment from the majority of JournalNodes
        – Majority commit means a single crashed or lagging JournalNode will not impact NameNode latency
        – Uses the Paxos algorithm to ensure reliability even if edits are being written as a JournalNode fails
      – Note that there is no Secondary NameNode when implementing HDFS High Availability
        – The Standby NameNode periodically performs checkpointing
  • 14. Failover
      – Only one NameNode may be active at any given time
        – The other is in standby mode
      – The standby maintains a copy of the active NameNode’s state
        – So it can take over when the active NameNode goes down
      – Two types of failover
        – Manual (detected and initiated by a user)
        – Automatic (detected and initiated by HDFS itself)
  • 15. Automatic Failover
      – Automatic failover is based on Apache ZooKeeper
        – A coordination service system also used by HBase
        – An open source Apache project
        – One of the components in CDH
      – A daemon called the ZooKeeper Failover Controller (ZKFC) runs on each NameNode machine
      – ZooKeeper needs a quorum of nodes
        – Typical installations use three or five nodes
        – Low resource usage
        – Can install alongside existing master daemons
  • 16. HDFS HA With Automatic Failover – Deployment
      [Deployment diagram: Active and Standby NameNodes, each with a Quorum Journal Manager and a ZooKeeper Failover Controller; three JournalNodes; a ZooKeeper ensemble of three instances; four DataNodes]
      – Each NameNode and its ZooKeeper Failover Controller must reside on the same host
      – ZooKeeper ensemble instances typically reside on master nodes
      – JournalNodes typically reside on master nodes
  • 17. Topics: Why Take Training?; Administrator Course Contents; A Deeper Dive: An overview of HDFS High Availability; A Deeper Dive: Some of Hadoop’s more advanced configuration options; Question time
  • 18. hdfs-site.xml
      – dfs.namenode.handler.count: The number of threads the NameNode uses to handle RPC requests from DataNodes. Default: 10. Recommended: ln(number of cluster nodes) * 20. Symptoms of this being set too low: ‘connection refused’ messages in DataNode logs as they try to transmit block reports to the NameNode. Used by the NameNode.
      – dfs.datanode.failed.volumes.tolerated: The number of volumes allowed to fail before the DataNode takes itself offline, ultimately resulting in all of its blocks being re-replicated. Default: 0, but often increased on machines with several disks. Used by DataNodes.
  • 19. core-site.xml
      – fs.trash.interval: When a file is deleted, it is placed in a .Trash directory in the user’s home directory, rather than being immediately deleted. It is purged from HDFS after the number of minutes specified. Default: 0 (disabled). Recommended: 1440 (one day). Used by clients and the NameNode.
  • 20. mapred-site.xml
      – mapred.job.tracker.handler.count: Number of threads used by the JobTracker to respond to heartbeats from the TaskTrackers. Default: 10. Recommendation: ln(number of cluster nodes) * 20. Used by the JobTracker.
      – mapred.reduce.parallel.copies: Number of TaskTrackers a Reducer can connect to in parallel to transfer its data. Default: 5. Recommendation: ln(number of cluster nodes) * 4 with a floor of 10. Used by TaskTrackers.
      – tasktracker.http.threads: The number of HTTP threads in the TaskTracker which the Reducers use to retrieve data. Default: 40. Recommendation: 80. Used by TaskTrackers.
  • 21. mapred-site.xml (cont’d)
      – mapred.reduce.slowstart.completed.maps: The percentage of Map tasks which must be completed before the JobTracker will schedule Reducers on the cluster. Default: 0.05 (5 percent). Recommendation: 0.8 (80 percent). Used by the JobTracker.
  • 22. Topics: Why Take Training?; Administrator Course Contents; A Deeper Dive: An overview of HDFS High Availability; A Deeper Dive: Some of Hadoop’s more advanced configuration options; Question time
  • 23. 23
  • 24. Thank you for attending!
      – Submit questions in the Q&A panel
      – Watch on-demand video of this webinar and many more at http://cloudera.com
      – Follow Ian on Twitter @iwrigley
      – Follow Cloudera University @ClouderaU
      – Learn more at Strata + Hadoop World: http://tinyurl.com/hadoopworld
      Register now for Cloudera training at http://university.cloudera.com
      – Use discount code Admin_10 to save 10% on new enrollments in Administrator Training classes delivered by Cloudera until December 1, 2013*
      – Use discount code 15off2 to save 15% on enrollments in two or more training classes delivered by Cloudera until December 1, 2013*
      * Excludes classes sold or delivered by Cloudera partners
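
The majority rule behind slides 13 and 15 (the QJM waits for acknowledgments from a majority of JournalNodes, and ZooKeeper needs a quorum of nodes) can be illustrated with a little arithmetic. This is only a sketch of the counting involved, not Hadoop or ZooKeeper code; the function names are hypothetical and chosen for illustration.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be down while a majority is still reachable."""
    return nodes - majority(nodes)


def edit_is_committed(acks: int, journal_nodes: int) -> bool:
    """True once a majority of JournalNodes has acknowledged an edit."""
    return acks >= majority(journal_nodes)


if __name__ == "__main__":
    # Three and five are the typical ensemble sizes mentioned on slide 15.
    for n in (3, 5):
        print(n, "nodes: majority =", majority(n),
              "tolerated failures =", tolerated_failures(n))
```

With three JournalNodes, two acknowledgments are enough to commit an edit, which is why a single crashed or lagging JournalNode does not hold up the NameNode.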

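The ln-based recommendations on the hdfs-site.xml and mapred-site.xml slides (18 and 20) can be computed with a short helper. This is a sketch under stated assumptions: the function names are hypothetical, and truncating the result to an integer follows the `int(...)` call in the one-liner from the Editor's Notes rather than anything the slides specify.

```python
import math


def handler_count(cluster_nodes: int) -> int:
    """ln(nodes) * 20, truncated to an integer.

    Baseline for dfs.namenode.handler.count and
    mapred.job.tracker.handler.count. Truncation is an assumption.
    """
    return int(math.log(cluster_nodes) * 20)


def reduce_parallel_copies(cluster_nodes: int) -> int:
    """ln(nodes) * 4 with a floor of 10 (mapred.reduce.parallel.copies)."""
    return max(10, int(math.log(cluster_nodes) * 4))


if __name__ == "__main__":
    for nodes in (10, 50, 200):
        print(nodes, handler_count(nodes), reduce_parallel_copies(nodes))
```

On very small clusters the handler formula yields a value below the default of 10, so treat the result as a starting point for tuning rather than a hard setting.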
Editor's Notes

  1. It’s perhaps more accurate to say that HDFS federation does little to change the NameNode’s status as a single point of failure. If you have several volumes, it might be the case that the one that’s just failed isn’t the one you happen to need for a given job. On the other hand, if you have several NameNodes, the chance that any one of them might fail increases. We recommend using high-quality hardware for the master nodes, so NameNodes seldom fail. When they do, recovery is a straightforward process and there’s little chance of data loss (assuming administrators have configured things properly beforehand). There are many possible reasons for HDFS downtime (http://www.cloudera.com/blog/2011/02/hadoop-availability/), but these two are the most pertinent to our discussion. The best source of information on HA is Cloudera’s HDFS High Availability Guide (http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-High-Availability-Guide/CDH4-High-Availability-Guide.html). There’s a good overview of Quorum Journal Manager-based HDFS HA here: http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1. Also some good information here: http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum. This link has HDFS HA design information: https://issues.apache.org/jira/secure/attachment/12547598/qjournal-design.pdf
  2. The metadata referenced here includes the fsimage and edit log. Clients only ever contact the active NameNode and will use a “virtual” NameNode address that always resolves to the currently active NameNode (as described a few slides later). When HA is enabled, all DataNodes in the cluster are configured with the network addresses of both NameNodes. DataNodes send all block reports, block location updates, and heartbeats to both NameNodes, but only the currently Active NameNode will send commands to the DataNodes to do things such as delete blocks. This section describes a “hot standby” which is ready to take over immediately when the active NameNode fails. It’s possible to have a “cold standby” (machine does not have access to the current state, and may even be powered off), but failure recovery will take far longer so this is not the preferred approach. In theory, it's possible to run more than two NameNodes. However, this has not been tested, and in practice no one is using more than two NNs in production. We still recommend using “carrier grade” hardware for the Active and Standby NameNodes. If you’re transitioning an existing cluster to HA, you can reuse the Secondary NameNode hardware for your standby NameNode (since there’s no Secondary NameNode in HA, as illustrated here and further described on the next slide).
  3. The Active NameNode sends its edits to the JournalNodes via RPC; once it has an ACK from a majority of the JNs, the edit is considered committed. The Standby NN reads from the JNs to ensure that its state is in sync with the Active NN. Paxos is the name of the algorithm used by the QJM and JNs to ensure that even if a JN fails as it's being written to, no edits are lost. Paxos is a well-known, well-tested distributed systems algorithm. In CDH4, there are some issues when you attempt to add new QJM nodes to an existing quorum. The workaround is to use rsync to copy over the journal storage directory from an existing JournalNode and then restart.
  4. This slide discusses the concepts related to failover. We will discuss the configuration process for failover in detail in upcoming slides.
  5. There is a deployment diagram of HDFS HA with Automatic Failover on the next slide. You can either teach off of the current slide and then move to the diagram to reinforce what you covered, or you can spend as little time as possible on the bullets on the current slide and move directly to the diagram, whichever best suits your teaching style. A little detail about the bullet points: A ZooKeeper Failover Controller daemon runs on each NameNode machine. It monitors the NameNode and, if the NameNode fails, automatically fails HDFS over to the Standby. When the NameNode that was originally the Active NameNode comes back up, it comes up as the Standby NameNode. If desired, you can force a failback manually (using hdfs haadmin -failover). The ZooKeeper Failover Controller uses a replicated ZooKeeper ensemble to hold state. Note that the ZooKeeper Failover Controller is not a ZooKeeper server. It uses ZooKeeper to maintain state. In an HDFS HA automatic failover deployment, you will need to install, configure, and start two NameNodes, JournalNodes, ZooKeeper servers, and ZooKeeper Failover Controllers.
  6. Even if you are doing most of your teaching from the previous slide, point out the differences between the HDFS HA deployment diagram a few slides ago and this one. (You might ask your students to identify which components did not appear on the first HDFS HA diagram.) Without automatic failover, there are no ZooKeeper Failover Controllers and there is no ZooKeeper ensemble. Point out that this is the logical architecture. Remind students about the physical architecture noted in the bullets on the previous slide: ZooKeeper Failover Controllers must run on the same hosts as the Active and Standby NameNodes; the ZooKeeper servers can run on any nodes in the cluster; the JournalNodes can run on any nodes as well. Cloudera Solutions Architecture’s best practice as of this writing is to co-locate all of these servers on the Master nodes. For example, you might distribute the ZooKeeper servers and the JournalNodes across the hosts running the NameNodes and the JobTracker. These servers are critical to the success of HDFS HA, and while the system has redundancies built in to tolerate node failure, it’s best if you can place these important servers on ultra-reliable hardware. It’s also recommended to provide separate disks for each ZooKeeper server, and separate disks for each JournalNode.
  7. ln is the natural logarithm. To determine the recommended baseline value for dfs.namenode.handler.count, you can go to a site like http://www.rapidtables.com/calc/math/Ln_Calc.htm to get the natural logarithm of a number, or you can run a Python command such as the following: python -c 'import math; print(int(math.log(200) * 20))' The preceding command will work from the terminal windows in students’ lab environments, and it will work on Macs.
  8. Note that if trash is enabled in the server configuration, then the value configured on the server is used and the client configuration is ignored. If trash is disabled in the server configuration, then the client-side configuration is checked.
  9. The notes for dfs.namenode.handler.count have information about how to obtain the natural logarithm for the number of nodes.
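
The precedence rule in note 8 for fs.trash.interval can be written out as a small function. This is an illustrative sketch only, assuming that a value of 0 means trash is disabled (as the core-site.xml slide states); the function and parameter names are hypothetical and do not correspond to any Hadoop API.

```python
def effective_trash_interval(server_minutes: int, client_minutes: int) -> int:
    """Return the trash interval (in minutes) that applies to a delete.

    A value of 0 means trash is disabled. If the server enables trash,
    its value wins and the client setting is ignored; otherwise the
    client-side setting is consulted.
    """
    if server_minutes > 0:
        return server_minutes
    return client_minutes


if __name__ == "__main__":
    # Server set to one day, client set to one hour: server value applies.
    print(effective_trash_interval(1440, 60))
    # Server disabled, client set to one hour: client value applies.
    print(effective_trash_interval(0, 60))
```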