Introduction to Cloudera's Administrator Training for Apache Hadoop

Learn who is best suited to attend the full Administrator Training, what prior knowledge you should have, and what topics the course covers. Cloudera Senior Curriculum Manager, Ian Wrigley, will discuss the skills you will attain during Admin Training and how they will help you move your Hadoop deployment from strategy to production and prepare for the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam.

  1. An Introduction to Cloudera’s Administrator Training for Apache Hadoop
     Ian Wrigley, Sr. Curriculum Manager, ian@cloudera.com
  2. Topics
     • Why Take Cloudera Training?
     • Administrator Course Contents
     • A Deeper Dive: An overview of HDFS High Availability
     • A Deeper Dive: Some of Hadoop’s advanced configuration options
     • Question time
     © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
  3. Why Cloudera Training?
     • Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science
     • Most Experienced Instructors: Over 15,000 students trained since 2009
     • Leader in Certification: Over 5,000 accredited Cloudera professionals
     • State of the Art Curriculum: Classes updated regularly as Hadoop evolves
     • Widest Geographic Coverage: Most classes offered, in 50 cities worldwide plus online
     • Most Relevant Platform & Community: CDH deployed more than all other distributions combined
     • Depth of Training Material: Hands-on exercises and VMs support live instruction
     • Ongoing Learning: Video tutorials and e-learning complement training
  4. Learning Path: System Administrators
     • Administrator Training
       – Configure, install, and monitor clusters for optimal performance
       – Implement security measures and multi-user functionality
     • Enterprise Training
       – Use Cloudera Manager to speed deployment and scale the cluster
       – Learn which tools and techniques improve cluster performance
     • HBase Training
       – Implement massively distributed, columnar storage at scale
       – Enable random, real-time read/write access to all data
     • Data Analyst Training
       – Vertically integrate basic analytics into data management
       – Transform and manipulate data to drive high-value utilization
  5. Topics
     • Why Take Training?
     • Administrator Course Contents
     • A Deeper Dive: An overview of HDFS High Availability
     • A Deeper Dive: Some of Hadoop’s advanced configuration options
     • Question time
  6. Administrator Course Objectives
     During the Administrator course, you learn:
     • The core technologies of Hadoop
     • How to populate HDFS from external sources
     • How to plan your Hadoop cluster hardware and software
     • How to deploy a Hadoop cluster
     • What issues to consider when installing Pig, Hive, and Impala
     • What issues to consider when deploying Hadoop clients
     • How Cloudera Manager can simplify Hadoop administration
     • How to configure HDFS for high availability
     • What issues to consider when implementing Hadoop security
  7. Administrator Course Objectives (cont’d)
     • How to schedule jobs on the cluster
     • How to maintain your cluster
     • How to monitor, troubleshoot, and optimize the cluster
  8. Hands-On Exercises
     The course features many hands-on exercises, including:
     – Deploying Hadoop in pseudo-distributed mode
     – Deploying a complete, multi-node Hadoop cluster
     – Importing data into HDFS using Sqoop and Flume
     – Installing Hive and Impala
     – Using Hue to control user access
     – Configuring HDFS High Availability
     – Configuring the FairScheduler
     – Troubleshooting problems on the cluster
     – … and more
  9. Course Chapters
     • Course Introduction
       – Introduction
     • Introduction to Apache Hadoop
       – The Case for Apache Hadoop
       – HDFS
       – Getting Data Into HDFS
       – MapReduce
     • Planning, Installing, and Configuring a Hadoop Cluster
       – Planning Your Hadoop Cluster
       – Hadoop Installation and Initial Configuration
       – Installing and Configuring Hive, Impala, and Pig
       – Hadoop Clients
       – Cloudera Manager
       – Advanced Cluster Configuration
       – Hadoop Security
     • Cluster Operations and Maintenance
       – Managing and Scheduling Jobs
       – Cluster Maintenance
       – Cluster Monitoring and Troubleshooting
     • Course Conclusion and Appendices
       – Conclusion
       – Kerberos Configuration
       – Configuring HDFS Federation
  10. Topics
      • Why Take Training?
      • Administrator Course Contents
      • A Deeper Dive: An overview of HDFS High Availability
      • A Deeper Dive: Some of Hadoop’s advanced configuration options
      • Question time
  11. HDFS High Availability Overview
      • A single NameNode is a single point of failure (SPOF)
      • There are two ways a NameNode can cause HDFS downtime:
        – Unexpected NameNode crash (rare)
        – Planned maintenance of the NameNode (more common)
      • HDFS High Availability (HA) eliminates this SPOF
        – Available in CDH4 (or the related Apache Hadoop 0.23.x and 2.x releases)
  12. HDFS High Availability Architecture
      • HDFS High Availability uses a pair of NameNodes
        – One Active and one Standby
        – Clients contact only the Active NameNode
        – DataNodes send heartbeats to both NameNodes
        – The Active NameNode writes its metadata to a quorum of JournalNodes
        – The Standby NameNode reads from the JournalNodes to stay in sync with the Active NameNode
      [Diagram: Active NameNode/Quorum Journal Manager and Standby NameNode/Quorum Journal Manager, four DataNodes, and three JournalNodes]
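The Active/Standby relationship can be pictured as one ordered, shared edit log. The Python sketch below is a deliberately simplified model and not Hadoop's implementation. `SharedEditLog` and `StandbyNameNode` are hypothetical names, a single in-memory list stands in for the quorum of JournalNodes, and the namespace is reduced to a set of paths. The model shows the Standby applying only the numbered edits that are newer than the last one it has already applied.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEditLog:
    """Stand-in for the JournalNode quorum: ordered (txid, op, path) edits."""
    edits: list = field(default_factory=list)

    def append(self, op: str, path: str) -> int:
        txid = len(self.edits) + 1  # transaction IDs increase monotonically
        self.edits.append((txid, op, path))
        return txid

    def read_since(self, txid: int) -> list:
        """Return every edit whose transaction ID is greater than txid."""
        return [e for e in self.edits if e[0] > txid]


@dataclass
class StandbyNameNode:
    """Keeps a toy namespace in sync by tailing the shared edit log."""
    namespace: set = field(default_factory=set)
    last_applied_txid: int = 0

    def tail(self, log: SharedEditLog) -> int:
        applied = 0
        for txid, op, path in log.read_since(self.last_applied_txid):
            if op == "mkdir":
                self.namespace.add(path)
            elif op == "delete":
                self.namespace.discard(path)
            self.last_applied_txid = txid
            applied += 1
        return applied  # number of edits applied in this pass


log = SharedEditLog()
standby = StandbyNameNode()
log.append("mkdir", "/user/alice")  # edits written by the Active NameNode
log.append("mkdir", "/tmp/job1")
standby.tail(log)                   # Standby catches up: applies 2 edits
log.append("delete", "/tmp/job1")
standby.tail(log)                   # only the newest edit is applied
print(sorted(standby.namespace), standby.last_applied_txid)  # ['/user/alice'] 3
```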
  13. HDFS High Availability Architecture (cont’d)
      • The Active NameNode writes edits to the JournalNodes
        – The software that does this is the Quorum Journal Manager (QJM)
        – It is built into the NameNode
        – It waits for a success acknowledgment from a majority of JournalNodes
        – Majority commit means a single crashed or lagging JournalNode will not affect NameNode latency
        – It uses the Paxos algorithm to ensure reliability even if a JournalNode fails while edits are being written
      • There is no Secondary NameNode when implementing HDFS High Availability
        – The Standby NameNode periodically performs checkpointing
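A majority commit hides a single slow or crashed JournalNode, and the reason becomes clear when you look at the moment an edit counts as committed. The Python sketch below is a simplification with these assumptions: each JournalNode's acknowledgment latency is given in milliseconds, `None` marks a node that never responds, and the function name is hypothetical. It ignores batching, network effects, and Paxos-based recovery.

```python
from typing import Optional, Sequence


def quorum_commit_latency(ack_ms: Sequence[Optional[float]]) -> Optional[float]:
    """Time (ms) at which an edit is acknowledged by a majority of JournalNodes.

    ack_ms has one entry per JournalNode; None marks a node that never acks.
    Returns None when too few nodes respond to form a majority.
    """
    majority = len(ack_ms) // 2 + 1
    acks = sorted(t for t in ack_ms if t is not None)
    if len(acks) < majority:
        return None  # no quorum: the edit cannot be committed
    return acks[majority - 1]  # the majority-th fastest acknowledgment


print(quorum_commit_latency([2.0, 3.0, 4.0]))    # all healthy: 3.0
print(quorum_commit_latency([2.0, 3.0, 900.0]))  # one lagging node: still 3.0
print(quorum_commit_latency([2.0, 3.0, None]))   # one crashed node: still 3.0
print(quorum_commit_latency([2.0, None, None]))  # two nodes down: None
```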
  14. Failover
      • Only one NameNode may be active at any given time
        – The other is in standby mode
      • The Standby maintains a copy of the Active NameNode’s state
        – So it can take over when the Active NameNode goes down
      • There are two types of failover:
        – Manual (detected and initiated by a user)
        – Automatic (detected and initiated by HDFS itself)
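Every failover, whether manual or automatic, has to preserve the one-Active rule. The toy Python model below illustrates that ordering under stated assumptions. `HAPair` and the IDs `nn1`/`nn2` are hypothetical. The model also simply demotes the old Active, whereas a real cluster fences a failed Active rather than trusting it to step down. This is not a Hadoop API. On a real cluster, manual failover is typically run with the `hdfs haadmin -failover` command.

```python
class HAPair:
    """Toy model of an HA NameNode pair; 'nn1' and 'nn2' are hypothetical IDs."""

    def __init__(self) -> None:
        self.state = {"nn1": "active", "nn2": "standby"}

    def active(self) -> str:
        actives = [nn for nn, s in self.state.items() if s == "active"]
        # The rule from the slide: exactly one Active NameNode at any time.
        assert len(actives) == 1, f"invariant violated: {actives}"
        return actives[0]

    def failover(self, trigger: str) -> str:
        """Swap roles; trigger is 'manual' (a user) or 'automatic' (HDFS itself)."""
        old = self.active()
        new = "nn2" if old == "nn1" else "nn1"
        self.state[old] = "standby"  # demote the old Active first...
        self.state[new] = "active"   # ...then promote the Standby
        print(f"{trigger} failover: {old} -> {new}")
        return self.active()


pair = HAPair()
pair.failover("manual")     # e.g. planned maintenance on nn1
pair.failover("automatic")  # e.g. nn2 becomes unresponsive
```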
  15. Automatic Failover
      • Automatic failover is based on Apache ZooKeeper
        – A coordination service also used by HBase
        – An open source Apache project
        – One of the components in CDH
      • A daemon called the ZooKeeper Failover Controller (ZKFC) runs on each NameNode machine
      • ZooKeeper needs a quorum of nodes
        – Typical installations use three or five nodes
        – Low resource usage
        – Can be installed alongside existing master daemons
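ZooKeeper stays available only while a strict majority of its nodes is up, so the ensemble size determines how many failures it can survive. The short Python calculation below suggests why typical installations use three or five nodes: an even-sized ensemble tolerates no more failures than the odd size just below it. The function name is hypothetical.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can fail while a strict majority of the ensemble survives."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for n in range(1, 8):
    print(f"{n} ZooKeeper node(s): quorum of {n // 2 + 1}, "
          f"survives {tolerated_failures(n)} failure(s)")
```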
  16. HDFS HA With Automatic Failover – Deployment
      [Diagram: Active NameNode/Quorum Journal Manager and Standby NameNode/Quorum Journal Manager, each paired with a ZooKeeper Failover Controller that must reside on the same host; three JournalNodes, which typically reside on master nodes; a ZooKeeper ensemble of three instances, which typically reside on master nodes; four DataNodes]
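The placement rules in the deployment diagram can be written as a simple checklist. The Python sketch below checks a hypothetical host-to-roles layout against those rules. Each NameNode host must also run a ZooKeeper Failover Controller. The JournalNode and ZooKeeper counts should each be odd and at least three, so that a majority survives a single failure. The role strings, host names, and the `check_layout` function are illustrative assumptions, not Hadoop or Cloudera Manager identifiers.

```python
from typing import Dict, List, Set


def check_layout(layout: Dict[str, Set[str]]) -> List[str]:
    """Return the problems found in a host -> roles mapping (empty if none)."""
    problems = []
    namenode_hosts = [h for h, roles in layout.items() if "namenode" in roles]
    if len(namenode_hosts) != 2:
        problems.append(f"expected 2 NameNodes, found {len(namenode_hosts)}")
    for host in namenode_hosts:
        # Each NameNode must share its host with a ZooKeeper Failover Controller.
        if "zkfc" not in layout[host]:
            problems.append(f"{host}: NameNode without a co-located ZKFC")
    for role in ("journalnode", "zookeeper"):
        # Quorum-based services: an odd count of at least three (assumed rule).
        count = sum(role in roles for roles in layout.values())
        if count < 3 or count % 2 == 0:
            problems.append(f"{role}: {count} instance(s); use an odd number >= 3")
    return problems


# Hypothetical three-master layout mirroring the deployment diagram.
layout = {
    "master1": {"namenode", "zkfc", "journalnode", "zookeeper"},
    "master2": {"namenode", "zkfc", "journalnode", "zookeeper"},
    "master3": {"journalnode", "zookeeper"},
    "worker1": {"datanode"},
    "worker2": {"datanode"},
}
print(check_layout(layout) or "layout OK")
```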
  17. Topics
      • Why Take Training?
      • Administrator Course Contents
      • A Deeper Dive: An overview of HDFS High Availability
      • A Deeper Dive: Some of Hadoop’s more advanced configuration options
      • Question time
  18. hdfs-site.xml
      • dfs.namenode.handler.count
        The number of threads the NameNode uses to handle RPC requests from DataNodes. Default: 10. Recommended: ln(number of cluster nodes) * 20. Symptom of a value that is too low: ‘connection refused’ messages in DataNode logs as the DataNodes try to send block reports to the NameNode. Used by the NameNode.
      • dfs.datanode.failed.volumes.tolerated
        The number of volumes that may fail before the DataNode takes itself offline, which ultimately causes all of its blocks to be re-replicated. Default: 0, but often increased on machines with several disks. Used by DataNodes.
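The handler-count recommendation is a formula in the cluster size, where ln is the natural logarithm. The Python helper below is a sketch that rests on two assumptions the slide does not state. First, the result is rounded to the nearest whole thread. Second, it is never set below the default of 10, which only matters for a one-node cluster. The helper names are hypothetical. `property_xml` simply formats a standard `<property>` entry as it would appear in hdfs-site.xml.

```python
import math


def handler_count(num_nodes: int, default: int = 10) -> int:
    """ln(number of cluster nodes) * 20, rounded to a whole number of threads.

    Assumption: the result is never set below the default of 10.
    """
    return max(default, round(math.log(num_nodes) * 20))


def property_xml(name: str, value: object) -> str:
    """Format one <property> entry for a Hadoop *-site.xml file."""
    return (f"<property>\n  <name>{name}</name>\n"
            f"  <value>{value}</value>\n</property>")


for nodes in (10, 100, 1000):
    print(nodes, "nodes ->", handler_count(nodes))  # 46, 92, 138
print(property_xml("dfs.namenode.handler.count", handler_count(100)))
```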
  19. core-site.xml
      • fs.trash.interval
        When a file is deleted, it is moved to a .Trash directory in the user’s home directory rather than being deleted immediately. It is purged from HDFS after the specified number of minutes. Default: 0 (disabled). Recommended: 1440 (one day). Used by clients and the NameNode.
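Because `fs.trash.interval` is given in minutes, the recommended 1440 equals 24 × 60, or one day. The Python sketch below computes the earliest time a deleted file could be purged under a given setting. It is a simplification: the function name is hypothetical, and it ignores the periodic trash checkpoints HDFS uses, so actual purging can occur somewhat later than the computed time.

```python
from datetime import datetime, timedelta
from typing import Optional


def purge_after(deleted_at: datetime, trash_interval_minutes: int) -> Optional[datetime]:
    """Earliest time a deleted file can be purged; None if trash is disabled."""
    if trash_interval_minutes <= 0:
        return None  # 0 (the default) disables trash: deletes are immediate
    return deleted_at + timedelta(minutes=trash_interval_minutes)


deleted = datetime(2013, 11, 1, 9, 30)
print(purge_after(deleted, 1440))  # recommended setting: 2013-11-02 09:30:00
print(purge_after(deleted, 0))     # default setting: None
```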
  20. mapred-site.xml
      • mapred.job.tracker.handler.count
        Number of threads the JobTracker uses to respond to heartbeats from the TaskTrackers. Default: 10. Recommended: ln(number of cluster nodes) * 20. Used by the JobTracker.
      • mapred.reduce.parallel.copies
        Number of TaskTrackers a Reducer can connect to in parallel to transfer its data. Default: 5. Recommended: ln(number of cluster nodes) * 4, with a floor of 10. Used by TaskTrackers.
      • tasktracker.http.threads
        The number of HTTP threads in the TaskTracker that Reducers use to retrieve data. Default: 40. Recommended: 80. Used by TaskTrackers.
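The `mapred.reduce.parallel.copies` recommendation combines a logarithmic formula with a floor of 10. Below is a minimal Python sketch. It assumes the result is rounded to the nearest integer, since the slide does not specify rounding. The function name is hypothetical.

```python
import math


def reduce_parallel_copies(num_nodes: int) -> int:
    """ln(number of cluster nodes) * 4, rounded, but never fewer than 10."""
    return max(10, round(math.log(num_nodes) * 4))


for nodes in (10, 100, 1000):
    print(f"{nodes} nodes: mapred.reduce.parallel.copies = "
          f"{reduce_parallel_copies(nodes)}")  # 10, 18, 28
```

For clusters of 12 nodes or fewer, the raw formula gives less than 10, so the floor takes over.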
  21. mapred-site.xml (cont’d)
      • mapred.reduce.slowstart.completed.maps
        The fraction of Map tasks that must be complete before the JobTracker schedules Reducers on the cluster. Default: 0.05 (5 percent). Recommended: 0.8 (80 percent). Used by the JobTracker.
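`mapred.reduce.slowstart.completed.maps` is expressed as a fraction between 0 and 1. The Python sketch below converts it into a number of completed map tasks. It assumes the threshold is rounded up, and the function name is hypothetical. A common rationale for the higher value is that Reducers scheduled early occupy reduce slots while they wait for map output to become available.

```python
import math


def maps_before_reducers(total_maps: int, slowstart: float) -> int:
    """Completed map tasks required before Reducers are scheduled (rounded up)."""
    # Round first to avoid floating-point artifacts such as
    # 100 * 0.07 == 7.000000000000001, which would otherwise ceil to 8.
    return math.ceil(round(total_maps * slowstart, 9))


for fraction in (0.05, 0.8):
    print(f"slowstart={fraction}: Reducers start after "
          f"{maps_before_reducers(200, fraction)} of 200 maps complete")
```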
  22. Topics
      • Why Take Training?
      • Administrator Course Contents
      • A Deeper Dive: An overview of HDFS High Availability
      • A Deeper Dive: Some of Hadoop’s more advanced configuration options
      • Question time
  23.
  24. • Submit questions in the Q&A panel
      • Watch on-demand video of this webinar and many more at http://cloudera.com
      • Follow Ian on Twitter @iwrigley
      • Follow Cloudera University @ClouderaU
      • Learn more at Strata + Hadoop World: http://tinyurl.com/hadoopworld
      • Thank you for attending!
      Register now for Cloudera training at http://university.cloudera.com
      Use discount code Admin_10 to save 10% on new enrollments in Administrator Training classes delivered by Cloudera until December 1, 2013*
      Use discount code 15off2 to save 15% on enrollments in two or more training classes delivered by Cloudera until December 1, 2013*
      * Excludes classes sold or delivered by Cloudera partners
