Hadoop for sys_admin

A presentation for OhioLinuxFest for Hadoop for System Administrators

Published in: Technology

  1. Hadoop for System Administrators – Ohio Linux Fest 2014. Justin Miller, Senior Systems Engineer/DevOps at iHealth Technologies; Weston Bassler, Systems Engineer at Verizon Wireless
  2. What we will be covering: Intro (Why Hadoop?, How Hadoop Works); Architecture (Planning Hardware/Storage/Network); Processing and Storage (HDFS Components, YARN Components); Operations (Job scheduling, Job alerts); Monitoring (Core Services, Job scheduler and SLA, Hardware); High Availability (YARN, HDFS, Oozie); Security (Security Issues, Authentication, Authorization, Encryption); Backup and Recovery (What to plan for?, How to combat); Hadoop Vendors/Distros (Cloudera, HortonWorks, MapR)
  3. Why Hadoop?
  4. Why Hadoop? Cont... Sort through TB, even PB, worth of data in a matter of minutes. Easily sift through logs (patterns, data mining) → switch logs, application logs. Batch processing. History → inspired by two Google papers on MapReduce and GoogleFS; implemented by Yahoo!
  5. Who's using it?
  6. How Hadoop? Processing • MapReduce (MRv1): What is MapReduce? Nobody likes it • YARN (MRv2): Yet Another Resource Negotiator; newer, better, more versatile; 2 new roles → ResourceManager and ApplicationMaster; Spark → new hotness • Bringing processing and storage together: data locality → avoid the network! “MO NODES MO BETTA”
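As a rough illustration of the "What is MapReduce?" question on the slide above, here is a single-process Python sketch of the map, shuffle, and reduce phases using word count. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample_logs = ["ERROR disk full", "INFO job done", "ERROR disk failed"]
    print(word_count(sample_logs))
```

The shuffle step is the part that crosses the network on a real cluster, which is why the slide stresses data locality for the map side.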
  7. YARN in Action
  8. Storage • HDFS: What is HDFS? Why HDFS? • Components of HDFS: NameNode (metadata → fsimage + edits log); ZooKeeper → HA management; quorum-based journaling (3 JournalNodes); Active/Passive NameNode; DataNodes – what do they do?; blocks in relation to NameNode metadata; block storage
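To make "blocks in relation to NameNode metadata" concrete, the sketch below estimates how many blocks a file occupies and how many block replicas the DataNodes end up storing. It assumes a 128 MiB block size (the Hadoop 2 default for `dfs.blocksize`) and the replication factor of 3 mentioned later in the deck; the function names are hypothetical.

```python
MIB = 1024 * 1024


def blocks_for_file(size_bytes, block_size=128 * MIB):
    """Number of HDFS blocks needed for a file; the last block may be partial."""
    if size_bytes < 0 or block_size <= 0:
        raise ValueError("size must be >= 0 and block size must be > 0")
    # Ceiling division without floating point.
    return (size_bytes + block_size - 1) // block_size


def replicas_stored(size_bytes, block_size=128 * MIB, replication=3):
    """Total block replicas the DataNodes hold for one file."""
    return blocks_for_file(size_bytes, block_size) * replication


if __name__ == "__main__":
    one_gib = 1024 * MIB
    print(blocks_for_file(one_gib), replicas_stored(one_gib))
```

The NameNode tracks every block in memory, so many small files cost it far more metadata than a few large ones of the same total size.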
  9. HDFS Write Path
  10. Benefits and Limitations of HDFS. Benefits: low cost per byte → commodity storage; high bandwidth, scales effectively → “Mo nodes Mo speed”; rock-solid data reliability; supports distributed computing I/O patterns; OPEN SOURCE!
  11. Benefits and Limitations of HDFS (continued). Limitations: updates → data is immutable (it can't be updated, only appended); write once; optimized for sequential reads → not for real-time data processing; challenging import/export → requires additional tooling
  12. Architecture • Planning your Hardware/Storage: cheap disks; distributed disk approach → replication factor of 3 for HA; NO LVM, NO RAID, and NO swap; mount data disks with noatime, nodiratime • Network considerations: rack awareness affects data distribution; prefer a faster network when available → 10 GbE if possible
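The replication factor of 3 on the slide above has a direct effect on sizing: every byte of user data consumes three bytes of raw disk. A minimal sizing sketch follows. The `reserve` parameter (a fraction of raw space held back for temporary job output and headroom) and its 25% default are assumptions for illustration, not Hadoop settings.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3, reserve=0.25):
    """Estimate how much user data (in TB) a cluster can hold.

    reserve is a hypothetical fraction of raw space kept free for
    intermediate job output and growth; tune it for your workload.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if not 0 <= reserve < 1:
        raise ValueError("reserve must be in the range [0, 1)")
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb * (1 - reserve) / replication


if __name__ == "__main__":
    # 10 nodes, 12 disks each, 4 TB per disk.
    print(usable_capacity_tb(nodes=10, disks_per_node=12, disk_tb=4))
```

Because HDFS already keeps three copies, RAID on the data disks would only subtract further from this number, which is one reason for the "NO RAID" advice.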
  13. Hadoop Operations • Jobs: What is a job? Scheduling jobs with Oozie; alerts on jobs; Oozie SLAs → start time, end time & duration; file-driven job configuration
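The three SLA dimensions named on the slide above (start time, end time, duration) can be sketched as a simple check against a job run. This is not Oozie's implementation or API; `sla_misses` and its parameters are hypothetical, and the returned labels are only illustrative names for each kind of miss.

```python
from datetime import datetime, timedelta


def sla_misses(nominal, started, ended, should_start, should_end, max_duration):
    """Return the SLA conditions a finished job run violated.

    nominal is when the run was scheduled; should_start and should_end
    are allowed offsets from nominal; max_duration caps the run time.
    """
    misses = []
    if started > nominal + should_start:
        misses.append("START_MISS")
    if ended > nominal + should_end:
        misses.append("END_MISS")
    if ended - started > max_duration:
        misses.append("DURATION_MISS")
    return misses


if __name__ == "__main__":
    nominal = datetime(2014, 10, 25, 2, 0)
    print(sla_misses(
        nominal,
        started=nominal + timedelta(minutes=20),
        ended=nominal + timedelta(minutes=50),
        should_start=timedelta(minutes=10),
        should_end=timedelta(minutes=60),
        max_duration=timedelta(minutes=45),
    ))
```

An alerting hook would then fire whenever the returned list is non-empty.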
  14. Example of a job. Example of a coordinator.
  15. Troubleshooting • Application → debug code
  16. • Job → debug execution
  17. • Service → debug the Linux process (/var/log/hadoop-*). Services won't start → port conflicts (nmap, netstat, lsof). Rule of thumb: if it is not the application or the job, run grep -r ERROR /var/log/hadoop-*
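The service-level checks on the slide above (grep the Hadoop logs for ERROR, then look for port conflicts) can be sketched in Python using only the standard library. The default glob `/var/log/hadoop-*/*.log` is an assumption about the log layout; adjust it to wherever your distribution writes logs. `find_errors` and `port_in_use` are illustrative names, not part of any Hadoop tooling.

```python
import glob
import os
import socket


def find_errors(log_glob="/var/log/hadoop-*/*.log", needle="ERROR"):
    """Yield (path, line_number, line) for each log line containing needle."""
    for path in sorted(glob.glob(log_glob)):
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path, number, line.rstrip("\n")


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for path, number, line in find_errors():
        print(f"{path}:{number}: {line}")
```

`port_in_use` only tells you that a port is taken; lsof or netstat is still needed to find out which process holds it.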
  18. Monitoring • Core Services: HDFS; YARN; JMX → JVM monitoring; Cloudera Manager • Performance: Ganglia (HortonWorks); Cloudera Manager • Hardware → to each his own (traditional monitoring): SNMP; Nagios; Zenoss; Cloudera Manager
  19. High Availability • HDFS: ZooKeeper → quorum-based journaling • YARN: ZooKeeper
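Quorum-based systems such as the JournalNodes and ZooKeeper ensembles mentioned in this deck need a strict majority of members to agree, which determines how many failures they survive. A small worked example of that arithmetic (the function names are illustrative):

```python
def quorum_size(members):
    """Smallest strict majority of an ensemble with the given member count."""
    if members < 1:
        raise ValueError("an ensemble needs at least one member")
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Three members tolerate one failure and so do four, which is why these ensembles are normally deployed in odd sizes.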
  20. • Oozie HA
  21. Security (Because people are evil)
  22. Security, continued • Known issues – stupid/lazy people; Hadoop can be very secure • Authentication – Kerberos: principal (user); realm (group of principals); keytab file • Authorization: LDAP; Active Directory; role based • Encryption – For Your Eyes Only!: Kerberos first; SSL certificates; SSL must be enabled for all core Hadoop services
  23. Backup and Recovery – when things go wrong (and they will). What can go wrong? What to plan for? Data corruption; node crashes; disk crashes. Ways to combat when things do go wrong • Data corruption: checksums of metadata fail → NameNode replaces with a fresh copy; HDFS → hdfs fsck tool • Node crashes/disk crashes: HDFS saves the day!; NameNode HA; first 2 replicas of data on different hosts; heartbeat detection
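The checksum idea on the slide above can be shown with a toy model: compute a checksum per fixed-size chunk when data is written, then compare on read and fall back to a healthy replica on mismatch. This is a sketch under stated assumptions, not HDFS code. `zlib.crc32` stands in for whatever checksum the cluster uses, the 512-byte chunk size is chosen for illustration, and the replica names (`dn1`, `dn2`) are hypothetical.

```python
import zlib

CHUNK_BYTES = 512  # illustrative chunk size for per-chunk checksums


def chunk_checksums(data, chunk=CHUNK_BYTES):
    """Return one CRC32 value per fixed-size chunk of data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def first_healthy_replica(replicas, expected):
    """Return the name of the first replica whose checksums match, else None."""
    for name, data in replicas.items():
        if chunk_checksums(data) == expected:
            return name
    return None


if __name__ == "__main__":
    block = bytes(range(256)) * 8          # 2048 bytes of sample data
    expected = chunk_checksums(block)      # recorded at write time
    damaged = bytearray(block)
    damaged[700] ^= 0xFF                   # flip one byte to simulate corruption
    replicas = {"dn1": bytes(damaged), "dn2": block}
    print(first_healthy_replica(replicas, expected))
```

Once a bad replica is detected this way, the healthy copy can be used to re-replicate the block; `hdfs fsck` reports blocks that are corrupt or under-replicated.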
  24. Hadoop Wars – Vendors and Distributions • Cloudera: specializes in enterprise tools; auditing; access control; cluster management (Cloudera Manager) • HortonWorks: specializes in engineering; also open source; top new cool things • MapR: lead developers behind Mahout
  25. Hopefully you enjoyed! If interested, quick ways to get started learning Hadoop • Free stuff – who doesn't like free? Big Data University – Hadoop fundamentals, Pig, Oozie, lots more; Udacity – Intro to Hadoop and MapReduce; MapR, Cloudera, HortonWorks – training videos
