Hadoop for System Administrators – Ohio Linux Fest 2014
Justin Miller 
Senior Systems Engineer/DevOps at iHealth Technologies 
Weston Bassler 
Systems Engineer at Verizon Wireless
Hadoop for System Administrators – Ohio Linux Fest 2014 
What we will be covering: 
• Intro
Why Hadoop?
How Hadoop Works
• Architecture
Planning Hardware/Storage/Network
Processing and Storage
HDFS Components
YARN Components
• Operations
Job scheduling
Job alerts
• Monitoring
Core Services
Job scheduler and SLA
Hardware
• High Availability
YARN
HDFS
Oozie
• Security
Security Issues
Authentication
Authorization
Encryption
• Backup and Recovery
What to plan for?
How to combat
• Hadoop Vendors/Distros
Cloudera
HortonWorks
MapR
Why Hadoop?
Why Hadoop? Cont... 
Sort through TB, even PB worth of data in a matter of minutes 
Easily sift through LOGS (patterns, data mining) → switch logs, application logs
Batch Processing 
History → Inspired by 2 Google Papers on MapReduce and GoogleFS 
Implemented By Yahoo!
Who's using it?
How Hadoop? 
Processing 
• MapReduce (MRv1) 
What is MapReduce? 
Nobody likes it 
• YARN (MRv2) 
Yet Another Resource Negotiator 
Newer, better, more versatile
2 new roles → ResourceManager and ApplicationMaster
Spark → New Hotness 
• Bringing Processing and Storage together 
Data locality → avoid network! 
“MO NODES MO BETTA”
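To make "What is MapReduce?" concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word count. It only imitates the programming model in plain Python; it is not Hadoop's API, and every function name is made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["ERROR disk full", "INFO ok", "ERROR disk failed"]))
```

On a real cluster the map calls run on the nodes that already hold the input blocks, which is the data locality point made above.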
YARN in Action
Storage 
• HDFS 
What is HDFS? 
Why HDFS? 
• Components of HDFS 
NameNode 
Metadata → fsimage + edits (the edit log)
ZooKeeper → HA management 
Quorum based journaling 
3 JournalNodes 
Active/Passive NameNode 
DataNodes – what do they do? 
Blocks in relation to NameNode Metadata 
Block storage
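A rough sketch of how a file relates to blocks and replicas. The 128 MB block size is an assumption (check the block size configured on your cluster), the replication factor of 3 is the one used in this deck, and the helper names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size in bytes (128 MB)
REPLICATION = 3                  # replication factor used in this deck

def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file; the last block may be partial."""
    # Integer ceiling division avoids float rounding on very large files.
    return -(-file_size // block_size)

def raw_bytes(file_size, replication=REPLICATION):
    """Raw disk consumed across DataNodes: every byte is stored `replication` times."""
    return file_size * replication

if __name__ == "__main__":
    one_gb = 1024 ** 3
    print(block_count(one_gb), "blocks,", raw_bytes(one_gb) // one_gb, "GB of raw disk")
```

The NameNode only tracks which blocks make up each file and where the replicas live; the DataNodes hold the bytes. That is why many small files cost NameNode memory out of proportion to their size.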
HDFS Write Path
Benefits and Limitations of HDFS 
Benefits 
Low cost per byte → commodity storage 
High Bandwidth/Scales effectively → “Mo nodes Mo speed” 
Rock solid data reliability 
Supports distributed computing I/O patterns 
OPEN SOURCE!!!!!
Benefits and Limitations of HDFS (Continued...) 
Limitations 
Updates → data is immutable (can't be updated, only appended)
Write Once 
Optimized for sequential reads → not for real-time data processing 
Challenging import/export → requires additional tooling
Architecture
• Planning your Hardware/Storage 
Cheap disks 
Distributed disk approach → replication factor of 3 for HA 
NO LVM, NO RAID, and NO swap
noatime, nodiratime 
• Network considerations 
Rack awareness affects data distribution 
Prefer a faster network when available → 10 GbE if possible
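Rack awareness is usually wired up with a small topology script that Hadoop calls with node addresses and that answers with rack paths (the script location is set through `net.topology.script.file.name` in core-site.xml). The sketch below is one way such a script might look; the addresses and rack names are invented, and the output format should be checked against the documentation for your Hadoop version.

```python
#!/usr/bin/env python
import sys

# Hypothetical host-to-rack table; replace with your own inventory.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    # Hadoop passes one or more addresses as arguments and reads
    # whitespace-separated rack paths from standard output.
    print(" ".join(resolve(sys.argv[1:])))
```

Unknown hosts fall back to a default rack so that a missing inventory entry does not stop a node from registering.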
Hadoop Operations 
• Jobs 
What is a job? 
Scheduling jobs with Oozie 
Alerts on Jobs 
Oozie SLAs → Start time, end time & duration 
File driven Job Configuration
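The three SLA dimensions named above (start time, end time, duration) can be pictured with a small check like the one below. This is not Oozie code: it is a plain-Python sketch of the idea, the function and argument names are hypothetical, and the status strings are only modeled on the kind of miss events Oozie reports.

```python
from datetime import datetime, timedelta

def sla_misses(nominal, should_start, should_end, max_duration,
               actual_start, actual_end):
    """Return which of the three SLA checks a finished job missed.

    should_start and should_end are offsets from the nominal time;
    max_duration bounds the job's run time.
    """
    misses = []
    if actual_start > nominal + should_start:
        misses.append("START_MISS")
    if actual_end > nominal + should_end:
        misses.append("END_MISS")
    if actual_end - actual_start > max_duration:
        misses.append("DURATION_MISS")
    return misses

if __name__ == "__main__":
    nominal = datetime(2014, 10, 25, 2, 0)
    print(sla_misses(
        nominal,
        should_start=timedelta(minutes=10),
        should_end=timedelta(minutes=60),
        max_duration=timedelta(minutes=30),
        actual_start=nominal + timedelta(minutes=5),
        actual_end=nominal + timedelta(minutes=50),
    ))
```

In this example the job starts and ends on time but runs for 45 minutes against a 30 minute limit, so only the duration check fails.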
Example of a Job: 
Example of a coordinator:
Troubleshooting 
• Application → Debug Code
• Job → Debug Execution
• Service → Debug Linux Process (/var/log/hadoop-*) 
Services won't start → port conflicts (nmap, netstat, lsof)
# Not an application or job problem? Search the service logs:
grep -r ERROR /var/log/hadoop-*
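The two service-level checks above (is the port already taken, and what do the logs say) can be scripted. A minimal sketch, assuming the logs live in directories matching /var/log/hadoop-* and using port 8020 purely as an example; both are assumptions about your layout, and the helper names are hypothetical.

```python
import glob
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

def error_lines(lines):
    """Yield log lines that contain the ERROR level marker."""
    for line in lines:
        if "ERROR" in line:
            yield line.rstrip("\n")

if __name__ == "__main__":
    print("port 8020 in use:", port_in_use(8020))
    # Assumed log layout: one directory per service under /var/log.
    for path in glob.glob("/var/log/hadoop-*/*.log"):
        with open(path, errors="replace") as handle:
            for line in error_lines(handle):
                print(path + ": " + line)
```

If the port is in use before the service starts, `lsof` or `netstat` will tell you which process holds it.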
Monitoring 
• Core Services 
HDFS 
YARN 
JMX → JVM Monitoring 
Cloudera Manager 
• Performance 
Ganglia (HortonWorks) 
Cloudera Manager 
• Hardware → to each his own (traditional monitoring) 
SNMP 
Nagios 
Zenoss 
Cloudera Manager
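For the JMX route: Hadoop daemons publish their JMX beans as JSON under a /jmx path on their web UI, which makes them easy to feed into whatever monitoring you already run. The sketch below parses a canned response instead of calling a live NameNode; the bean and attribute names are what I would expect from a NameNode but should be treated as assumptions and verified against your own /jmx output. The values and the threshold are made up.

```python
import json

# Shape of a /jmx response, trimmed to one bean. Values are made up.
SAMPLE = """
{"beans": [{
  "name": "Hadoop:service=NameNode,name=FSNamesystemState",
  "NumLiveDataNodes": 12,
  "NumDeadDataNodes": 1,
  "CapacityTotal": 1000,
  "CapacityRemaining": 150
}]}
"""

def check_namenode(jmx_text, min_free_ratio=0.10):
    """Return a list of alert strings derived from one JMX JSON document."""
    alerts = []
    for bean in json.loads(jmx_text)["beans"]:
        if not bean.get("name", "").endswith("name=FSNamesystemState"):
            continue
        if bean["NumDeadDataNodes"] > 0:
            alerts.append("%d dead DataNode(s)" % bean["NumDeadDataNodes"])
        total = bean["CapacityTotal"]
        # Skip the capacity check when the cluster reports no capacity at all.
        if total and bean["CapacityRemaining"] / total < min_free_ratio:
            alerts.append("less than %d%% HDFS capacity free" % round(min_free_ratio * 100))
    return alerts

if __name__ == "__main__":
    for alert in check_namenode(SAMPLE):
        print("ALERT:", alert)
```

A Nagios or Zenoss check can wrap a function like this and turn a non-empty list into a warning state.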
High Availability 
• HDFS 
ZooKeeper → quorum based journaling 
• YARN 
ZooKeeper
• Oozie HA
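Quorum-based designs such as the JournalNodes and ZooKeeper need a majority of members to agree, which is why they are deployed in odd numbers. The arithmetic, as a small sketch with hypothetical helper names:

```python
def quorum_size(nodes):
    """Smallest majority of `nodes` members."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """How many members can be lost while a majority still survives."""
    return (nodes - 1) // 2

if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

Three nodes survive one failure, and so do four, so the fourth node adds cost without adding resilience; five nodes survive two.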
Security (Because people are evil)
Security Continued.... 
• Known issues – Stupid/Lazy People 
Hadoop can be very secure 
• Authentication - Kerberos 
Principal (user) 
Realm (group of principals) 
Keytab file 
• Authorization 
LDAP 
Active Directory 
Role based 
• Encryption – For your eyes Only! 
Kerberos 1st 
SSL Certificates 
**** SSL must be enabled for all core Hadoop services
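A Kerberos principal is written as primary/instance@REALM, where the instance part is optional and for services is usually the host name. A small parsing sketch; the example principals and the helper name are made up for illustration.

```python
from collections import namedtuple

Principal = namedtuple("Principal", "primary instance realm")

def parse_principal(text):
    """Split 'primary/instance@REALM' into its parts; instance is optional."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError("expected primary[/instance]@REALM: %r" % text)
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)

if __name__ == "__main__":
    print(parse_principal("hdfs/nn1.example.com@EXAMPLE.COM"))
    print(parse_principal("alice@EXAMPLE.COM"))
```

The keytab file holds the keys for one or more such principals so that a service can authenticate without anyone typing a password.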
Backup and Recovery – When things go wrong (And they will) 
What can go wrong? What to plan for? 
Data Corruption 
Node crashes 
Disk crashes 
Ways to combat when things do go wrong 
• Data Corruption 
Checksums stored with a block fail → NameNode replaces the corrupt replica with a fresh copy from a healthy one
HDFS → hdfs fsck tool 
• Node crashes/Disk crashes 
HDFS saves the day! 
NameNode HA 
First 2 replicas of data on different hosts 
Heartbeat detection
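The checksum idea behind corruption detection can be sketched in a few lines: store one checksum per fixed-size chunk when the data is written, recompute on read, and flag any chunk that no longer matches. This sketch uses `zlib.crc32` as a stand-in and an assumed 512-byte chunk; it is not HDFS's own checksum code and may not match the CRC variant or chunk size your cluster uses.

```python
import zlib

CHUNK = 512  # bytes covered by each checksum (assumed chunk size)

def checksums(data, chunk=CHUNK):
    """Compute one CRC per fixed-size chunk of a block."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def corrupt_chunks(data, expected, chunk=CHUNK):
    """Return indexes of chunks whose CRC no longer matches the stored value."""
    return [index
            for index, (got, want) in enumerate(zip(checksums(data, chunk), expected))
            if got != want]

if __name__ == "__main__":
    block = bytes(range(256)) * 8          # 2048 bytes, four chunks
    stored = checksums(block)              # written alongside the data
    damaged = bytearray(block)
    damaged[600] ^= 0xFF                   # flip bits in the second chunk
    print(corrupt_chunks(bytes(damaged), stored))
```

Once a replica is known to be bad, the remaining good replicas are what make recovery possible, which is the practical reason for keeping a replication factor above 1.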
Hadoop Wars - Vendors and Distributions 
• Cloudera 
Specializes in Enterprise tools 
Auditing 
Access Control 
Cluster Management (Cloudera Manager) 
• HortonWorks 
Specializes in Engineering 
Also Open Source 
Top new cool things 
• MapR 
Lead developers behind Mahout
Hopefully you enjoyed! 
If interested: 
Quick Ways to get started Learning Hadoop 
• Free Stuff – Who doesn't like free? 
Big Data University – Hadoop fundamentals, Pig, Oozie, lots more 
Udacity – Intro to Hadoop and MapReduce
MapR, Cloudera, HortonWorks – Training Videos

Hadoop for sys_admin

  • 1. for System Administrators – Hadoop for System Administrators O –h iOo hLiion uLxi nFuexs tF 2e0s1t 42014 Justin Miller Senior Systems Engineer/DevOps at iHealth Technologies Weston Bassler Systems Engineer at Verizon Wireless
  • 2. Hadoop for System Administrators – Ohio Linux Fest 2014 What we will be covering: Intro Why Hadoop? How Hadoop Works Architecture Planning Hardware/Storage/Network Processing and Storage HDFS Components YARN Components Operations Job scheduling Jobs alerts Monitoring Core Services Job scheduler and SLA Hardware High Availability YARN HDFS Oozie Security Security Issues Authentication Authorization Encrption Backup and Recovery What to plan for? How to combat Hadoop Vendors/Distros Cloudera HortonWorks MapR
  • 3. Hadoop for System Administrators – Ohio Linux Fest 2014 Why Hadoop?
  • 4. Hadoop for System Administrators – Ohio Linux Fest 2014 Why Hadoop? Cont... Sort through TB, even PB worth of data in a matter of minutes Easily sift through LOGS (patterns, data mining) → switch logs, application logs Batch Processing History → Inspired by 2 Google Papers on MapReduce and GoogleFS Implemented By Yahoo!
  • 5. Hadoop for System Administrators – Ohio Linux Fest 2014 Whose using it?
  • 6. Hadoop for System Administrators – Ohio Linux Fest 2014 How Hadoop? Processing • MapReduce (MRv1) What is MapReduce? Nobody likes it • YARN (MRv2) Yet Another Resource Negotiator Newer better/versatile 2 New Roles → Resource Manager and Application Manager Spark → New Hotness • Bringing Processing and Storage together Data locality → avoid network! “MO NODES MO BETTA”
  • 7. Hadoop for System Administrators – Ohio Linux Fest 2014 YARN in Action
  • 8. Hadoop for System Administrators – Ohio Linux Fest 2014 Storage • HDFS What is HDFS? Why HDFS? • Components of HDFS NameNode Metadata → fsimage + fsedits ZooKeeper → HA management Quorum based journaling 3 JournalNodes Active/Passive NameNode DataNodes – what do they do? Blocks in relation to NameNode Metadata Block storage
  • 9. HDFS Write Path
  • 10. Benefits and Limitations of HDFS. Benefits: low cost per byte → commodity storage; high bandwidth, scales effectively → “Mo nodes Mo speed”; rock-solid data reliability; supports distributed computing I/O patterns; OPEN SOURCE!!!!!
  • 11. Benefits and Limitations of HDFS (continued). Limitations: updates → data is write-once (files can be appended to but not updated in place); optimized for sequential reads → not for real-time data processing; challenging import/export → requires additional tooling
  • 12. Architecture • Planning your hardware/storage: cheap disks; distributed disk approach → replication factor of 3 for HA; NO LVM, NO RAID, and NO swap; mount with noatime, nodiratime • Network considerations: rack awareness affects data distribution; prefer a faster network when available → 10 GbE if possible
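A replication factor of 3 has a direct effect on how many cheap disks you need to buy. The Python sketch below is a rough sizing aid, not a vendor formula. The function name and the `reserved_fraction` parameter (space held back for non-HDFS use such as temporary and shuffle data) are assumptions made for illustration.

```python
def usable_hdfs_capacity_tb(nodes, disks_per_node, disk_tb,
                            replication=3, reserved_fraction=0.25):
    """Estimate usable HDFS capacity in TB for a JBOD (no RAID, no LVM) cluster.

    reserved_fraction is an assumed share of raw disk kept out of HDFS.
    """
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    raw_tb = nodes * disks_per_node * disk_tb
    available_tb = raw_tb * (1 - reserved_fraction)
    # Every block is stored `replication` times, so divide by the factor.
    return available_tb / replication
```

For example, 10 nodes with 12 disks of 4 TB each give 480 TB raw, but only about 120 TB of usable space with the assumed 25% reserve and 3 replicas.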
  • 13. Hadoop Operations • Jobs: What is a job? Scheduling jobs with Oozie; alerts on jobs; Oozie SLAs → start time, end time, and duration; file-driven job configuration
  • 14. Example of a Job: Example of a coordinator:
  • 15. Troubleshooting • Application → Debug Code
  • 16. • Job → Debug Execution
  • 17. • Service → Debug Linux process (/var/log/hadoop-*). Services won't start → port conflicts (nmap, netstat, lsof). If the problem is neither the application nor the job: grep -r ERROR /var/log/hadoop-*
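The "grep the service logs for ERROR" step can also be scripted when you want file names and line numbers in one pass. The following is a minimal Python sketch that uses only the standard library. The function name `find_error_lines` and its parameters are hypothetical, and it assumes the service logs live in directories matching /var/log/hadoop-*, as on the slide.

```python
import glob
import os


def find_error_lines(log_root="/var/log", pattern="hadoop-*", needle="ERROR"):
    """Return (path, line number, line) for every log line containing `needle`.

    Walks each directory under log_root whose name matches `pattern`.
    """
    hits = []
    for directory in sorted(glob.glob(os.path.join(log_root, pattern))):
        for dirpath, _dirnames, filenames in os.walk(directory):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "r", errors="replace") as handle:
                        for lineno, line in enumerate(handle, start=1):
                            if needle in line:
                                hits.append((path, lineno, line.rstrip("\n")))
                except OSError:
                    # Unreadable file (permissions, rotated away): skip it.
                    continue
    return hits
```

Run it as a user who can read the Hadoop log directories, then follow up on port conflicts with the tools named above.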
  • 18. Monitoring • Core services: HDFS; YARN; JMX → JVM monitoring; Cloudera Manager • Performance: Ganglia (Hortonworks); Cloudera Manager • Hardware → to each his own (traditional monitoring): SNMP; Nagios; Zenoss; Cloudera Manager
  • 19. High Availability • HDFS: ZooKeeper → quorum-based journaling • YARN: ZooKeeper
  • 20. • Oozie HA
  • 21. Security (Because people are evil)
  • 22. Security (continued) • Known issues: stupid/lazy people; Hadoop can be very secure • Authentication (Kerberos): principal (user); realm (group of principals); keytab file • Authorization: LDAP; Active Directory; role-based • Encryption (for your eyes only!): Kerberos first; SSL certificates; **** SSL must be enabled for all core Hadoop services
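The principal and realm terms are easier to remember once you have seen how a principal string breaks apart. The Python sketch below splits the common `primary/instance@REALM` form. The names `Principal` and `parse_principal` are hypothetical, the example hosts are made up, and the sketch deliberately ignores Kerberos escaping rules.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs"
    instance: Optional[str]  # often the host for service principals
    realm: str               # the group of principals, e.g. "EXAMPLE.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its parts (no escape handling)."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError("expected primary[/instance]@REALM, got %r" % text)
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)
```

A service principal such as `hdfs/nn1.example.com@EXAMPLE.COM` yields primary `hdfs`, instance `nn1.example.com`, and realm `EXAMPLE.COM`, while a plain user such as `alice@EXAMPLE.COM` has no instance.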
  • 23. Backup and Recovery – When things go wrong (and they will). What can go wrong? What to plan for? Data corruption; node crashes; disk crashes. Ways to combat when things do go wrong • Data corruption: checksums of metadata fail → NameNode replaces with a fresh copy; HDFS → hdfs fsck tool • Node crashes/disk crashes: HDFS saves the day! NameNode HA; first 2 replicas of data on different hosts; heartbeat detection
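Heartbeat detection comes down to a timeout. As far as we know, the NameNode treats a DataNode as dead after twice the recheck interval plus ten heartbeat intervals, which is 10 minutes 30 seconds with the stock settings (3 s heartbeat, 5 min recheck). The Python sketch below models that rule. The function names are hypothetical, and the defaults are assumptions to verify against your distribution's hdfs-default.xml.

```python
def dead_node_timeout_s(heartbeat_interval_s=3, recheck_interval_s=300):
    """Seconds of silence before a DataNode is considered dead (assumed rule)."""
    return 2 * recheck_interval_s + 10 * heartbeat_interval_s


def dead_datanodes(last_heartbeat, now, timeout_s=None):
    """Return sorted names of DataNodes whose last heartbeat is older than the timeout.

    last_heartbeat maps node name -> timestamp in seconds of its last heartbeat.
    """
    if timeout_s is None:
        timeout_s = dead_node_timeout_s()
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > timeout_s)
```

Once a node is flagged this way, HDFS re-replicates its blocks from the surviving replicas, which is why a single node or disk crash is usually a non-event.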
  • 24. Hadoop Wars – Vendors and Distributions • Cloudera: specializes in enterprise tools; auditing; access control; cluster management (Cloudera Manager) • Hortonworks: specializes in engineering; also open source; top new cool things • MapR: Lead developers begin Mahout
  • 25. Hopefully you enjoyed! If interested, quick ways to get started learning Hadoop • Free stuff (who doesn't like free?): Big Data University – Hadoop fundamentals, Pig, Oozie, lots more; Udacity – Intro to Hadoop and MapReduce; MapR, Cloudera, Hortonworks – training videos