Big Data Architect - Hadoop & its Eco-system Training
Objective
Participants will learn how to install a Hadoop cluster, understand the basic and
advanced concepts of MapReduce, and pick up best practices for Apache Hadoop
development as experienced by the developers, architects, and data analysts of
core Apache Hadoop. During the course they will also learn the following:
1. Hadoop Ecosystem
2. Best programming practices for Map Reduce
3. System administration issues with other Hadoop projects such as Hive,
Pig, and Sqoop
4. Configuring the MapReduce environment with the Eclipse IDE
5. Running MRUnit Tests on MR Code
6. Advanced Map Reduce Algorithms and techniques
7. Working with Pig and HIVE
8. Working with NoSQL with emphasis on HBase
Note: The course will consist of 40% theoretical discussion and 60% hands-on
practice.
Duration:
28 ~ 30 hours
Audience
This course is designed for anyone who
1. Wants to architect a project using Hadoop and its ecosystem
components.
2. Wants to develop MapReduce programs.
3. Is a Business Analyst or Data Warehousing professional looking at alternative
approaches to data analysis and storage.
Pre-Requisites
1. The participants should have at least basic knowledge of Java.
2. Any experience with a Linux environment will be very helpful.
Course Outline
1. What is Big Data & Why Hadoop?
• Big Data Characteristics, Challenges with traditional systems
2. Hadoop Overview & Ecosystem
• Anatomy of Hadoop Cluster, Installing and Configuring Hadoop
• Hands-On Exercise
3. Hadoop Architecture
• Components in Hadoop
• Interaction between different Components
• Basic Understanding of each component
4. HDFS – Hadoop Distributed File System
• Name Nodes and Data Nodes
• Hands-On Exercise
5. Map Reduce Anatomy
• How MapReduce Works
• The Mapper & Reducer, InputFormats & OutputFormats, Data Types
• Hands-On Exercise
6. Understanding Cloudera Distribution
• What is CDH?
• Components in CDH
• Hands-On Exercise
7. Understanding HortonWorks Distribution
• What is HDP?
• Components in HDP
• Hands-On Exercise
8. Pseudo-Distributed Cluster Setup of Vanilla Hadoop
• Hadoop Extraction and Installation
• Configuration / XML Files
• Hands-On Exercise
9. YARN
• Need for YARN
• Architecture of YARN
10. Installation of YARN in Ubuntu
• Configuration Settings
• Difference between Gen1 and Gen2
11. Developing Map Reduce Programs
• Setting up Eclipse Development Environment, Creating Map Reduce
Projects, Debugging Map Reduce Code
• Hands-On Exercise
13. Advanced Tips & Techniques
• Determining optimal number of reducers, skipping bad records
• Partitioning into multiple output files & Passing parameters to tasks
• Optimizing Hadoop Cluster & Performance Tuning
14. Monitoring & Management of Hadoop
• Managing HDFS with Tools
• Using the HDFS & JobTracker Web UIs
• Routine Administration Procedures
• Commissioning and decommissioning of nodes
• Hands-On Exercise
15. Using Hive
• Hive as a Data Warehouse
• Creating External & Internal Tables plus Loading Data
• Writing HiveQL queries for data retrieval
• Creating partitions and querying data.
16. Using Pig
• Why Pig and its benefits
• Loading data with PigStorage
• Querying data loaded with PigStorage
• Hands-On Exercise
17. Sqoop
• Importing data from and exporting data to an RDBMS
• Hands-On Exercise
18. Understanding the Other SQL options in Hadoop
• Intro to Stinger
• Intro to Impala
19. Hadoop Best Practices and Use Cases
20. NoSQL Introduction
• What is NoSQL?
• Variations of NoSQL
• Advantages of Columnar Databases
21. HBase
• HBase Overview and Architecture
• HBase vs. RDBMS
• HBase Table Design
• Column Families and Regions
• HBase Java API code
• Hands-On Exercise
• HBase Installation
• HBase shell commands
• Java Administration API
• Performance Tuning
23. Oozie – Workflow Scheduler
• Why Workflow in Hadoop
• Understanding Configuration in Oozie
Takeaways from the Course
1. Understanding of the what and why of Hadoop and its ecosystem
components.
2. Ability to write MapReduce programs for a given scenario.
3. Ability to correctly architect Hadoop solutions and implement best practices
in Hadoop development.
4. Ability to manage and monitor Hadoop.
5. Ability to manage the interactions among the different Hadoop
components.
