SlideShare a Scribd company logo
1 of 30
HADOOP
TRAINING
HISTORY OF HADOOP
Hadoop was created by Doug Cutting, the creator
of Apache Lucene, the widely used text search
library. Hadoop has its origins in Apache Nutch, an
open source web search engine, itself a part of the
Lucene project.
 The name Hadoop is not an acronym; it’s a made-up name. The project’s
creator, Doug Cutting, explains how the name came about:
The name my kid gave a stuffed yellow elephant. Short,
relatively easy to spell and pronounce, meaningless, and not used
elsewhere: those are my naming criteria. Kids are good at generating
such. Googol is a kid’s term.
 Subprojects and “contrib” modules in Hadoop also tend to have names
that are unre-lated to their function, often with an elephant or other animal
theme (“Pig,” for example). Smaller components are given more
descriptive (and therefore more mun-dane) names. This is a good
principle, as it means you can generally work out what something does
from its name. For example, the jobtracker9 keeps track of MapReduce
jobs.
INTRODUCTION TO HADOOP
Hadoop is an open source software framework that supports data-
intensive distributed applications. It is licensed under the Apache v2
license, and generally known as Apache Hadoop.
Hadoop has been developed based on a paper originally written by
Google on MapReduce system and applies concepts of functional
programming; It is written in Java programming language and is the
highest-level Apache project being constructed and used by a global
community of contributors.
INTRODUCTION TO HADOOP
Big giants like Yahoo and Facebook are using Hadoop as an
integral part of their functioning – in 2008, Yahoo! Inc. established the
world’s largest Hadoop production application. Also, the Yahoo! Search
Webmap is a Hadoop application that runs on over 10,000 core Linux
clusters, generating data that is now widely used in every Yahoo! Web
search query. On the other hand, Facebook uses Apache Hadoop to
keep track of its billions of user profiles as well as all the data related to
them like their images, posts, comments, videos, etc.
Hadoop is not a database:
Hadoop an efficient distributed file system and not a
database. It is designed specifically for information that comes
in many forms, such as server log files or personal productivity
documents. Anything that can be stored as a file can be placed
in a Hadoop repository.
Hadoop is used for:
 Search - Yahoo, Amazon, Zvents
 Log processing - Facebook, Yahoo
 Data Warehouse - Facebook, AOL
 Video and Image Analysis - New York Times, Eyealike
WHY HADOOP ?
Hadoop is a free, Java-based programming framework
that supports the processing of large data sets in a distributed
computing environment.
Because Hadoop is open source and can run on commodity
hardware, the initial cost savings are dramatic and continue to
grow as your organizational data grows.
It is part of the Apache project sponsored by the Apache
Software Foundation.
WHY HADOOP ?
Single Source of Truth:-
With the enterprise data warehouse approach,
organizations find their data scattered across many systems and
silos. This decentralized environment can result in slow processing
and inefficient data analysis. Hadoop makes it possible to
consolidate your data and business intelligence capabilities within
an Enterprise Data Hub. The ability to save all organizational data
at its lowest level of granularity and bring all archive data into an
Enterprise Data Hub gives business users greater and faster
access to data.
WHY HADOOP ?
WHY HADOOP ?
Faster Data Processing:-
In legacy environments, traditional ETL and batch
processes can take hours, days, or even weeks, in a world where
businesses require access to data in minutes or seconds or even
sub-seconds. Hadoop excels at high-volume batch processing.
Because of its parallel processing, Hadoop can perform batch
processes 10 times faster than on a single thread server or on the
mainframe.
WHY HADOOP ?
Get More for Less:-
The true beauty of Hadoop is its ability to cost-effectively scale to
rapidly growing data demands. With its distributed computing power,
Hadoop configures across a cluster of commodity servers, or nodes. By
augmenting its EDW environment with Hadoop, the enterprise can
decrease its cost per terabyte of storage. With cheaper storage,
organizations can keep more data that was previously too expensive to
warehouse. This allows for the capture and storage of data from any
source within the organization while decreasing the amount of data that
is “thrown away” during data cleansing.
HADOOP INTERNAL SOFTWARE ARCHITECTURE
COMPONENTS OF HADOOP
The current Apache Hadoop ecosystem consists of
the Hadoop kernel, MapReduce, the Hadoop distributed file
system (HDFS) and a number of related projects such as
Apache Hive, HBase and Zookeeper. MapReduce and
Hadoop distributed file system (HDFS) are the main
component of Hadoop.
MapReduce:
The framework that understands and assigns work to
the nodes in a cluster
COMPONENTS OF HADOOP
Hadoop distributed file system (HDFS):
HDFS is the file system that spans all the nodes in a Hadoop
cluster for data storage. It links together the file systems on many
local nodes to make them into one big file system. HDFS assumes
nodes will fail, so it achieves reliability by replicating data across
multiple nodes.
HADOOP ECOSYSTEM
ADVANTAGE OF HADOOP
 Hadoop is Scalable
 Hadoop is Cost effective
 Hadoop is Flexible
 Hadoop is Fault tolerant
PREREQUISITE TO LEARN HADOOP ?
There is no strict prerequisite to start learning
Hadoop.
However, if you want to become an expert in
Hadoop and make an excellent career, you should
have at least basic knowledge of Java and Linux
IS JAVA REQUIRED TO LEARN HADOOP?
Knowing Java is an added advantage, but Java is not
strictly a prerequisite for working with Hadoop.
Why Java is not strictly a prerequisite:
Tools like Hive and Pig that are built on top of Hadoop offer
their own high-level languages for working with data on your cluster. If
you want to write your own MapReduce code, you can do so in any
language (e.g. Perl, Python, Ruby, C, etc.) that supports reading from
standard input and writing to standard output with Hadoop Streaming
IS JAVA REQUIRED TO LEARN HADOOP?
Added advantage of Java in Hadoop:
Although you can use Streaming to write your map
and reduce functions in the language of your choice, there
are some advanced features that are (at present) only
available via the Java API.
LINUX IS EXTRA BENEFIT WHILE LEARNING HADOOP?
Hadoop can run on Windows, it was built initially on Linux
and Linux is the preferred method for both installing and managing
Hadoop.
Having a solid understanding of getting around in a Linux
shell will also help you tremendously in digesting Hadoop,
especially with regards to many of the HDFS command line
parameters
COURSE CONTENT
Hadoop Introduction and Overview:
• What is Hadoop?
• History of Hadoop
• Building Blocks – Hadoop Eco-System
• Who is behind Hadoop?
• What Hadoop is good for and what it is not
Hadoop Distributed File System (HDFS):
• HDFS Overview and Architecture
• HDFS Installation
• Hadoop File System Shell
• File System Java API
COURSE CONTENT
Map/Reduce:
• Map/Reduce Overview and Architecture
• Installation
• Developing Map/Red Jobs
• Input and Output Formats
• Job Configuration
• Job Submission
• HDFS as a Source and Sink
• HBase as a Source and Sink
• Hadoop Streaming
COURSE CONTENT
HBase:
• HBase Overview and Architecture
• HBase Installation
• HBase Shell
• CRUD operations
• Scanning and Batching
• Filters
• HBase Key Design
COURSE CONTENT
Pig:
• Pig Overview
• Installation
• Pig Latin
• Pig with HDFS
Hive:
• Hive Overview
• Installation
• Hive QL
COURSE CONTENT
Sqoop:
• Sqoop Overview
• Installation
• Imports and Exports
Zoo Keeper:
• Zoo Keeper Overview
• Installation
• Server Mantainace
Putting it all together:
• Distributed installations
PLEASE CHECK THE LINK
http://www.keylabstraining.com/hadoop-online-
training-hyderabad-bangalore
PLEASE CONTACT:
 +91-9550-645-679 (India)
 +1-908-366-7933 (USA)
 Skype id : keylabstraining
 Email id : info@keylabstraining.com
Hadoop online training

More Related Content

What's hot

MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka Edureka!
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaEdureka!
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Takrim Ul Islam Laskar
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...Simplilearn
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdfvishal choudhary
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 

What's hot (20)

Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Sqoop
SqoopSqoop
Sqoop
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)Introduction to Apache Hive(Big Data, Final Seminar)
Introduction to Apache Hive(Big Data, Final Seminar)
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdf
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 

Viewers also liked

Interactively Search and Visualize Your Big Data
Interactively Search and Visualize Your Big DataInteractively Search and Visualize Your Big Data
Interactively Search and Visualize Your Big Datagethue
 
How companies use NoSQL & Couchbase - NoSQL Now 2014
How companies use NoSQL & Couchbase - NoSQL Now 2014How companies use NoSQL & Couchbase - NoSQL Now 2014
How companies use NoSQL & Couchbase - NoSQL Now 2014Dipti Borkar
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseCloudera, Inc.
 
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...inside-BigData.com
 
Hadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureHadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureInSemble
 
Informatica Solution for SWIFT Integration
Informatica Solution for SWIFT IntegrationInformatica Solution for SWIFT Integration
Informatica Solution for SWIFT IntegrationKim Loughead
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution AnalyticsRevolution Analytics
 
Turning Big Data into More Effective Customer Experiences
Turning Big Data into More Effective Customer ExperiencesTurning Big Data into More Effective Customer Experiences
Turning Big Data into More Effective Customer ExperiencesNG DATA
 
ECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayBob Sokol
 
Splout SQL - Web latency SQL views for Hadoop
Splout SQL - Web latency SQL views for HadoopSplout SQL - Web latency SQL views for Hadoop
Splout SQL - Web latency SQL views for Hadoopdatasalt
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...DataKitchen
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopHortonworks
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Hortonworks
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsDataWorks Summit
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...PwC
 

Viewers also liked (20)

Interactively Search and Visualize Your Big Data
Interactively Search and Visualize Your Big DataInteractively Search and Visualize Your Big Data
Interactively Search and Visualize Your Big Data
 
How companies use NoSQL & Couchbase - NoSQL Now 2014
How companies use NoSQL & Couchbase - NoSQL Now 2014How companies use NoSQL & Couchbase - NoSQL Now 2014
How companies use NoSQL & Couchbase - NoSQL Now 2014
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBase
 
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
 
Hadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureHadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming Architecture
 
Informatica Solution for SWIFT Integration
Informatica Solution for SWIFT IntegrationInformatica Solution for SWIFT Integration
Informatica Solution for SWIFT Integration
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
 
Turning Big Data into More Effective Customer Experiences
Turning Big Data into More Effective Customer ExperiencesTurning Big Data into More Effective Customer Experiences
Turning Big Data into More Effective Customer Experiences
 
ECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps Day
 
Splout SQL - Web latency SQL views for Hadoop
Splout SQL - Web latency SQL views for HadoopSplout SQL - Web latency SQL views for Hadoop
Splout SQL - Web latency SQL views for Hadoop
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache Hadoop
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once Semantics
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
 

Similar to Hadoop online training

Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in LudhianaE2MATRIX
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in MohaliE2MATRIX
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in AmritsarE2MATRIX
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoopOmar Jaber
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architectureHarikrishnan K
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop ToolsXplenty
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfSheetal Jain
 
What does the future of Big data look like?How to get a fresher job in data a...
What does the future of Big data look like?How to get a fresher job in data a...What does the future of Big data look like?How to get a fresher job in data a...
What does the future of Big data look like?How to get a fresher job in data a...Acutesoft Solutions India Pvt Ltd
 

Similar to Hadoop online training (20)

Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in Ludhiana
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in Mohali
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in Amritsar
 
Hadoop Tutorial for Beginners
Hadoop Tutorial for BeginnersHadoop Tutorial for Beginners
Hadoop Tutorial for Beginners
 
Bigdata ppt
Bigdata pptBigdata ppt
Bigdata ppt
 
Bigdata
BigdataBigdata
Bigdata
 
Hadoop
HadoopHadoop
Hadoop
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
 
Lecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptxLecture 2 Hadoop.pptx
Lecture 2 Hadoop.pptx
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Hadoop .pdf
Hadoop .pdfHadoop .pdf
Hadoop .pdf
 
HDFS
HDFSHDFS
HDFS
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
Hadoop content
Hadoop contentHadoop content
Hadoop content
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdf
 
What does the future of Big data look like?How to get a fresher job in data a...
What does the future of Big data look like?How to get a fresher job in data a...What does the future of Big data look like?How to get a fresher job in data a...
What does the future of Big data look like?How to get a fresher job in data a...
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 

Recently uploaded (20)

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 

Hadoop online training

  • 1.
  • 3. HISTORY OF HADOOP Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine, itself a part of the Lucene project.
  • 4.  The name Hadoop is not an acronym; it’s a made-up name. The project’s creator, Doug Cutting, explains how the name came about: The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid’s term.  Subprojects and “contrib” modules in Hadoop also tend to have names that are unre-lated to their function, often with an elephant or other animal theme (“Pig,” for example). Smaller components are given more descriptive (and therefore more mun-dane) names. This is a good principle, as it means you can generally work out what something does from its name. For example, the jobtracker9 keeps track of MapReduce jobs.
  • 5. INTRODUCTION TO HADOOP Hadoop is an open source software framework that supports data- intensive distributed applications. It is licensed under the Apache v2 license, and generally known as Apache Hadoop. Hadoop has been developed based on a paper originally written by Google on MapReduce system and applies concepts of functional programming; It is written in Java programming language and is the highest-level Apache project being constructed and used by a global community of contributors.
  • 6. INTRODUCTION TO HADOOP Big giants like Yahoo and Facebook are using Hadoop as an integral part of their functioning – in 2008, Yahoo! Inc. established the world’s largest Hadoop production application. Also, the Yahoo! Search Webmap is a Hadoop application that runs on over 10,000 core Linux clusters, generating data that is now widely used in every Yahoo! Web search query. On the other hand, Facebook uses Apache Hadoop to keep track of its billions of user profiles as well as all the data related to them like their images, posts, comments, videos, etc.
  • 7. Hadoop is not a database: Hadoop an efficient distributed file system and not a database. It is designed specifically for information that comes in many forms, such as server log files or personal productivity documents. Anything that can be stored as a file can be placed in a Hadoop repository.
  • 8. Hadoop is used for:  Search - Yahoo, Amazon, Zvents  Log processing - Facebook, Yahoo  Data Warehouse - Facebook, AOL  Video and Image Analysis - New York Times, Eyealike
  • 9. WHY HADOOP ? Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Because Hadoop is open source and can run on commodity hardware, the initial cost savings are dramatic and continue to grow as your organizational data grows. It is part of the Apache project sponsored by the Apache Software Foundation.
  • 10. WHY HADOOP ? Single Source of Truth:- With the enterprise data warehouse approach, organizations find their data scattered across many systems and silos. This decentralized environment can result in slow processing and inefficient data analysis. Hadoop makes it possible to consolidate your data and business intelligence capabilities within an Enterprise Data Hub. The ability to save all organizational data at its lowest level of granularity and bring all archive data into an Enterprise Data Hub gives business users greater and faster access to data.
  • 12. WHY HADOOP ? Faster Data Processing:- In legacy environments, traditional ETL and batch processes can take hours, days, or even weeks, in a world where businesses require access to data in minutes or seconds or even sub-seconds. Hadoop excels at high-volume batch processing. Because of its parallel processing, Hadoop can perform batch processes 10 times faster than on a single thread server or on the mainframe.
  • 13. WHY HADOOP ? Get More for Less:- The true beauty of Hadoop is its ability to cost-effectively scale to rapidly growing data demands. With its distributed computing power, Hadoop configures across a cluster of commodity servers, or nodes. By augmenting its EDW environment with Hadoop, the enterprise can decrease its cost per terabyte of storage. With cheaper storage, organizations can keep more data that was previously too expensive to warehouse. This allows for the capture and storage of data from any source within the organization while decreasing the amount of data that is “thrown away” during data cleansing.
  • 14. HADOOP INTERNAL SOFTWARE ARCHITECTURE
  • 15. COMPONENTS OF HADOOP The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper. MapReduce and Hadoop distributed file system (HDFS) are the main component of Hadoop. MapReduce: The framework that understands and assigns work to the nodes in a cluster
  • 16. COMPONENTS OF HADOOP Hadoop distributed file system (HDFS): HDFS is the file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
  • 18. ADVANTAGE OF HADOOP  Hadoop is Scalable  Hadoop is Cost effective  Hadoop is Flexible  Hadoop is Fault tolerant
  • 19. PREREQUISITE TO LEARN HADOOP ? There is no strict prerequisite to start learning Hadoop. However, if you want to become an expert in Hadoop and make an excellent career, you should have at least basic knowledge of Java and Linux
  • 20. IS JAVA REQUIRED TO LEARN HADOOP? Knowing Java is an added advantage, but Java is not strictly a prerequisite for working with Hadoop. Why Java is not strictly a prerequisite: Tools like Hive and Pig that are built on top of Hadoop offer their own high-level languages for working with data on your cluster. If you want to write your own MapReduce code, you can do so in any language (e.g. Perl, Python, Ruby, C, etc.) that supports reading from standard input and writing to standard output with Hadoop Streaming
  • 21. IS JAVA REQUIRED TO LEARN HADOOP? Added advantage of Java in Hadoop: Although you can use Streaming to write your map and reduce functions in the language of your choice, there are some advanced features that are (at present) only available via the Java API.
  • 22. LINUX IS EXTRA BENEFIT WHILE LEARNING HADOOP? Hadoop can run on Windows, it was built initially on Linux and Linux is the preferred method for both installing and managing Hadoop. Having a solid understanding of getting around in a Linux shell will also help you tremendously in digesting Hadoop, especially with regards to many of the HDFS command line parameters
  • 23. COURSE CONTENT Hadoop Introduction and Overview: • What is Hadoop? • History of Hadoop • Building Blocks – Hadoop Eco-System • Who is behind Hadoop? • What Hadoop is good for and what it is not Hadoop Distributed File System (HDFS): • HDFS Overview and Architecture • HDFS Installation • Hadoop File System Shell • File System Java API
  • 24. COURSE CONTENT Map/Reduce: • Map/Reduce Overview and Architecture • Installation • Developing Map/Red Jobs • Input and Output Formats • Job Configuration • Job Submission • HDFS as a Source and Sink • HBase as a Source and Sink • Hadoop Streaming
  • 25. COURSE CONTENT HBase: • HBase Overview and Architecture • HBase Installation • HBase Shell • CRUD operations • Scanning and Batching • Filters • HBase Key Design
  • 26. COURSE CONTENT Pig: • Pig Overview • Installation • Pig Latin • Pig with HDFS Hive: • Hive Overview • Installation • Hive QL
  • 27. COURSE CONTENT Sqoop: • Sqoop Overview • Installation • Imports and Exports Zoo Keeper: • Zoo Keeper Overview • Installation • Server Mantainace Putting it all together: • Distributed installations
  • 28. PLEASE CHECK THE LINK http://www.keylabstraining.com/hadoop-online- training-hyderabad-bangalore
  • 29. PLEASE CONTACT:  +91-9550-645-679 (India)  +1-908-366-7933 (USA)  Skype id : keylabstraining  Email id : info@keylabstraining.com