2. Big Data - Hadoop
About 90% of the world’s data has been generated in the last few years
Big Data: datasets so large that they cannot be
processed using traditional computing techniques.
What comes under Big Data:
• Social media data
• Stock exchange data
• Search engine data
4. HADOOP
• Developed by Doug Cutting, Mike Cafarella and team
• Open-source project built around the MapReduce algorithm
• Apache Hadoop is a registered trademark of the Apache
Software Foundation
5. HADOOP Framework
• Hadoop Common: Java libraries used by the other Hadoop modules
• Hadoop YARN: job scheduling and cluster management framework
• Hadoop HDFS: distributed file system that provides high-throughput access to application data
• Hadoop MapReduce: software framework for parallel processing of large data sets
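The MapReduce model behind the framework can be illustrated with a plain-Python word-count simulation (this is not Hadoop code; the function names are illustrative stand-ins for the map, shuffle, and reduce phases):

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for one word
    return (key, sum(values))

lines = ["big data hadoop", "hadoop mapreduce", "big hadoop"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 1, 'hadoop': 3, 'mapreduce': 1}
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle is handled by the framework itself.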
6. How Does HADOOP Work?
Stage 1
The user submits a job to the Hadoop job client
for the required process, specifying:
• the input and output file locations in the DFS
• the job configuration, by setting parameters
specific to the job
Stage 2
• The Hadoop job client then submits the job
and its configuration to the JobTracker
• The JobTracker distributes the configuration
to the slave nodes, schedules the tasks,
monitors them, and provides status back to
the job client
7. How Does HADOOP Work?
Stage 3
The TaskTrackers execute the tasks as per the
MapReduce implementation, and the output is
stored in output files on the file system.
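The three stages above can be sketched as a toy simulation (plain Python, with hypothetical names standing in for the JobTracker and TaskTrackers; real Hadoop distributes this work across machines in the cluster):

```python
def job_client_submit(job_config, tracker):
    # Stages 1-2: the job client hands the job and its configuration to the JobTracker
    return tracker(job_config)

def job_tracker(job_config):
    # Stage 2: the JobTracker schedules one task per input split
    results = [task_tracker(job_config["task"], split)
               for split in job_config["input"]]
    # Collect results into the configured output "file" (a list here)
    job_config["output"].extend(results)
    return job_config["output"]

def task_tracker(task, split):
    # Stage 3: a TaskTracker executes the task on its input split
    return task(split)

config = {
    "input": ["alpha beta", "beta gamma"],    # input splits (HDFS files in reality)
    "output": [],                             # output location
    "task": lambda split: len(split.split())  # the task: count words in a split
}
print(job_client_submit(config, job_tracker))  # [2, 2]
```

Each simulated TaskTracker here just counts words in its split; in Hadoop the task would be the user's Mapper or Reducer code, and the output would land in files in HDFS.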