Introduction of Apache Hadoop
Presenter: Prem Chand Mali, Mindfire Solutions
Date: 30/01/2014

An Introduction to Apache Hadoop

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.

Transcript of "An Introduction to Apache Hadoop"

  1. Introduction of Apache Hadoop
     Presenter: Prem Chand Mali, Mindfire Solutions
     Date: 30/01/2014
  2. About Me
     • SCJP/OCJP - Oracle Certified Java Programmer
     • MCP 70-480 - Specialist certification in HTML5 with JavaScript and CSS3 exam
     • Skills: Java, Swing, Spring, Hibernate, JavaFX, jQuery, PrototypeJS, ExtJS
     • Connect with me: https://www.facebook.com/prem.c.mali, http://www.linkedin.com/in/premmali, https://twitter.com/prem_mali, https://plus.google.com/106150245941317924019/about/p/pub
     • Contact me: premchandm@mindfiresolutions.com / prem.c.mali@gmail.com, mfsi_premchandm
  3. Agenda
     • History
     • What is Apache Hadoop?
     • Why Apache Hadoop?
     • HDFS
     • MapReduce
     • Q&A
  4. History
     • Nutch: a crawler-based web search project, out of which Hadoop grew.
     • Google published the GFS (2003) and MapReduce (2004) papers.
     • Yahoo! hired Doug Cutting and gave him a dedicated team.
  5. What is Apache Hadoop?
     • Apache Hadoop is an open-source software framework, licensed under the Apache License 2.0, that supports data-intensive distributed applications. It runs applications on large clusters of commodity hardware.
     • Hadoop is designed around the fundamental assumption that hardware failures (of individual machines or whole racks of machines) are common and should therefore be handled automatically in software by the framework.
     • Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
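The fault-tolerance assumption can be illustrated with a small example. In Hadoop MapReduce, when the node running a task fails, the task is run again on another node. The toy Java sketch below mimics only that idea. RetryOnFailureSketch, NodeFailedException, the node names and the maxAttempts limit are all hypothetical. This is not Hadoop's scheduler.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical toy scheduler, not Hadoop code: if the node running a task
// fails, the same task is retried on the next node, so the failure is
// absorbed in software instead of failing the whole job.
final class RetryOnFailureSketch {

    /** Thrown by a simulated node that is down (hypothetical exception type). */
    static final class NodeFailedException extends RuntimeException {
        NodeFailedException(String node) {
            super("node failed: " + node);
        }
    }

    /** Tries the task on the given nodes in order, using at most maxAttempts attempts. */
    static <R> R runWithRetries(List<String> nodes, int maxAttempts, Function<String, R> task) {
        int attempts = Math.min(maxAttempts, nodes.size());
        NodeFailedException lastFailure = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return task.apply(nodes.get(i));   // first successful attempt wins
            } catch (NodeFailedException e) {
                lastFailure = e;                   // remember the failure, move to the next node
            }
        }
        throw new IllegalStateException("task failed after " + attempts + " attempt(s)", lastFailure);
    }

    public static void main(String[] args) {
        Set<String> deadNodes = Set.of("node1");   // hypothetical failed machine
        String result = runWithRetries(List.of("node1", "node2", "node3"), 4, node -> {
            if (deadNodes.contains(node)) {
                throw new NodeFailedException(node);
            }
            return "done on " + node;
        });
        System.out.println(result);                // prints "done on node2"
    }
}
```

The sketch tries nodes in a fixed order so that it stays deterministic. A real scheduler also weighs data locality and node load when it picks where to rerun a task.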
  6. What is Apache Hadoop?
     • The Apache Hadoop framework is composed of the following modules:
       – Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
       – Hadoop MapReduce: a programming model for large-scale data processing.
       – Hadoop Common: libraries and utilities needed by the other Hadoop modules.
       – Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
  7. Why Apache Hadoop?
     • State of data
       – 90% of data was generated in the past three years.
       – Types of data: unstructured, semi-structured, relational.
       – The relational world handles data in the gigabyte range.
     • Distributed
     • Scalable
     • Flexible
     • Fault tolerant
     • Intelligent
  8. HDFS
     • HDFS is the primary distributed storage used by Hadoop applications. It consists of the following two types of components:
       – NameNode: manages the file-system namespace and the metadata that records which blocks make up each file and where they are stored.
       – DataNode: stores the actual data blocks.
     • HDFS is well suited for distributed storage and distributed processing on commodity hardware.
     • Hadoop supports shell-like commands for interacting with HDFS directly, such as hadoop fs -ls / and hadoop fs -put.
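Here is a minimal in-memory sketch that makes the NameNode/DataNode split concrete. It assumes a tiny block size and simple round-robin block placement. HdfsToyModel, BlockLocation and the blk_ naming are hypothetical illustration names, not the HDFS Java API. Real HDFS differs in two ways: it uses much larger blocks (64 MB or 128 MB by default, depending on the version), and it stores several replicas of each block (three by default). This sketch omits replication.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not the HDFS API. The "NameNode" part keeps only
// metadata (which blocks form a file and which DataNode holds each block);
// the "DataNode" part keeps the block bytes. Replication is omitted.
final class HdfsToyModel {

    record BlockLocation(String blockId, String dataNode) {}

    private final int blockSize;                                            // tiny, for illustration only
    private final List<String> dataNodes;                                   // hypothetical DataNode names
    private final Map<String, Map<String, byte[]>> dataNodeStorage = new HashMap<>();  // DataNode -> (blockId -> bytes)
    private final Map<String, List<BlockLocation>> nameNodeMetadata = new HashMap<>(); // file path -> block locations
    private int nextBlockId = 0;

    HdfsToyModel(int blockSize, List<String> dataNodes) {
        if (blockSize <= 0 || dataNodes.isEmpty()) {
            throw new IllegalArgumentException("need a positive block size and at least one DataNode");
        }
        this.blockSize = blockSize;
        this.dataNodes = List.copyOf(dataNodes);
        for (String dn : this.dataNodes) {
            dataNodeStorage.put(dn, new HashMap<>());
        }
    }

    /** Splits content into fixed-size blocks and places them round-robin on the DataNodes. */
    void write(String path, byte[] content) {
        List<BlockLocation> locations = new ArrayList<>();
        for (int offset = 0; offset < content.length; offset += blockSize) {
            byte[] block = Arrays.copyOfRange(content, offset, Math.min(offset + blockSize, content.length));
            String blockId = "blk_" + nextBlockId++;
            String dataNode = dataNodes.get(locations.size() % dataNodes.size());
            dataNodeStorage.get(dataNode).put(blockId, block);              // DataNode stores the bytes
            locations.add(new BlockLocation(blockId, dataNode));
        }
        nameNodeMetadata.put(path, List.copyOf(locations));                // NameNode records metadata only
    }

    /** Asks the "NameNode" where the blocks of a file are. */
    List<BlockLocation> blockLocations(String path) {
        List<BlockLocation> locations = nameNodeMetadata.get(path);
        if (locations == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return locations;
    }

    /** Reads a file: look up block locations, then fetch each block from its DataNode. */
    byte[] read(String path) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation loc : blockLocations(path)) {
            out.writeBytes(dataNodeStorage.get(loc.dataNode()).get(loc.blockId()));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        HdfsToyModel fs = new HdfsToyModel(4, List.of("dn1", "dn2", "dn3"));
        fs.write("/logs/a.txt", "hello hadoop".getBytes(StandardCharsets.UTF_8));
        System.out.println(fs.blockLocations("/logs/a.txt"));   // three 4-byte blocks on dn1, dn2, dn3
        System.out.println(new String(fs.read("/logs/a.txt"), StandardCharsets.UTF_8)); // prints "hello hadoop"
    }
}
```

Keeping the file bytes away from the NameNode is the point of the design. The NameNode only answers "where are the blocks?", and clients then read the bytes from the DataNodes themselves.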
  9. HDFS
  10. MapReduce
      • MapReduce is a combination of the following three phases:
        – Map
        – Shuffle
        – Reduce
      • It does its job through the JobTracker, which schedules jobs and assigns tasks, and the TaskTrackers, which run the individual map and reduce tasks on worker nodes.
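The three phases can be traced on the classic word-count problem. The sketch below runs everything in one Java process, so it shows the data flow only. WordCountSketch and its methods are hypothetical names. The sketch does not use Hadoop's Mapper/Reducer classes or the JobTracker/TaskTracker machinery.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the Map, Shuffle and Reduce phases
// for word count. It mirrors the data flow only, not Hadoop's API.
final class WordCountSketch {

    /** Map: turn one input line into (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle: group all values by key; keys come out sorted, as reducers see them in Hadoop. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce: combine all values for one key (here, sum the counts). */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));                        // map phase
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(intermediate).forEach(                             // shuffle phase
                (word, counts) -> result.put(word, reduce(word, counts))); // reduce phase
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Hadoop stores data", "Hadoop processes data")));
        // prints {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

In a real job, the map calls run in parallel on the nodes that hold the input blocks. The shuffle then moves each key's values across the network to the reducer responsible for that key.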
  11. MapReduce
  12. MapReduce
  13. MapReduce
  14. Question and Answer
  15. Thank you
  16. www.mindfiresolutions.com
      https://www.facebook.com/MindfireSolutions
      http://www.linkedin.com/company/mindfire-solutions
      http://twitter.com/mindfires