An Introduction to Apache Hadoop

Apache Hadoop is a framework for running applications on large clusters built from commodity hardware.

Published in: Technology

Transcript

  • 1. Introduction to Apache Hadoop. Presenter: Prem Chand Mali, Mindfire Solutions. Date: 30/01/2014
  • 2. About Me
      • SCJP/OCJP - Oracle Certified Java Programmer
      • MCP:70-480 - Specialist certification in HTML5 with JavaScript and CSS3 Exam
      • Skills: Java, Swing, Spring, Hibernate, JavaFX, jQuery, prototypeJS, ExtJS
      • Connect with me: https://www.facebook.com/prem.c.mali http://www.linkedin.com/in/premmali https://twitter.com/prem_mali https://plus.google.com/106150245941317924019/about/p/pub
      • Contact me: premchandm@mindfiresolutions.com / prem.c.mali@gmail.com, mfsi_premchandm
  • 3. Agenda
      • History
      • What is Apache Hadoop
      • Why Apache Hadoop
      • HDFS
      • MapReduce
      • Q&A
  • 4. History
      • Nutch: crawler-based search.
      • Google's GFS and MapReduce papers were published.
      • Yahoo! hired Doug Cutting and gave him a dedicated team.
  • 5. What is Apache Hadoop?
      • Apache Hadoop is an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It supports running applications on large clusters of commodity hardware.
      • Hadoop is designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and should therefore be handled automatically in software by the framework.
      • Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
  • 6. What is Apache Hadoop?
      • The Apache Hadoop framework is composed of the following modules:
          – Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
          – Hadoop MapReduce - a programming model for large-scale data processing.
          – Hadoop Common - contains libraries and utilities needed by the other Hadoop modules.
          – Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
  • 7. Why Apache Hadoop?
      • State of data
          – 90% of data was generated in the past three years.
          – Types of data: unstructured, semi-structured, relational.
          – The relational world can handle gigabytes of data.
      • Distributed
      • Scalable
      • Flexible
      • Fault tolerant
      • Intelligent
  • 8. HDFS
      • HDFS is the primary distributed storage used by Hadoop applications. It consists of two types of components:
          – NameNode
          – DataNode
      • HDFS is well suited for distributed storage and distributed processing on commodity hardware.
      • Hadoop supports shell-like commands for interacting with HDFS directly.
  • 9. HDFS
  • 10. MapReduce
      • MapReduce is a combination of the following three things:
          – Map
          – Shuffle
          – Reduce
      • It does its job through the Job Tracker and Task Tracker.
  • 11. MapReduce
  • 12. MapReduce
  • 13. MapReduce
  • 14. Question and Answer
  • 15. Thank you
  • 16. www.mindfiresolutions.com https://www.facebook.com/MindfireSolutions http://www.linkedin.com/company/mindfire-solutions http://twitter.com/mindfires
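
Slide 10 lists Map, Shuffle, and Reduce as the three parts of MapReduce. The sketch below is not from the presentation. It is a minimal single-process illustration of those three phases using a word count, written in plain Python rather than against the Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under the key they share."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Calling `word_count(["Hadoop stores data", "Hadoop processes data"])` counts each word across both lines. Everything here runs in one process on in-memory data. In Hadoop the same phases are spread across the machines of a cluster, with the input read from HDFS, which this sketch does not attempt to model.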
