An Introduction to Apache Hadoop
Apache Hadoop is a framework for running applications on large clusters built from commodity hardware.

Presentation Transcript

  • Introduction to Apache Hadoop Presenter: Prem Chand Mali, Mindfire Solutions Date: 30/01/2014
  • About Me
    • SCJP/OCJP - Oracle Certified Java Programmer
    • MCP:70-480 - Specialist certification in HTML5 with JavaScript and CSS3 Exam
    • Skills: Java, Swing, Spring, Hibernate, JavaFX, jQuery, prototypeJS, ExtJS
    • Connect Me:
      – https://www.facebook.com/prem.c.mali
      – http://www.linkedin.com/in/premmali
      – https://twitter.com/prem_mali
      – https://plus.google.com/106150245941317924019/about/p/pub
    • Contact Me: premchandm@mindfiresolutions.com / prem.c.mali@gmail.com, mfsi_premchandm
  • Agenda
    • History
    • What is Apache Hadoop
    • Why Apache Hadoop
    • HDFS
    • MapReduce
    • Q&A
  • History
    • Nutch: crawler-based search.
    • The GFS and MapReduce papers were published.
    • Yahoo! hired Doug Cutting and gave him a dedicated team.
  • What is Apache Hadoop?
    • Apache Hadoop is an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It supports running applications on large clusters of commodity hardware.
    • Hadoop is designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be handled automatically in software by the framework.
    • Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
  • What is Apache Hadoop?
    • The Apache Hadoop framework is composed of the following modules:
      – Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
      – Hadoop MapReduce - a programming model for large-scale data processing.
      – Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
      – Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
  • Why Apache Hadoop?
    • State of data
      – 90% of data was generated in the past three years.
      – Types of data
        • Unstructured
        • Semi-structured
        • Relational
      – The relational world can handle gigabytes of data.
    • Distributed
    • Scalable
    • Flexible
    • Fault tolerant
    • Intelligent
  • HDFS
    • HDFS is the primary distributed storage used by Hadoop applications. It consists of the following two types of components:
      – NameNode
      – DataNode
    • HDFS is well suited for distributed storage and distributed processing using commodity hardware.
    • Hadoop supports shell-like commands to interact with HDFS directly.
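To make the NameNode/DataNode split concrete, here is a minimal sketch in plain Python. It is a toy model, not Hadoop code: the `NameNode` class, its round-robin placement, the 4-byte block size and the node names `dn1`..`dn4` are all assumptions chosen for readability. The idea it illustrates is that the NameNode keeps only metadata (which DataNodes hold which block), while replication lets a block survive the loss of a node. Keeping three copies mirrors HDFS's default replication factor; real HDFS uses far larger blocks and a rack-aware placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy metadata server: remembers which DataNodes hold each block."""
    datanodes: list
    block_size: int = 4      # bytes per block; tiny so the example is readable
    replication: int = 3     # copies kept of each block
    block_map: dict = field(default_factory=dict)

    def add_file(self, name, data):
        """Split data into blocks and assign each block to several DataNodes."""
        blocks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        copies = min(self.replication, len(self.datanodes))
        for index, _ in enumerate(blocks):
            # Round-robin placement: a simplification, not HDFS's real policy.
            holders = [self.datanodes[(index + k) % len(self.datanodes)]
                       for k in range(copies)]
            self.block_map[(name, index)] = holders
        return len(blocks)

    def locate(self, name, index, failed=()):
        """Return the DataNodes that still hold a block after some nodes fail."""
        return [node for node in self.block_map[(name, index)]
                if node not in failed]


namenode = NameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
block_count = namenode.add_file("log.txt", b"0123456789")
print(block_count, namenode.locate("log.txt", 0, failed={"dn2"}))
```

Even with `dn2` marked as failed, `locate` still finds other holders for the first block, which is the property that lets the framework tolerate failures of individual machines.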
  • HDFS
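The shell-like commands mentioned for HDFS are typically issued as `hdfs dfs -<action>` (or `hadoop fs -<action>`). The sketch below only builds and prints a few such command lines from Python; it does not run them. The helper name `hdfs_dfs` and every path in it are hypothetical, chosen for illustration.

```python
import shlex


def hdfs_dfs(action, *args):
    """Build the argument list for an `hdfs dfs` command without running it."""
    return ["hdfs", "dfs", f"-{action}", *args]


# Hypothetical paths, used only for illustration.
commands = [
    hdfs_dfs("mkdir", "-p", "/user/demo/input"),          # create a directory
    hdfs_dfs("put", "access.log", "/user/demo/input/"),   # copy a local file into HDFS
    hdfs_dfs("ls", "/user/demo/input"),                   # list a directory
    hdfs_dfs("cat", "/user/demo/input/access.log"),       # print a file's contents
]

for argv in commands:
    print(shlex.join(argv))
```

On a machine with a configured Hadoop client, each list could be passed to `subprocess.run(argv, check=True)` to execute the command.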
  • MapReduce
    • MapReduce is a combination of the following three things:
      – Map
      – Shuffle
      – Reduce
    • It does its job through the JobTracker and TaskTracker.
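The three phases can be sketched as a single-process word count in plain Python. This is an illustration of the programming model only, not the Hadoop API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are assumptions, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the map calls would run in parallel on different machines and the shuffle would move data over the network; the data flow between the phases is the same.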
  • MapReduce
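The division of labour between the JobTracker and the TaskTrackers can be illustrated with a toy scheduler. This is a heavily simplified sketch, not Hadoop's implementation: the class bodies, the `healthy` flag and the tracker names `tt1`..`tt3` are assumptions, and real TaskTrackers report to the JobTracker through heartbeats rather than by raising exceptions. What it shows is the coordinator handing tasks to workers and rescheduling a task when a worker fails.

```python
class TaskTracker:
    """Toy worker: runs a task unless it is marked as unhealthy."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, task, payload):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return task(payload)


class JobTracker:
    """Toy coordinator: hands tasks to trackers and retries on failure."""

    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, task, payloads):
        results, log = [], []
        for index, payload in enumerate(payloads):
            # Try trackers in turn, starting at a different one per task.
            for attempt in range(len(self.trackers)):
                tracker = self.trackers[(index + attempt) % len(self.trackers)]
                try:
                    results.append(tracker.run(task, payload))
                    log.append((index, tracker.name))
                    break
                except RuntimeError:
                    continue  # reschedule the same task on the next tracker
            else:
                raise RuntimeError(f"task {index} failed on every tracker")
        return results, log


trackers = [TaskTracker("tt1"), TaskTracker("tt2", healthy=False), TaskTracker("tt3")]
results, log = JobTracker(trackers).submit(len, ["ab", "cde", "f"])
print(results, log)
```

The second task is first offered to the unhealthy `tt2` and then lands on `tt3`, so the job still completes despite the failed worker.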
  • Question and Answer
  • Thank you
  • www.mindfiresolutions.com https://www.facebook.com/MindfireSolutions http://www.linkedin.com/company/mindfire-solutions http://twitter.com/mindfires