Apache Hadoop HDFS
A short presentation to describe Apache HDFS

Apache Hadoop HDFS Presentation Transcript

  • 1. Apache Hadoop HDFS
      ● What is it?
      ● What is it for?
      ● Architecture
      ● Resilience
      ● Administration
      ● Data access
      ● Future changes?
  • 2. HDFS – What is it?
      ● HDFS = Hadoop Distributed File System
      ● It is a distributed file system
      ● Runs on low-cost hardware
      ● It is open source
      ● Written in Java
      ● Fault tolerant
      ● Designed for very large data sets
      ● Tuned for high throughput
  • 3. HDFS – What is it for?
      ● Designed for batch processing
      ● Streaming access to data
      ● Large data sizes, i.e. terabytes
      ● Highly reliable through data replication
      ● Supports very large node clusters
      ● Supports large files
      ● Supports file counts in the millions
  • 4. HDFS – Architecture
  • 5. HDFS – Architecture
      ● Has a master/slave architecture
      ● A master NameNode
          – Controls file system operations
          – Maps data blocks to DataNodes
          – Logs all changes
      ● Slave DataNodes
          – Store file blocks
          – Store replicated data
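
The division of labour on this slide can be sketched with a toy model. This is not Hadoop code: `ToyNameNode`, `add_file`, `locate`, the 128 MB block size, the replication factor of 3 and the round-robin placement are all assumptions made for illustration. The sketch only shows the idea that the master holds metadata (which DataNodes hold which block, plus a log of changes) while the DataNodes hold the data itself.

```python
from dataclasses import dataclass, field
from itertools import cycle

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes, for illustration only
REPLICATION = 3                 # assumed replication factor, for illustration only


@dataclass
class ToyNameNode:
    """Hypothetical master node: stores metadata only, never file contents."""
    datanodes: list
    block_map: dict = field(default_factory=dict)  # (path, block index) -> DataNode names
    edit_log: list = field(default_factory=list)   # one entry per namespace change

    def add_file(self, path, size_bytes):
        """Split a file into blocks and assign each block to several DataNodes."""
        n_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division
        copies = min(REPLICATION, len(self.datanodes))
        ring = cycle(self.datanodes)
        for index in range(n_blocks):
            # Simple round-robin placement; a real placement policy is more involved.
            self.block_map[(path, index)] = [next(ring) for _ in range(copies)]
        self.edit_log.append(("add", path, n_blocks))
        return n_blocks

    def locate(self, path, index):
        """Answer a client's question: which DataNodes hold this block?"""
        return self.block_map[(path, index)]


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
blocks = nn.add_file("/logs/a", 300 * 1024 * 1024)
```

In this sketch a client would ask the NameNode for a block's location and then read the bytes from one of the listed DataNodes, so file data never passes through the master.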
  • 6. HDFS – Resilience
      ● Data is replicated across DataNodes
      ● Nodes may fail but data is still available
      ● DataNodes indicate their state via heartbeat reports
      ● The master NameNode is a single point of failure
      ● Data integrity is verified via checksums
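
The three mechanisms on this slide (heartbeats, replication and checksums) can be illustrated with a small sketch. The function names, the 30-second timeout and the use of `zlib.crc32` as the checksum are assumptions chosen for the example, not a description of HDFS internals.

```python
import zlib

HEARTBEAT_TIMEOUT = 30.0  # seconds; assumed value for this sketch


def live_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Treat a DataNode as alive if it reported within the timeout."""
    return {node for node, seen in last_heartbeat.items() if now - seen <= timeout}


def under_replicated(block_map, live, target):
    """Return {block: missing copies} for blocks with too few live replicas."""
    missing = {}
    for block, holders in block_map.items():
        alive = sum(1 for node in holders if node in live)
        if alive < target:
            missing[block] = target - alive
    return missing


def checksum(data):
    """CRC32 used here as a stand-in integrity check."""
    return zlib.crc32(data)


def verify(data, expected):
    """Detect corruption by recomputing the checksum on read."""
    return checksum(data) == expected


heartbeats = {"dn1": 100.0, "dn2": 95.0, "dn3": 60.0}  # last report time per node
live = live_nodes(heartbeats, now=100.0)               # dn3 has gone silent
block_map = {"b0": ["dn1", "dn2", "dn3"], "b1": ["dn3", "dn1"]}
to_repair = under_replicated(block_map, live, target=2)
stored = checksum(b"hello")
```

Block `b0` still has two live copies after `dn3` goes silent, so its data stays available; block `b1` drops below the target and would need one new replica.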
  • 7. HDFS – Administration
      ● Access via Java API
      ● FS Shell command language
      ● HTTP browser
      ● C wrapper for Java API
      ● Space reclamation
          – Via control of replication factor
          – Deleted files sent to trash folder
          – Trash folder cleaned after configurable time
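
The space reclamation points can be sketched as follows. `ToyTrash`, `raw_bytes` and the one-hour retention period are hypothetical names and values for illustration; the sketch shows only the behaviour the slide describes: a deleted file moves to trash, can be restored until the retention time passes, and frees space only when purged. Lowering the replication factor frees raw space in a similar way.

```python
class ToyTrash:
    """Hypothetical model of delete-to-trash with a configurable retention time."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.files = {}  # path -> size in bytes (live files)
        self.trash = {}  # path -> (size in bytes, deletion time)

    def delete(self, path, now):
        """Move a file to trash instead of freeing its space immediately."""
        self.trash[path] = (self.files.pop(path), now)

    def restore(self, path):
        """Bring a file back while it is still in trash."""
        size, _ = self.trash.pop(path)
        self.files[path] = size

    def purge(self, now):
        """Drop trash entries older than the retention time; return bytes freed."""
        expired = [p for p, (_, t) in self.trash.items() if now - t >= self.retention]
        return sum(self.trash.pop(p)[0] for p in expired)


def raw_bytes(logical_bytes, replication):
    """Raw storage consumed by data kept at a given replication factor."""
    return logical_bytes * replication


fs = ToyTrash(retention_seconds=3600)
fs.files = {"/a": 100, "/b": 50}
fs.delete("/a", now=0)
fs.delete("/b", now=1000)
freed_early = fs.purge(now=1800)  # nothing is old enough yet
freed_late = fs.purge(now=3600)   # only "/a" has reached the retention time
```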
  • 8. HDFS – Future changes
      Things they might consider for HDFS:
      ● File append
      ● User quotas
      ● File links
      ● Standby nodes
  • 9. Other Areas
      ● Want to know about?
          – Big Data
          – Nutch
          – Solr
      ● See my other presentations
  • 10. Contact Us
      ● Feel free to contact us at
          – www.semtech-solutions.co.nz
          – info@semtech-solutions.co.nz
      ● We offer IT project consultancy
      ● We are happy to hear about your problems
      ● You pay only for the hours you need to solve your problems