
DataLogix Hadoop Solution

DataLogix Hadoop Storage Solution

Published in: Technology
  1. Hadoop & HDFS
     version 1.0
     File & Content Solutions
  2. What is Hadoop
     - Built and distributed as a project of the Apache Software Foundation.
     - Hadoop ecosystem:
       - Common: a set of components and interfaces for distributed file systems and general I/O.
       - Avro: a serialization system for efficient, cross-language RPC and persistent data storage.
       - MapReduce: a distributed data processing model and execution environment that runs on large clusters of commodity machines.
       - HDFS: a distributed file system that runs on large clusters of commodity hardware.
  3. Common Terms in Hadoop HDFS
     - Name node: manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log.
     - Data node: the workhorses of the file system. Data nodes store and retrieve blocks when they are told to (by clients or the name node), and they report back to the name node periodically with lists of the blocks that they are storing.
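The periodic block reports described above can be pictured as follows. This is a minimal sketch, not HDFS code: the function name `build_block_map`, the node names, and the block IDs are all hypothetical, and the only point is to show how per-data-node lists of blocks can be inverted into a block-to-location map of the kind a name node needs.

```python
from collections import defaultdict


def build_block_map(block_reports):
    """Invert {data node: [block ids]} into {block id: set of data nodes}."""
    locations = defaultdict(set)
    for node, blocks in block_reports.items():
        for block_id in blocks:
            locations[block_id].add(node)
    return dict(locations)


# Hypothetical reports from two data nodes.
reports = {
    "datanode-1": ["blk_1", "blk_2"],
    "datanode-2": ["blk_2", "blk_3"],
}
block_map = build_block_map(reports)
print(sorted(block_map["blk_2"]))
```

With these sample reports, `blk_2` is held by both data nodes, while `blk_1` and `blk_3` each have a single location.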
  4. Common Terms in Hadoop HDFS
     - Secondary name node: its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary name node usually runs on a separate physical machine.
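The merge of the namespace image with the edit log can be illustrated with a small model. This is a sketch under stated assumptions, not the real HDFS on-disk format: the namespace is modelled as a plain dictionary from path to metadata, the edit log as a list of tuples, and the operation names (`create`, `delete`, `rename`) and the helper names are hypothetical.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (assumed format)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Replay the edit log over a copy of the image; return (new image, empty log)."""
    merged = dict(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


image = {"/data/a.txt": {"size": 10}}
edit_log = [
    ("create", "/data/b.txt", {"size": 20}),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt"),
]
new_image, new_log = checkpoint(image, edit_log)
print(new_image, new_log)
```

After the checkpoint the new image already reflects every logged change, so the log can start again from empty, which is why the merge keeps the edit log from growing without bound.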
  5. Hadoop Distributed File System - HDFS
     - HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
     - HDFS has a permissions model for files and directories that is much like POSIX. POSIX is an acronym for Portable Operating System Interface.
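A POSIX-style permission check can be sketched in a few lines. This is a simplified illustration of the owner/group/other model, not the HDFS implementation: it assumes no superuser, no ACLs and no special bits, and the function name `is_allowed` and the user and group names are hypothetical.

```python
import stat


def is_allowed(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` ('r', 'w' or 'x') under `mode`."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        perms = (mode >> 6) & 7  # owner bits
    elif group in user_groups:
        perms = (mode >> 3) & 7  # group bits
    else:
        perms = mode & 7  # bits for everyone else
    return bool(perms & bit)


mode = 0o640  # owner: read/write, group: read, others: nothing
print(stat.filemode(stat.S_IFREG | mode))
print(is_allowed(mode, "alice", "analysts", "alice", {"staff"}, "w"))
print(is_allowed(mode, "alice", "analysts", "bob", {"analysts"}, "w"))
```

Exactly one of the three permission classes applies to a given user, so an owner is judged by the owner bits even when the group bits are more permissive.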
  6. Writing data into Hadoop
  7. Reading data from HDFS
  8. MapReduce
     - "Map" step: the master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
     - "Reduce" step: the master node then collects the answers to all the sub-problems and combines them in some way to form the output, which is the answer to the problem it was originally trying to solve.
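The two steps above can be tried out with a word count, the customary first MapReduce example. This is a single-process sketch in plain Python, not the Hadoop API: the function names are hypothetical, and the intermediate `shuffle` grouping step is an addition that the slide does not name but that sits between map and reduce in practice.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single answer."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "hadoop processes data"])
print(counts)
```

Each call to `map_phase` depends only on its own line, which is what allows the map work to be spread over many worker nodes in a real cluster.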
  9. MapReduce
  10. HDFS Storage Solution
     - The DataLogix Hadoop Storage Solution contains:
       - An enterprise scale-out storage solution for Hadoop workflows.
       - Native connectivity for Hadoop and ecosystem components:
         - Hive
         - HBase
         - Pig
         - Mahout
       - A Name Node that is not a single point of failure.
       - No 3x mirroring; native N+M protection is used instead.
       - Support for snapshots, sync and NDMP backup.
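The capacity difference between 3x mirroring and N+M protection can be worked out with simple arithmetic. The sketch below assumes that N+M means N data units plus M protection units per stripe, which is the usual reading but is not spelled out in the slides; the 8+2 layout and the 100 TB figure are hypothetical examples, not DataLogix specifications.

```python
def raw_capacity_replication(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_n_plus_m(usable_tb, n, m):
    """Raw capacity needed with n data units plus m protection units per stripe."""
    return usable_tb * (n + m) / n


usable = 100  # TB of user data, an illustrative figure
print(raw_capacity_replication(usable))
print(raw_capacity_n_plus_m(usable, n=8, m=2))
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with three copies but 125 TB with an 8+2 layout, while still tolerating the loss of two units per stripe.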
  11. Writing into Hadoop with the DataLogix solution
     - The storage system becomes both the Name Node and the Data Node.
     - It provides scalability and protection of the data.
     - The Hadoop cluster no longer has a single point of failure and no longer writes multiple 64 MB to 128 MB chunks of data to the data nodes.
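To see what the 64 MB to 128 MB chunks mean in practice, the number of blocks written for a file can be estimated as below. This is a back-of-the-envelope sketch: the function names are hypothetical, the 1 GiB file size is an example, and the factor of three reflects the 3x mirroring mentioned earlier in the deck rather than a measured value.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks needed to hold the file (the last block may be partial)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def blocks_written(file_size_bytes, block_size_bytes, copies=3):
    """Total block copies written when each block is stored `copies` times."""
    return block_count(file_size_bytes, block_size_bytes) * copies


one_gib = 1024 * MB
print(block_count(one_gib, 64 * MB), block_count(one_gib, 128 * MB))
print(blocks_written(one_gib, 64 * MB))
```

A 1 GiB file therefore occupies 16 blocks at 64 MB or 8 blocks at 128 MB, and with three copies of each block a conventional cluster writes 48 block copies for the smaller block size.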
  12. Reading Hadoop Data
     - Data is read off the cluster back to the compute nodes.
     - The Data Nodes are now compute nodes and are independent of the data in the Hadoop cluster:
       - The benefit is that Hadoop hardware can be upgraded without the need to migrate data.
  13. More information?
     - More information about the Hadoop storage solutions? Please contact us:

       DataLogix
       Phone: +31(0)30-7440710
       e-mail: info@datalogix.nl
       www.datalogix.nl
