Hadoop
    Hadoop         (HDFS)



     




Public 2009/5/13
• Hadoop
• Hadoop   (HDFS)
    –
    –
    –
•




                    Copyright 2009 - Trend Micro Inc.
Hadoop               ?

• Hadoop

• Apache top-level                                  Cloud Applications

• Hadoop
    –  ...
Hadoop

• 2003   2
  – Google           MapReduce
• 2003   10
  – Google     Goofle File System (GFS)
• 2004   12
  – Goog...
Hadoop

• 2007   2
  – Mike Cafarella        Hbase
• 2007   4
  – Yahoo!    1000                Hadoop
• 2008   1
  – Hado...
Who use Hadoop?
•   Yahoo!
    – Hadoop          2              CPU        10
•   Google
    –                 Hadoop
•   ...
Hadoop   (HDFS)




              Copyright 2009 - Trend Micro Inc.
HDFS

•                                                 (Single
    Namespace)
•
    – 1          1             10 Peta By...
HDFS

•
    –

•       (File replication)
    –                3   .
    –
•
    –
    –
•
    –                         (...
Copyright 2009 - Trend Micro Inc.
Copyright 2009 - Trend Micro Inc.
(NameNode)

• NameNode           HDFS                (File System
  Namespace)
   –                  (blocks)
   –        ...
NameNode                              (Metadata)

•   Name node         Metadata

     –         Metadata

     –

•   Met...
NameNode                             (Metadata)
•             (      EditLog)
    –

•   FsImage
    – Name Node

        ...
(Secondary NameNode)

•    NameNode        FsImage     EditLog        NameNode

•    FSImage   EditLog                    ...
NameNode

•   NameNode          SPOF (single point of failure)
•              (High Availablity)


               SPOF!!

...
(DataNode)

•                    (Blocks)

    –                     (     ext3)

    –        block   metadata
        • ...
HDFS –                     (Replication)

•             3
•                                 (block size)
    (replication ...
Block Placement

• Policy (v0.19)
    –
    –
    –
    –
•




                   Copyright 2009 - Trend Micro Inc.
Heartbeats

• DataNode   Heartbeats    NameNode
   –   3
• NameNode    Heartbeats      DataNode




                      ...
(Data Correctness)

•       Checksum
    – Cyclic Redundancy Check (CRC32 )
•
    –          512                Checksum
 ...
(User Interface)

•   API
     – Java API
     – C language wrapper for the Java API is also avaiable

•   POSIX like comm...
Web




      Copyright 2009 - Trend Micro Inc.
Web
  (http://172.16.203.136:50070)




Classification                    Copyright 2009 - Trend Micro Inc.
POSIX Like command




                     Copyright 2009 - Trend Micro Inc.
Java API




           Copyright 2009 - Trend Micro Inc.
POSIX Like command




                     Copyright 2009 - Trend Micro Inc.
• Hadoop document and installation
   – http://hadoop.apache.org/
• Hadoop Wiki
   – http://wiki.apache.org/hadoop/
• Goog...
Upcoming SlideShare
Loading in...5
×

Zh Tw Introduction To Hadoop And Hdfs

5,105

Published on

http://www.trend.org

Published in: Technology, Education
0 Comments
8 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
5,105
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
273
Comments
0
Likes
8
Embeds 0
No embeds

No notes for slide

Zh Tw Introduction To Hadoop And Hdfs

  1. 1. Hadoop Hadoop (HDFS)  Public 2009/5/13
  2. 2. • Hadoop • Hadoop (HDFS) – – – • Copyright 2009 - Trend Micro Inc.
  3. 3. Hadoop ? • Hadoop • Apache top-level Cloud Applications • Hadoop – (HDFS) MapReduce HBase – MapReduce • Java Hadoop Distributed File System (HDFS) • C++/Java/Shell/ Command… A Cluster of Machines • – Linux Mac OS/X Windows Solaris – Copyright 2009 - Trend Micro Inc.
  4. 4. Hadoop • 2003 2 – Google MapReduce • 2003 10 – Google Goofle File System (GFS) • 2004 12 – Google MapReduce • 2005 7 – Doug Cutting Nutch MapReduce • 2006 2 – Hadoop Nutch Lucene • 2006 11 – Google Bigtable Copyright 2009 - Trend Micro Inc.
  5. 5. Hadoop • 2007 2 – Mike Cafarella Hbase • 2007 4 – Yahoo! 1000 Hadoop • 2008 1 – Hadoop Apache Copyright 2009 - Trend Micro Inc.
  6. 6. Who use Hadoop? • Yahoo! – Hadoop 2 CPU 10 • Google – Hadoop • Amazon – Amazon Hadoop – • IBM – Blue Cloud • Trend Micro – Hadoop • Hadoop … – http://wiki.apache.org/hadoop/PoweredBy Copyright 2009 - Trend Micro Inc.
  7. 7. Hadoop (HDFS) Copyright 2009 - Trend Micro Inc.
  8. 8. HDFS • (Single Namespace) • – 1 1 10 Peta Bytes • – Write-once-read-many – • (block) – 128 MB – (replica) (DataNode) Copyright 2009 - Trend Micro Inc.
  9. 9. HDFS • – • (File replication) – 3 . – • – – • – (low latency) – (Batch processing) Copyright 2009 - Trend Micro Inc.
  10. 10. Copyright 2009 - Trend Micro Inc.
  11. 11. Copyright 2009 - Trend Micro Inc.
  12. 12. (NameNode) • NameNode HDFS (File System Namespace) – (blocks) – (block) Data Node • Hadoop cluster • Copyright 2009 - Trend Micro Inc.
  13. 13. NameNode (Metadata) • Name node Metadata – Metadata – • Metadata – (files) – (blocks) – (block) (Data Node) – • : (creation time), (replication factor) Copyright 2009 - Trend Micro Inc.
  14. 14. NameNode (Metadata) • ( EditLog) – • FsImage – Name Node • (Name Space) • (Block) (File) • – NameNode FsImage EditLog • Checkpoint – NameNode – FsImange EditLog EditLog FsImange Copyright 2009 - Trend Micro Inc.
  15. 15. (Secondary NameNode) • NameNode FsImage EditLog NameNode • FSImage EditLog FSImage • FSImage NameNode – NameNode EditLog • Secondary NameNode NameNode (Fail over) – Hadoop Name Node FsImage FsImage (new) EditLog Copyright 2009 - Trend Micro Inc.
  16. 16. NameNode • NameNode SPOF (single point of failure) • (High Availablity) SPOF!! Copyright 2009 - Trend Micro Inc.
  17. 17. (DataNode) • (Blocks) – ( ext3) – block metadata • (CRC), block – • Block – Blocks NameNode – NameNode block NameNode block Copyright 2009 - Trend Micro Inc.
  18. 18. HDFS – (Replication) • 3 • (block size) (replication factor) • (rack- aware) . Copyright 2009 - Trend Micro Inc.
  19. 19. Block Placement • Policy (v0.19) – – – – • Copyright 2009 - Trend Micro Inc.
  20. 20. Heartbeats • DataNode Heartbeats NameNode – 3 • NameNode Heartbeats DataNode Copyright 2009 - Trend Micro Inc.
  21. 21. (Data Correctness) • Checksum – Cyclic Redundancy Check (CRC32 ) • – 512 Checksum – DataNode Checksum • – Checksum – Copyright 2009 - Trend Micro Inc.
  22. 22. (User Interface) • API – Java API – C language wrapper for the Java API is also avaiable • POSIX like command – hadoop dfs -mkdir /foodir – hadoop dfs -cat /foodir/myfile.txt – hadoop dfs -rm /foodir myfile.txthadoop dfs -rm /foodir myfile.txt • DFSAdmin – bin/hadoop dfsadmin –safemode – bin/hadoop dfsadmin –report – bin/hadoop dfsadmin -refreshNodes • Web – http://host:port/dfshealth.jsp Copyright 2009 - Trend Micro Inc.
  23. 23. Web Copyright 2009 - Trend Micro Inc.
  24. 24. Web (http://172.16.203.136:50070) Classification Copyright 2009 - Trend Micro Inc.
  25. 25. POSIX Like command Copyright 2009 - Trend Micro Inc.
  26. 26. Java API Copyright 2009 - Trend Micro Inc.
  27. 27. POSIX Like command Copyright 2009 - Trend Micro Inc.
  28. 28. • Hadoop document and installation – http://hadoop.apache.org/ • Hadoop Wiki – http://wiki.apache.org/hadoop/ • Google File System Paper – http://labs.google.com/papers/gfs.html Copyright 2009 - Trend Micro Inc.
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×