20100130 hardoop apache

Published in: Technology

Transcript

  • 1. Hadoop and HDFS in CMRI
      • China Mobile Research Institute
      • WANG, Xu [wangxu(at)chinamobile.com]
  • 2. Apache Hadoop
      • http://hadoop.apache.org/
      • Open source clone of Google's infrastructure
      • De facto standard MapReduce framework; has won the TeraSort benchmark several times
      • Used for search engines, data mining, and log analysis
      • Clusters scale up to 4,000 nodes
      • Users: Yahoo!, Facebook, Cloudera, Baidu, Alibaba, China Mobile
      • [Slide footer] Internal material, keep confidential
  • 3. Hadoop in China 2009 (Beijing, Nov 15, 2009)
  • 4. Subprojects of Hadoop (layered diagram, top to bottom)
      • Data warehouse: Pig, Hive
      • K-V store / column-based DB: HBase (BigTable)
      • Distributed lock: ZooKeeper (Chubby)
      • Basic platform: Hadoop MapReduce (Google MapReduce), HDFS (Google GFS)
      • Core: Hadoop Common (io, ipc….); Avro (serialized data format & RPC (ipc))
      • JVM
  • 5. HDFS Principles
      • Follows the Google GFS paper
      • For big data storage and processing
          • Write once, read frequently
          • Modification is not permitted; append will be supported soon
          • Reads take priority over writes
      • Runs on commodity PCs
          • Hardware may fail at any time
          • Multiple replicas for data safety
  • 6. HDFS Architecture
  • 7. Data in HDFS
      • NameNode's memory
          • Namespace info
              • FS hierarchical tree
              • Map(file, blocks)
          • DataNode map: Map(living datanode, blocks)
          • Blocks map: Map(block, file/datanodes)
          • Other runtime info
              • Locks held by clients
              • Blocks being processed (replication, invalid…)
  • 8. Persistence of NameNode data
      • NameNode persistence
          • Namespace: FSImage & EditLog
          • Starting & shutdown
      • Secondary NameNode
          • Checkpoint (merges the EditLog into the FSImage)
          • Runs periodically (every 1 hour by default)
      • Backup NameNode
          • Introduced in 0.21 (not released yet)
          • "Real-time Secondary NameNode" or remote EditLog
      • The DataNode map and other info exist only in NameNode memory
  • 9. High Availability Considerations
      • Availability in mainstream Hadoop
          • The NameNode is a single point of failure (SPOF); a NameNode failure may cause:
              • Service interruption for minutes
              • Data loss for one checkpoint period (worst case)
      • Possible solution: DRBD + Linux-HA
          • Mature failover mechanism
          • Service interruption for minutes
          • Almost no data loss
      • Another solution: NameNode Cluster extension
          • Continuous service
          • Almost no data loss
          • Requires modifying the code
          • Consistency vs. performance
  • 10. HDFS+NNC Architecture
  • 11. NNC Design
      • Master & slave: 1:N
      • The master synchronizes the FSNamesystem to the slaves
      • ZooKeeper works as a registry; clients and datanodes can look up the namenode list from it
      • DFSClient can access multiple namenodes for read operations
      • Failover is controlled by Linux-HA so far, which gets namenode status info from ClientProtocol
  • 12. Update Events
      • NNU_NOP // nothing to do
      • NNU_BLK // add or remove a block
      • NNU_INODE // add or remove or modify an inode (add or remove file; new block allocation)
      • NNU_NEWFILE // start new file
      • NNU_CLSFILE // close new file
      • NNU_MVRM // move or remove file
      • NNU_MKDIR // mkdir
      • NNU_LEASE // add/update or release a lease
      • NNU_LEASE_BATCH // update batch of leases
      • NNU_DNODEHB_BATCH // batch of datanode heartbeat
      • NNU_DNODEREG // dnode register
      • NNU_DNODEBLK // block report
      • NNU_DNODERM // remove dnode
      • NNU_BLKRECV // block received message from datanode
      • NNU_REPLICAMON // replication monitor work
      • NNU_WORLD // bootstrap a slave node
      • NNU_MASSIVE // bootstrap a slave node
  • 13. Performance and Other Issues
      • The overhead of NameNode synchronization
          • For typical file I/O and MapReduce (sort, wordcount), the NNC system reaches 95% of the performance of Hadoop without NNC
          • For metadata write-only operations (parallel touchz or mkdir), the NNC system reaches 15% of the performance of Hadoop without NNC
      • Performance gain from multiple NameNodes in read-only operations
          • None observed so far, unfortunately
      • Other design issues
          • Why sync from the master to the slaves directly, without an additional delivery node? That may introduce another SPOF and make the problem more complex.
          • Why not use ZooKeeper for failover? Linux-HA works well, and we are also evaluating whether to change to ZK. Any suggestions?
  • 14. Q&A
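
Slide 8 says the Secondary NameNode checkpoints by merging the EditLog into the FSImage. The sketch below is a toy model of that idea only: the record shapes, the operation names (`mkdir`, `create`, `add_block`, `delete`) and the function names are assumptions made for illustration, not Hadoop's on-disk format or API.

```python
import copy

# Hypothetical toy model: the namespace is a dict of path -> metadata,
# and the edit log is a list of (operation, path, ...) tuples.

def apply_edit(namespace, edit):
    """Apply one logged namespace change to an in-memory namespace."""
    op, path = edit[0], edit[1]
    if op == "mkdir":
        namespace[path] = {"type": "dir"}
    elif op == "create":
        namespace[path] = {"type": "file", "blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError("unknown edit operation: %r" % (op,))

def checkpoint(fsimage, editlog):
    """Replay the edit log on a copy of the image.

    Returns the merged image and a fresh, empty edit log; the input
    image is left untouched.
    """
    new_image = copy.deepcopy(fsimage)
    for edit in editlog:
        apply_edit(new_image, edit)
    return new_image, []

fsimage = {"/": {"type": "dir"}}
editlog = [
    ("mkdir", "/logs"),
    ("create", "/logs/a.txt"),
    ("add_block", "/logs/a.txt", "blk_1"),
    ("create", "/tmp.txt"),
    ("delete", "/tmp.txt"),
]
new_image, new_log = checkpoint(fsimage, editlog)
```

In this model, any edits logged after the last checkpoint exist only in the edit log, which is why slide 9 lists data loss of up to one checkpoint period as the worst case when only the checkpointed copy survives.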
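
Slides 11 and 12 describe a master NameNode that pushes update events directly to N slaves. A minimal sketch of that pattern follows. Only the event names (`NNU_NOP`, `NNU_MKDIR`, `NNU_NEWFILE`, `NNU_BLK`, `NNU_MVRM`) come from the slides; the class names, the payload dictionaries and the synchronous loop are assumptions for illustration, and `NNU_BLK` and `NNU_MVRM` are simplified here to "add a block" and "remove a file".

```python
class SlaveNameNode:
    """Hypothetical slave: keeps a namespace and applies update events."""

    def __init__(self):
        # path -> None for a directory, or a list of block ids for a file
        self.namespace = {"/": None}

    def apply(self, event, payload):
        if event == "NNU_NOP":
            return
        if event == "NNU_MKDIR":
            self.namespace[payload["path"]] = None
        elif event == "NNU_NEWFILE":
            self.namespace[payload["path"]] = []
        elif event == "NNU_BLK":
            # Simplification: only the "add a block" case is modelled.
            self.namespace[payload["path"]].append(payload["block"])
        elif event == "NNU_MVRM":
            # Simplification: only the "remove" case is modelled.
            self.namespace.pop(payload["path"], None)
        else:
            raise ValueError("unhandled event: %r" % (event,))


class MasterNameNode(SlaveNameNode):
    """Hypothetical master: applies each update locally, then on every slave."""

    def __init__(self, slaves):
        super().__init__()
        self.slaves = list(slaves)

    def update(self, event, payload):
        self.apply(event, payload)
        # Synchronous fan-out, with no intermediate delivery node.
        for slave in self.slaves:
            slave.apply(event, payload)


slaves = [SlaveNameNode(), SlaveNameNode()]
master = MasterNameNode(slaves)
master.update("NNU_MKDIR", {"path": "/data"})
master.update("NNU_NEWFILE", {"path": "/data/f"})
master.update("NNU_BLK", {"path": "/data/f", "block": "blk_1"})
master.update("NNU_NEWFILE", {"path": "/data/old"})
master.update("NNU_MVRM", {"path": "/data/old"})
```

Because every metadata write in this sketch waits for all slaves, write cost grows with the number of slaves. That is one plausible reading of the metadata write-only result on slide 13, not a measured explanation.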