15. HDFS :数据本地化 Data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Results Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Hadoop Cluster Block 1 Block 1 Block 2 Block 2 Block 2 Block 1 MAP MAP MAP Reduce Block 3 Block 3 Block 3
按照当前各公司公布的数据来看,百度日处理规模居全球主要互联网公司第 2 名,仅次于 Google 的每日 30PB 左右的输入数据处理量。
– Chooses new DataNodes for new replicas – Balances disk usage – Balances communication traffic to DataNodes
Block (Object) Storage Subsystem Shared storage provided as pools of blocks Namespaces (HDFS, others) use one or more block-pools Note: HDFS has 2 layers today – we are generalizing/extending it.