Hadoop development in China Mobile Research Institute

Hadoop development at the China Mobile Research Institute, especially in HDFS

Published in: Technology

  1. Hadoop-Related R&D at the China Mobile Research Institute
     Wang Xu, China Mobile Research Institute
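  2. Big Cloud R&D History
     China Mobile launched the "Big Cloud" R&D program to build the company's core competitiveness in cloud computing.
     • The Big Cloud program is a key-technology research and prototype-system development program run by the China Mobile Research Institute to build China Mobile's cloud computing infrastructure.
     Goals
     • Meet the needs of China Mobile's IT supporting systems for high-performance, low-cost, scalable, and highly reliable IT computing and storage.
     • Meet China Mobile's needs in providing Internet businesses and services.
     Timeline (figure): marks at 2007.3, 2007.7, 2008.3, 2008.10, 2008.12, 2009.8, 2009.12, 2010.5
     Milestones shown, in order:
     • Big Cloud research direction decided
     • First Hadoop platform built from idle resources
     • 256-node cluster and analysis tools built
     • Big Cloud 0.5 released
     • 1024-node cluster built
     • Big Cloud 1.0 released at the cloud computing conference
     Workstreams shown under the timeline:
     • Study of Hadoop and other open-source architectures; research on key cloud computing technologies; system evaluation
     • Development and application trials of parallel data mining tools
     • System improvement, refinement, and trials
     • Cloud computing technology roadmap, adoption strategy, overall solution research, product development, application trials, industry chain cultivation, and business model research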
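  3. Building a 1024-Node Large-Scale Lab
     Lab construction
     • September 2008: phase one of the large-scale computing lab was completed, with 256 PC servers, providing an initial R&D and test environment for the large-scale computing platform.
     • December 2009: the phase-two expansion of the lab was completed.
     Lab environment and deployment
     • Nodes: 1036 servers; 5208 CPU cores, 10 TB memory; 2.8 PB disk
     • Switches: 9 10-Gigabit/Gigabit-compatible Ethernet switches, interconnected in a tree topology
     • Software: CentOS Linux 5.4, kernel 2.6.18, JDK 1.6, hadoop-0.20, etc.
     • Deployed applications: data mining, the elastic computing platform BC-EC, the structured massive data management platform HugeTable, search engine, cloud storage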
  4. China Mobile Big Cloud Technical Architecture
     Application layer: CMCC IT Supporting Systems, Internet App, IDC, ...
     • IT Supporting System of CMCC
     • IDC and Internet applications
     Enabler layer: Cloud Storage (BC-NAS), Data Mining (BC-PDM), Search Engine (BC-SE)
     • BC-PDM: cloud-based data mining
     • BC-NAS: file and object storage with web interface and REST API
     • BC-SE: search engine
     Platform layer: Structured Data Storage (HugeTable), Hadoop MapReduce with CMRI Extension, Object Storage based on oNest, Distributed Filesystem (Hadoop HDFS)
     • MapReduce & HDFS: based on Hadoop, with some extensions by CMRI
     • HugeTable: structured storage with SQL interface
     • oNest: object storage for web apps
     Resource layer: Elastic Computing (BC-EC); Linux, Xen/KVM
     • PC server and SATA disk based
     • BC-EC: IaaS based on OpenNebula
     • Based on FOSS: Linux, KVM, Xen
     Across all layers: System Management (CloudMaster) and CloudSecurity
  5. Big Cloud and Hadoop
     (Same layered architecture diagram as slide 4, annotated with the Hadoop-related work.)
     Development based on Hadoop
     • Parallel ETL and data mining based on MapReduce (BC-PDM)
     • Search engine based on MapReduce (BC-SE)
     • HugeTable (structured data storage for data warehouse) based on Hive, HBase & MR
     Development extending Hadoop
     • Volume management of DataNode in HDFS
     • NameNode cluster for HDFS
     • Multi-queue scheduler with queue priority enhancement
     External facilities for Hadoop
     • Test tools for Hadoop HDFS
     • Inside job performance evaluation tool
     • MapReduce job submission web interface
  6. Development on Hadoop in CMRI
     Contributing to mainline
     • Online Volume Management of DataNode (by Wang Xu etc., HDFS-1362)
     Off-tree and opened
     • NameNode Cluster for HA (by Wang Xu, hosted on GitHub)
     Off-tree and not maintained
     • hdfs-fuse (by Zhao Peng, hosted on Google Code)
     • Multi-queue scheduler with queue priority enhancement (by Guo Leitao)
     External facilities
     • hadoop-test (by Wang Xu, hosted on Google Code)
     • MapReduce Job Submission Web Interface and inside job performance evaluation tool (by Guo Leitao, etc.)
     Bug fixes
  7. DataNode Online Volume Management
     http://github.com/gnawux/hadoop-cmri
     https://issues.apache.org/jira/browse/HDFS-1362
     Current state
     • A disk failure leads to decommissioning the whole node
     Online volume management
     • Remove a failed disk online
     • Migrate data from the failed volume if it is still readable
     • Replace the disk online
  8. NameNode Cluster
     http://github.com/gnawux/hadoop-cmri (code)
     http://gnawux.info/hadoop/2010/01/pratice-of-namenode-cluster-for-hdfs-ha/
     http://gnawux.info/hadoop/2010/05/namenode-cluster-code-github/
  9. HDFS Stress Test
     http://code.google.com/p/hadoop-test/
     http://gnawux.info/hadoop/2010/01/a-simple-hdfs-performance-test-tool/
  10. Thank you
      http://labs.chinamobile.com/cloud/
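
The online volume management idea from the "DataNode Online Volume Management" slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HDFS-1362 patch or any Hadoop API: the `Volume` and `DataNode` classes, the `remove_volume_online` method, the least-loaded placement rule, and the block names are all hypothetical. It shows a failed volume being detached while the node stays in service, with readable blocks migrated to the surviving volumes and unreadable ones reported back.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical stand-in for one DataNode disk (not an HDFS class)."""
    path: str
    blocks: dict = field(default_factory=dict)  # block id -> still readable?


@dataclass
class DataNode:
    """Toy model of a DataNode that owns several volumes."""
    volumes: list

    def remove_volume_online(self, path):
        """Detach one volume while the node keeps serving the others.

        Returns (migrated, lost): block ids copied to a surviving volume,
        and block ids that could not be read from the failed disk.
        """
        failed = next(v for v in self.volumes if v.path == path)
        survivors = [v for v in self.volumes if v.path != path]
        if not survivors:
            # With no disk left, the old behaviour (decommission) is all that remains.
            raise RuntimeError("no volume left; the node must be decommissioned")
        migrated, lost = [], []
        for block_id, readable in sorted(failed.blocks.items()):
            if readable:
                # Assumed policy: place the block on the least-loaded survivor.
                target = min(survivors, key=lambda v: len(v.blocks))
                target.blocks[block_id] = True
                migrated.append(block_id)
            else:
                lost.append(block_id)
        self.volumes = survivors
        return migrated, lost


node = DataNode([
    Volume("/data1", {"blk_1": True, "blk_2": False, "blk_3": True}),
    Volume("/data2", {"blk_4": True}),
    Volume("/data3", {}),
])
migrated, lost = node.remove_volume_online("/data1")
```

In this toy run, `migrated` holds the blocks that were still readable on `/data1` and `lost` holds the rest; in a real cluster the lost blocks would have to be recovered from replicas on other nodes, which this sketch does not model.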
