Hadoop Big Data Practical Experience

            December 2012
Schubert Zhang, Stephen Xie, Clay Jiang
Hanborq and Open Source (Hadoop Ecosystem)
• Open-source project code, patches, and issue reports:
      – HBASE-1778, HBASE-1818, HBASE-1841, HBASE-1978, HBASE-1902, HBASE-3943, HBASE-1296
      – MAPREDUCE-3685, MAPREDUCE-4039
      – CASSANDRA-1729 ……
• Documentation, test reports, and contributed material for the open-source community:
      – http://www.slideshare.net/hanborq
      – http://www.slideshare.net/schubertzhang
      – http://hbase.apache.org/book.html#regions.arch
• Hanborq and open source:
      – http://github.com/hanborq
      – http://slideshare.net/hanborq
      – http://slideshare.net/schubertzhang
2013/3/21                                                                                       2
Hadoop Practice Overview (1)

Hadoop Practice Overview (2)

Cluster Planning - Choosing a Hadoop Version
•   Open-source distributions
      –     Apache/Hortonworks
      –     Cloudera
      –     Facebook
      –     MapR
      –     Hanborq or BigCloud …
•   Hadoop, HBase, etc.

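Cluster Planning – Hardware Selection
•   What is commodity hardware?
•   CPU:Memory:Disk ratio depends on the workload's characteristics and requirements
•   JBOD vs. RAID vs. SPAN/BIG
•   Virtual machines, blades, and SAN are not suitable

• Master nodes
     – NameNode and SecondaryNameNode
     – JobTracker
     – Higher reliability requirements
            •   Dual power supplies
            •   Bonded network ports
            •   RAID1 vs. RAID10 vs. RAID5
            •   Dedicated, separate disks
            •   Backup to NFS
     – Large memory
            •   NameNode: 1GB for roughly 1 million blocks; the small-files problem
            •   JobTracker: history task status, counters, etc.
                (mapred.jobtracker.completeuserjobs.maximum)
     – Small disks
            •   < 1TB is enough, e.g. 500GB

• Worker (Slave) nodes
     – Storage + compute
     – Storage-intensive (DW)
     – Reserve 20~30% of disk space for MapReduce, HDFS, and HBase temporary data
     – Roughly 1~4GB of memory per task; 48GB of memory supports about 10~20 tasks
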
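The memory rules of thumb on this slide can be turned into a quick estimate. The sketch below is only an illustration: the function names and the 8GB reserved for the operating system and Hadoop daemons are assumptions, not figures from the slides.

```python
import math

# Rule of thumb from the slide: about 1GB of NameNode heap per 1 million blocks.
BLOCKS_PER_GB_HEAP = 1_000_000


def namenode_heap_gb(total_blocks):
    """Estimate NameNode heap in GB from the block count, rounded up."""
    return max(1, math.ceil(total_blocks / BLOCKS_PER_GB_HEAP))


def task_slots(node_memory_gb, per_task_gb, reserved_gb=8):
    """Estimate concurrent tasks on one worker node.

    reserved_gb is a hypothetical allowance for the OS and Hadoop daemons.
    """
    return (node_memory_gb - reserved_gb) // per_task_gb


# Small-files problem: 10 million single-block files already need ~10GB of heap.
print(namenode_heap_gb(10_000_000))
# A 48GB worker with 4GB or 2GB tasks lands in the 10~20 task range.
print(task_slots(48, 4), task_slots(48, 2))
```

Because every file occupies at least one block, many small files inflate the block count, and therefore the NameNode heap, far faster than the same volume stored in large files.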
Cluster Planning – Hardware Selection – Example

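Cluster Planning – Cluster Size
• Storage and compute capacity requirements
      – CPU, memory, storage, disk IO
      – Frequency of compute jobs

• Estimating by storage requirements (common)
      – Historical data kept long term
      – Daily incremental data
      – Data that actually takes part in computation

• Estimating by compute capacity (harder)
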
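A storage-based estimate can be sketched as simple arithmetic. The function below is a minimal illustration under stated assumptions: a replication factor of 3 (the HDFS default), 25% of each disk held back for temporary data (the middle of the 20~30% range on the hardware slide), and no compression. The parameter names and example figures are hypothetical.

```python
import math


def nodes_for_storage(historical_tb, daily_tb, retention_days,
                      disk_per_node_tb, replication=3, reserve_fraction=0.25):
    """Estimate the worker-node count needed to hold the data.

    replication and reserve_fraction are assumptions; compression is ignored.
    """
    raw_tb = (historical_tb + daily_tb * retention_days) * replication
    usable_per_node_tb = disk_per_node_tb * (1.0 - reserve_fraction)
    return math.ceil(raw_tb / usable_per_node_tb)


# Hypothetical example: 100TB of history, 1TB/day kept for a year, 24TB per node.
print(nodes_for_storage(100, 1, 365, 24))
```

The result is a floor on cluster size; a compute-based estimate may call for more nodes than storage alone.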
Cluster Planning – OS and Kernel Tuning
• Operating system
      – RHEL/CentOS/Ubuntu …
      – JDK (http://wiki.apache.org/hadoop/HadoopJavaVersions)
             • 1.6.0_24/26/31
             • -XX:+UseCompressedOops
             • GC options
      – cron/ntp/ssh/sendmail/rsync/sysstat/dstat
      – Hostname and DNS
      – Users, groups, permissions

• Kernel tuning
      –     ulimit: open files, max user processes, etc.
      –     vm.swappiness
      –     vm.overcommit_memory
      –     …

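One way to keep kernel settings consistent across nodes is to compare each node's values against a desired set. The sketch below parses `sysctl`-style "key = value" text and reports differences. The target values are illustrative assumptions only: the slides name the knobs but do not give values.

```python
def parse_sysctl(text):
    """Parse "key = value" lines (as printed by sysctl) into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def find_mismatches(current, desired):
    """Return {key: (current_value, desired_value)} for each differing setting."""
    return {
        key: (current.get(key), want)
        for key, want in desired.items()
        if current.get(key) != want
    }


# Illustrative targets, not recommendations from the slides.
DESIRED = {"vm.swappiness": "0", "vm.overcommit_memory": "0"}

sample = """
vm.swappiness = 60
vm.overcommit_memory = 0
"""
mismatches = find_mismatches(parse_sysctl(sample), DESIRED)
print(mismatches)
```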
Cluster Planning – Disk Planning
• Linux file systems
      –     Ext3
      –     Ext4 (extent-based, sequential)
      –     Xfs (extent-based, concurrent)
      –     LVM (anti-pattern)

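Cluster Planning – Network Planning
• Consider HDFS/MapReduce traffic: 1Gb vs. 10Gb

[Figures: (1) two-tier tree (small and medium clusters); (2) three-tier tree (bottleneck in large clusters); (3) spine fabric (scales well)]

•   East/West vs. North/South traffic model
      • Write-through
      • Shuffle
•   Rack topology, switch topology
•   Reads: as local as possible; writes: local first
•   10Gb: low latency, high throughput
•   Access switches: 10Gb costs three times as much as 1Gb
•   ETL: 10Gb
•   Bonding -> 10Gb
•   CLOS Network: http://research.google.com/pubs/pub36740.html
    http://www.csdn.net/article/2011-05-20/298348
•   Redundancy
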
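Rack topology is usually given to Hadoop through a mapping from node address to rack path. The sketch below shows such a mapping under stated assumptions: the subnet table and rack names are hypothetical, and the fallback mirrors the `/default-rack` name Hadoop uses when no mapping is supplied.

```python
import ipaddress

# Hypothetical subnet-to-rack table; replace with the real data-center layout.
RACKS = {
    "10.1.1.0/24": "/dc1/rack1",
    "10.1.2.0/24": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def rack_of(host_ip):
    """Return the rack path for an IP address, or the default rack if unknown."""
    address = ipaddress.ip_address(host_ip)
    for subnet, rack in RACKS.items():
        if address in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


print(rack_of("10.1.2.17"))
print(rack_of("192.168.0.5"))
```

An accurate mapping matters for both bullets above: it lets reads prefer nearby replicas and lets writes place replicas across racks.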
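Automated Cluster Deployment - ClusterMaster
• ClusterMaster is an automated deployment and operations system. It targets
  the core problems of deploying and operating large clusters: operating
  system deployment, application deployment and management, configuration
  management, performance data collection, and alerting and monitoring.

• ClusterMaster integrates open-source software such as Kickstart, Cobbler,
  Puppet, Ganglia, and Nagios, and gives users a unified web management interface.

• Built on mature open-source software, ClusterMaster scales well and can
  handle the application and operating system deployment needs that arise in
  Linux cluster operations. Users only provide deployment scripts and
  installation packages; ClusterMaster completes the remaining work
  automatically.
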
ClusterMaster




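Automated Cluster Deployment – Hdeploy/HTCfg

HugeTable provides HTCfg, a parallel system deployment and configuration
tool. It makes it easy to install, deploy, and configure the software the
system depends on, the Linux environment, and each HugeTable software
module, so users do not have to install and configure the machines of a
distributed system one by one.

[Figure: fragment of Hdeploy's deploy.dd file]
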
Operations - HDFS WebUI and Operations Strategy

Operations – MapReduce WebUI and Operations Strategy

Operations – Making Good Use of Metrics

    Selectively integrate Ganglia or OpenTSDB to visualize key metrics in real time.
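As one illustration of feeding a metric to OpenTSDB, the sketch below formats a data point in OpenTSDB's line-based `put` format (`put <metric> <timestamp> <value> <tag=value> ...`). The metric name, tag, and helper function are hypothetical; how the line is delivered (for example over a TCP connection to the TSD) is left out.

```python
def opentsdb_put_line(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB "put" line.

    OpenTSDB expects at least one tag per data point.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_text}"


# Hypothetical metric name and host tag.
line = opentsdb_put_line("hdfs.datanode.bytes_written", 1363824000, 1024,
                         {"host": "worker01"})
print(line)
```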
Operations – Monitoring, Alerting, and Visualization Tools

                             ClusterMaster
                             Ganglia
                             Nagios

Operations – Job Statistics

            Generate job status reports periodically

Operations – Data Migration (Case Study)

                      •   Small data center -> large data center
                      •   Distcp
                      •   Seed Nodes
                      •   Balance

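The copy and rebalance steps map onto two Hadoop commands, `hadoop distcp` and `hadoop balancer`. The sketch below only builds the argument lists so they can be inspected or passed to `subprocess.run`. The NameNode addresses, the path, and the helper names are hypothetical, and the slide does not say which options the migration actually used.

```python
def distcp_command(src_namenode, dst_namenode, path, max_maps=None):
    """Build a `hadoop distcp` command copying one path between clusters."""
    cmd = ["hadoop", "distcp"]
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]  # cap the number of parallel copy maps
    cmd += [f"hdfs://{src_namenode}{path}", f"hdfs://{dst_namenode}{path}"]
    return cmd


def balancer_command(threshold_percent=10):
    """Build a `hadoop balancer` command with a disk-usage threshold in percent."""
    return ["hadoop", "balancer", "-threshold", str(threshold_percent)]


# Hypothetical cluster addresses and path.
print(" ".join(distcp_command("nn-small:8020", "nn-large:8020", "/data/cdr", 20)))
print(" ".join(balancer_command(5)))
```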
Hadoop Ecosystem - Building a Complete Solution

[Figure: architecture diagram. Labels: User, Applications; API/QL; Hive/Pig; HugeTable; MapReduce; Oozie; Flume/Flive; Bigtable; HBase; HDFS (files); User Big Data; Shared Cluster of Servers]

Hadoop Ecosystem - Choosing the Right Tool for the Right Problem

               Hive vs. Pig vs. Java vs. Others
Performance Evaluation – Benchmark Tools
   • Write (ingest) performance test

   • Read (query) performance test

Performance Evaluation – Benchmark Tools
SLA core metrics
•   Throughput
      – tThrou: Total Throughput (operation count)
      – dThrou: Delta Throughput (operation count)
•   Latency
      – tAvgLat: Total Average Latency (ms)
      – dAvgLat: Delta Average Latency (ms)
      – dMaxLat: Delta Maximum Latency (ms)
      – dMinLat: Delta Minimum Latency (ms)
•   Quantile %
•   Total: from benchmark start to present.
•   Delta: between each statistical interval (2 minutes here)

[Chart: percentage of read ops (0% to 25%) per latency bucket (1 to 61, axis labeled 100ms)]

    Read Throughput: average ~140 ops/s; per-node: average ~16 ops/s
    Latency: average ~500ms, 97% < 2s (SLA)
    Bottleneck: disk IO (random seek) (CPU load is very low)

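The total and delta metrics can be sketched with a small accumulator. This is an illustration of the definitions on the slide, not the benchmark tool itself: the class, method, and function names are hypothetical, and only the metric keys (tThrou, dThrou, tAvgLat, dAvgLat, dMaxLat, dMinLat) come from the slide.

```python
class LatencyStats:
    """Track total (since start) and delta (per interval) metrics."""

    def __init__(self):
        self.total_ops = 0
        self.total_latency_ms = 0.0
        self.delta_latencies_ms = []

    def record(self, latency_ms):
        """Record one completed operation and its latency in milliseconds."""
        self.total_ops += 1
        self.total_latency_ms += latency_ms
        self.delta_latencies_ms.append(latency_ms)

    def close_interval(self):
        """Report metrics for the interval just ended, then start a new one."""
        delta = self.delta_latencies_ms
        report = {
            "tThrou": self.total_ops,
            "dThrou": len(delta),
            "tAvgLat": self.total_latency_ms / self.total_ops if self.total_ops else 0.0,
            "dAvgLat": sum(delta) / len(delta) if delta else 0.0,
            "dMaxLat": max(delta) if delta else 0.0,
            "dMinLat": min(delta) if delta else 0.0,
        }
        self.delta_latencies_ms = []
        return report


def fraction_under(latencies_ms, limit_ms):
    """Return the fraction of operations faster than limit_ms (an SLA check)."""
    if not latencies_ms:
        return 0.0
    return sum(1 for value in latencies_ms if value < limit_ms) / len(latencies_ms)


stats = LatencyStats()
for latency in (100, 300):
    stats.record(latency)
first = stats.close_interval()
stats.record(800)
second = stats.close_interval()
print(first)
print(second)
```

An SLA such as "97% < 2s" then becomes a check like `fraction_under(latencies, 2000) >= 0.97`.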
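Known Issues and Workarounds
• The Hadoop family of products still has many known and unknown issues

• These are only examples:
      –     CentOS/RedHat Linux: transparent hugepage compaction
      –     HDFS: a file that is still being written cannot be read
      –     MapReduce: the index of an LZO-compressed file grows too large, causing OOME
      –     Hive: a long-lived MySQL connection is dropped, causing the job to fail
      –     HBase: the Memstore flush trigger mechanism with multiple CFs
      –     …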
Hadoop Optimization
•   MapReduce
•   HBase
•   Flume
•   Hive
•   Pig
•   Tools

Hadoop Optimization @MapReduce Performance

Job Startup Latency (seconds)
                                           32 maps, 4 reduces    96 maps, 4 reduces
CDH3u2 (Cloudera), reuse.jvm disabled              24                    43
CDH3u2 (Cloudera), reuse.jvm enabled               21                    24
HDH3u2 (Hanborq)                                    1                     1

Sort Avoidance and Aggregation, time (seconds)
                        Case1    Case2    Case3
CDH3u2 (Cloudera)        197      216     2186
HDH (Hanborq)            175      198      615

Real Aggregation Jobs, time (seconds)
                        Case1-1  Case2-1  Case1-2  Case2-2
CDH3u2 (Cloudera)         238      603      136      206
HDH (Hanborq)             233      578       96      151
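The run times reported on this slide can be read as speedup ratios. The short calculation below uses only the numbers from the slide; the dictionary layout and the `speedup` helper are illustrative.

```python
# (CDH3u2 seconds, HDH seconds) for each case, as reported on the slide.
SORT_AVOIDANCE = {"Case1": (197, 175), "Case2": (216, 198), "Case3": (2186, 615)}
REAL_JOBS = {
    "Case1-1": (238, 233),
    "Case2-1": (603, 578),
    "Case1-2": (136, 96),
    "Case2-2": (206, 151),
}


def speedup(baseline_s, optimized_s):
    """Return how many times faster the optimized run is, to two decimals."""
    return round(baseline_s / optimized_s, 2)


for name, (cdh, hdh) in {**SORT_AVOIDANCE, **REAL_JOBS}.items():
    print(f"{name}: {speedup(cdh, hdh)}x")
```

The gain is largest for Case3 (about 3.55x); the other cases range from about 1.02x to 1.42x.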
Thank You Very Much!




