No[1][1] Presentation Transcript

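  • Data Center Performance Optimization and Case Studies on Multi-Core Platforms. Qiao Nan, Senior Application Software Optimization Engineer, n [email_address]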
  • Agenda
    • Intel Software and Solution Group Introduction
    • Intel Software Tuning Tools
    • IPDC Tuning Methodology
    • IPDC Tuning Case Study
    • Summary
  • Intel Software And Solution Group
    • 22+ countries, 53 major sites
    • Intel® Solutions Services
      • Business & Tech Services
      • Best practices, Proof of concepts
      • Solution implementation
    • Intel® Software Development Products
      • Compilers
      • Threading tools
      • Cluster Tools
      • Performance Libraries
      • Code Performance analysis
    • Research & Development
      • Linux & Open Source
      • Managed Run time
      • Threading expertise
      • HPC expertise
      • Clusters & Grid
      • Advanced Computing
    • Intel® Software Network
      • "One-Stop Shop For Developers"
      • www.intel.com/software
    • Intel® Software Partners Program
      • Access to Intel Technology
      • Remote Access Services
      • Loaner SDP’s
      • Membership Program
      • Go-to-Market Programs
    • Intel® Software College
      • Multi-Core Programming
      • High Performance Computing
      • Cluster setup
      • Intel® Integrated Performance Primitives
      • Moving to Multi-Core
      • Class room and web based
  • CRT Focus Domains
  • Enterprise Data Center Trends
    • Scale-Out is the predominant architecture
    • Exponential data explosion
    • Manageability & Security
    • Virtualization
    • A More Complex Environment Supported through Software
    • Service Oriented Architectures & Software as a Service
    • Simplify, Optimize, Innovate
      • Need to improve efficiency & manage resources, power & cooling in a dynamic, automatic way
      • Growth drives needs for high performance applications
      • Solutions embrace new emerging technologies & drive new business models
  • CRT Datacenter Resource
    • CRT Data Center in Dupont, Washington – All 4 clusters & Altix SMP are operational. Notable activity this month:
      • Atlantis (64-node CTN): The PNNL deal is on Atlantis. They completed testing with the InfiniBand interconnect and are now starting to switch over to 32 nodes of exclusive access with IBM Quadrics HW.
      • Challenger (32-node WDC): MySQL successfully completed their testing in exclusive access using the Dolphin interconnect. Challenger is now hosting U. Turkey in shared access until ww13, when they will switch over to Atlantis.
      • Endeavor (256-node Woodcrest) – Intersystems is making progress in exclusive access. ….
    Intel ENDEAVOR 464 Intel® Xeon® Processors 5100 series, 6.85 Teraflop MP Linpack, #68 on top500
  • CRT Datacenter: System Configuration
    • Challenger
      • Node count: 32
      • Platform: SuperMicro SDP Bensley server, 4U DP server
      • CPU / Stepping: Woodcrest B0 step, 3.0 GHz / 4 MB cache, 1333 MHz FSB
      • RAM: FBDIMM 8x1GB 667MHz
      • Hard drive: 120 GB SATA HDD
      • Cluster File System: RapidScale, 6 storage servers, 18 TB
      • GigE Switch detail: Hitachi, 48 ports
      • IB switch: Infinicon 9120 (SDR), 144 ports
      • IB adapters: Mellanox MHGA28-XT or MHGA28-XTC memfree
      • OS / IB stack: RedHat EL4 update 4, OFED 1.2 RC 2
    • Endeavor
      • Node count: 256
      • Platform: EPSD S5000XSL Alcolu silver main board, 1U DP server
      • CPU / Stepping: Woodcrest B0 step, 3.0 GHz / 4 MB cache, 1333 MHz FSB
      • RAM: FBDIMM 8x1GB 667MHz
      • Hard drive: 120 GB SATA HDD
      • Cluster File System: Panasas, 7 shelves, 35 TB storage
      • GigE Switch detail: Cisco Catalyst 4510, 336 ports
      • IB switch: SilverStorm 9240 (SDR), 288 ports
      • IB adapters: Mellanox MHGA28-XS or MHGA28-XSC memfree
      • OS / IB stack: RedHat EL4 update 4, OFED 1.2 RC 2
    • Atlantis
      • Node count: 64
      • Platform: EPSD S5000XSL Alcolu Gold main board, 1U DP server
      • CPU / Stepping: Clovertown B3 step, 2.67 GHz / 4 MB cache, 1333 MHz FSB
      • RAM: FBDIMM 8x1GB 667MHz
      • Hard drive: 120 GB SATA HDD
      • Cluster File System: RapidScale, 2 storage servers, 4 TB storage
      • GigE Switch detail: Cisco Catalyst 4506, 144 ports
      • IB switch: Mellanox REVMT3000AQ4, 72 ports
      • IB adapters: Mellanox MHGA28-XS or MHGA28-XSC memfree
      • OS / IB stack: RedHat EL4 update 4, OFED 1.2 RC 2
    • Discovery
      • Node count: 64
      • Platform: S3000PT Port Townsend board, 1U UP server
      • CPU / Stepping: Conroe XE, 2.93 GHz / 4 MB cache, 1066 MHz FSB
      • RAM: FBDIMM 4x2 GB 533MHz
      • Hard drive: 120 GB SATA HDD
      • Cluster File System: Panasas, 1 shelf, 3 TB storage
      • GigE Switch detail: Cisco 4506, 96 ports
      • IB switch: Silverstorm 9240 (DDR), 84 ports
      • IB adapters: Mellanox MHGS18-XTC (DDR) memfree
      • OS / IB stack: RedHat EL4 update 4, OFED 1.2 RC 2
    • Alice
      • Node count: 136 sockets
      • Platform: Altix 4700 SMP, 8 CPU boot set, 128 CPU compute set
      • CPU / Stepping: Montecito, 1.6 GHz / 24M cache, 533 MHz FSB
      • RAM: 1 TB (256 x 4GB modules)
      • Hard drive: none
      • Cluster File System: 9 TB SGI SAN
      • GigE Switch detail: n/a
      • IB switch: n/a
      • IB adapters: n/a
      • OS / IB stack: SuSE Linux ES 10
  • CRT’s Menu of Services
    • Discovery and Evaluation of End-User Deal
    • Performance Characterization
      • Single Node and Cluster Scaling
      • Linux* or Windows* platforms
    • Porting and Performance Optimization
    • Intel® SW Tools Consulting and Support
    • Support provided On Site, at Intel locations, or in Intel® Solution Centers
    • Expert Consulting in HPC/WS, Financial Analytics, DB, CRM, Power & Performance Characterization, Decision Support areas
    • Access Intel® Xeon® processor platform family & Intel® Itanium® processor platform family clusters remotely through Intel® Remote Access Service
  • Key Applications Enabled on 64-bit Intel® Xeon® Processor
    • CAE (Linux*)
      • ABAQUS Standard, Explicit
      • Ansys (ANSYS)
      • CFD++ (Metacomp)
      • CFD-ACE+ (ESI)
      • Feko* (EMSS)
      • Fluent* (Fluent)
      • FIRE* (AVL)
      • HyperMesh * (Altair)
      • LS-Dyna* (LSTC)
      • Mafia* (CST)
      • MSC.Nastran* (MSC.Software)
      • NX Nastran* (UGS)
      • PAM-Crash* (ESI)
      • PowerFLOW* (EXA)
      • Star-CD* (CD/Adapco)
    • CAD (Microsoft Windows*)
      • Pro/ENGINEER * (PTC)
      • NX * (UGS)
      • Parasolid * (UGS)
      • Tc Vis * (UGS)
    • Energy (Linux)
      • Eclipse* (Schlumberger)
      • Petrel / GigaViz (Schlumberger)
      • Omega2 (Western Geco)
      • VIP / Nexus (Landmark Graphics)
      • Geoprobe (Landmark Graphics)
      • Promax / SeisSpace (Landmark Graphics)
      • Geodepth* (Paradigm Geo)
      • Focus* (Paradigm Geo)
      • Geocluster (CGG)
    • Financial Services (Linux*)
      • RiskWatch (Algorithmic)
      • RMDS (Reuters)
      • Thomson Financial
      • Credient, Intellitracs (Sungard)
      • CPS (Morgan-Stanley)
    • DCC (Linux / Windows*)
      • Alias – Maya
      • Discreet – 3ds max
      • mental images – mental ray
      • Pixar RenderMan
    • Life Sciences (Linux*)
      • BLAST
      • Gaussian
      • AMBER
      • GAMESS
      • Gromacs
      • HMMER
      • NAMD
    • NWS (Linux*)
      • MM5
      • WRF
      • CCSM3
      • POP
      • CAM
      • Aladin
      • Hirlam
      • UM
    Comprehensive Set of Applications. Workstation applications.
    * Other brands and names may be claimed as the property of others.
  • Levels of Parallelism: Intel® Software Development Tools
    • Developer Considerations: Intel Software Tools help address many parallel programming issues
    • Serial Core Level
      • Programming Model, Implementation: C/C++, FORTRAN95
      • Correctness & Debugging: IDB
      • Performance Libraries: MKL, IPP
      • Performance Analysis: VTune™
    • Multi-Core/SMP Node Level
      • Programming Model, Implementation: Auto-Parallelization, OpenMP*, TBB
      • Correctness & Debugging: Thread Checker, IDB
      • Performance Libraries: MKL, IPP
      • Performance Analysis: VTune, Thread Profiler
    • Multi-Node Cluster Level
      • Programming Model, Implementation: Cluster OpenMP, Intel MPI
      • Correctness & Debugging: Message Checker, IDB-MPP
      • Performance Libraries: Cluster MKL
      • Performance Analysis: Trace Analyzer
    *Other names and brands may be claimed as the property of others.
  • VTune™ Analyzer Features and Usage Models
    • Sampling: collects system-wide performance data
    • Sampling Over Time Views: show how sampling data changes over time
    • Sampling Source View: displays source code annotated with performance data
    • Call Graph: collects and displays information about the program flow of the application
  • Intel ® Thread Checker Software Tools
  • Intel® Thread Profiler: pinpoints threading inefficiencies
  • Three Levels of Performance Considerations
    • System level
      • Processor / Memory
      • Network
      • Disk
      • Operating System
    • Application level (PLP & TLP)
      • Algorithmic issues
      • APIs
      • Locks
      • Heaps
      • Execution Threads
    • Microarchitecture level (ILP)
      • Processor Stalls
      • Branch prediction
      • Code/data alignment
      • Cache optimization
    Top-down approach. Tools and resources: VTune, Perfmon, Emon, APImon, Traces, App/Architecture Expertise, Code Inspection, Other Sessions, ITC/ITA, Threading tracker/profiler, SEP
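  • System Tuning
    • Identify performance limits caused by slow hardware access
      • Example: a seismic data processing system is limited mainly by disk I/O bottlenecks, not by data processing speed
    • System performance factors usually considered include
        • Disk I/O, network I/O, storage access, memory usage
    Have we considered all the necessary system factors?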
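One way to check whether a Linux system is waiting on hardware rather than computing is to compare two snapshots of the aggregate `cpu` line in `/proc/stat` and look at the share of time spent in `iowait`. The sketch below is only an illustration under that assumption; the function names and the sample counter values are made up for the example and do not come from the slides.

```python
def cpu_time_shares(before: str, after: str) -> dict:
    """Return the share of CPU time per state between two /proc/stat 'cpu' lines."""
    # First seven counters of the aggregate line on Linux 2.6 and later.
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]

    def parse(line: str) -> list:
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(v) for v in fields[1:1 + len(names)]]

    deltas = [b - a for a, b in zip(parse(before), parse(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: delta / total for name, delta in zip(names, deltas)}


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline().strip()


# Hypothetical snapshots, invented for illustration only.
SNAPSHOT_1 = "cpu 100 0 50 800 50 0 0"
SNAPSHOT_2 = "cpu 200 0 100 1000 600 0 0"
shares = cpu_time_shares(SNAPSHOT_1, SNAPSHOT_2)
print(f"iowait share: {shares['iowait']:.0%}, idle share: {shares['idle']:.0%}")
```

On a live system, `read_cpu_line()` would be called twice a few seconds apart. A large `iowait` share points toward disk or storage access, while a large `idle` share with little `iowait` suggests the limit lies elsewhere, for example in the network or in thread synchronization.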
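  • Application Tuning
    • Identify the optimizable code sections on the critical execution path
        • The critical execution path is the one that consumes the most execution time
    • Improving performance in the following common problem areas shortens critical-path execution time and therefore overall execution time
        • Common threadable regions
        • Thread data exchange and synchronization, choice of data structures, loop routines, and the results of system API calls
        • Application scaling problems caused by MPI message passing on distributed-memory cluster systems
    Intel leads the multithreading of industry applications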
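  • Microarchitecture Tuning
    • Instruction-level optimization for a specific processor architecture
        • Data alignment, data structures, branches, cache usage, etc.
    • Microarchitecture optimization can deliver very significant performance gains
        • Apply it after the other optimizations are complete
        • It requires experienced low-level architecture optimization engineers and deep, sustained support
    Advantage: Intel development and tuning tools speed up optimization; microarchitecture tuning feeds back into improving the system architecture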
  • Performance-oriented programming model using Intel toolsets
    [Workflow diagram. Stages shown: Source Code, Compiling, Link to the Executables, Running / Perf. Analyzing, Correct?, Optimizing. Tools shown: Intel Compiler, Intel MKL/IPP, Cluster MKL, Intel MPI, ClusterOpenMP extension, Code Coverage tool, Test prioritize tool, Debug Tools, IDB/TotalView, Thread Checker, Message Checker, Intel Vtune, Thread Profiler, Trace Collector, Traces files, Trace Analyzer]
    Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.
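  • Data Center Optimization Case Study
    • Symptoms
    • As the number of processor cores increases, the maximum number of requests handled does not increase with it
    • The processors sit idle much of the time: CPU utilization is 70% on a dual-socket dual-core Xeon platform, 45% on a dual-socket quad-core platform, and 25% on a four-socket quad-core platform
    And no source code access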
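The three utilization figures can be turned into a rough count of busy cores, which makes the scaling problem easier to see. This is a small arithmetic sketch, assuming the quoted utilization is an average across all cores of each platform; the helper name `busy_cores` is made up for the example.

```python
# (platform, sockets, cores per socket, CPU utilization in percent) from the slide
platforms = [
    ("2-socket dual-core", 2, 2, 70),
    ("2-socket quad-core", 2, 4, 45),
    ("4-socket quad-core", 4, 4, 25),
]


def busy_cores(sockets: int, cores_per_socket: int, util_percent: float) -> float:
    """Average number of cores doing work, assuming utilization is a system-wide mean."""
    return sockets * cores_per_socket * util_percent / 100


for name, sockets, cores, util in platforms:
    total = sockets * cores
    print(f"{name}: {total} cores, about {busy_cores(sockets, cores, util):.1f} busy")
```

Under that assumption the busy-core equivalent rises only from about 2.8 to 3.6 to 4.0 while the core count goes from 4 to 8 to 16, which is consistent with the workload being held back by something other than available CPU.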
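  • Data Center Optimization Case Study (continued)
    • System-level performance analysis
      • Common tools used: iostat, vmstat, nmon, strace, Cluster Ready
      • Network I/O and disk I/O (with ample memory) are not yet the system bottleneck
      • Thread synchronization may be the key to the problem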
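Of the tools listed, `vmstat` reports context switches (`cs`), idle time (`id`), and I/O wait (`wa`) in one table, which is a quick way to see whether idle CPUs coincide with heavy switching rather than with I/O wait. The sketch below parses captured `vmstat` text by column name; the sample output is invented for illustration and the function name is hypothetical.

```python
# Invented sample of `vmstat 1` output, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340  10240 204800    0    0     1     3 1200 45000 20 25 55  0  0
 3  0      0 811900  10240 204812    0    0     0     5 1180 47000 18 27 55  0  0
"""


def parse_vmstat(text: str) -> list:
    """Return one dict per sample row, keyed by vmstat column name."""
    rows = [line.split() for line in text.strip().splitlines()]
    # The column-name row is the one that contains both "cs" and "id".
    header_idx = next(
        (i for i, fields in enumerate(rows) if "cs" in fields and "id" in fields),
        None,
    )
    if header_idx is None:
        raise ValueError("no vmstat column header found")
    header = rows[header_idx]
    return [
        dict(zip(header, (int(v) for v in row)))
        for row in rows[header_idx + 1:]
        if len(row) == len(header)
    ]


for sample in parse_vmstat(SAMPLE):
    print(f"cs={sample['cs']} id={sample['id']}% wa={sample['wa']}%")
```

Looking columns up by name keeps the parser usable across `vmstat` versions that print a different number of CPU columns.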
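  • Data Center Optimization Case Study (continued)
    • Application-level performance analysis
      • Tools: Intel Vtune/Thread Profiler
      • The program is already heavily threaded
      • With many cores (>= 8), performance scales poorly with the number of cores. Thread Profiler results show a large amount of thread synchronization overhead in the program
      • Adopting NPTL delivered the performance breakthrough: on RHEL4, static linking defaults to LinuxThreads, so NPTL must be specified explicitly
    Comparison, NPTL vs. LinuxThreads:
      • Maximum connections handled: up 76%
      • Latency: down 57%
      • CPU%: from 47% to 86%
      • CS: down 20%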
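On glibc systems the threading implementation in use can be queried at run time, which is the same information `getconf GNU_LIBPTHREAD_VERSION` prints. The sketch below uses Python's `os.confstr` for that. It is an illustration only: it reports the library the Python interpreter itself runs against, not what a separately built, statically linked server binary contains, and the function name is hypothetical.

```python
import os


def pthread_implementation():
    """Return the glibc threading library string (for example one starting
    with 'NPTL'), or None when the platform does not report it."""
    try:
        value = os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (ValueError, OSError, AttributeError):
        # ValueError: name unknown on this platform; AttributeError: no confstr at all.
        return None
    return value or None


impl = pthread_implementation()
if impl is None:
    print("threading library not reported on this platform")
elif impl.upper().startswith("NPTL"):
    print(f"NPTL in use: {impl}")
else:
    print(f"non-NPTL threading library: {impl}")
```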
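  • Data Center Optimization Case Study (continued)
    • Microarchitecture-level performance analysis
      • Tool: EMON
      • Processor execution time accounts for only about 1/4
      • Pipeline stalls caused by insufficient processor resources are relatively high
      • There is considerable room for performance optimization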
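  • Data Center Optimization Case Study (continued)
    • Microarchitecture-level performance analysis
      • Tool: EMON
      • Processor execution time accounts for only about 1/4
      • Pipeline stalls caused by insufficient processor resources are relatively high
      • With SF:on and PF:off, performance improved
    Measurements by configuration (sf:off/pf:on | sf:on/pf:on | sf:on/pf:off):
      • CPI: 2.76 | 2.60 | 2.42
      • Bus Util: 86% | 60% | 57%
      • DBus Util: 31 | 34 | 33
      • L2 Miss Demand: 0.40% | 0.42% | 0.97%
      • L2 Miss: 1.07% | 1.09% | 0.97%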
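The CPI figures can be converted into an approximate speedup. This is a sketch under two assumptions that the slide does not state: the three CPI values pair with the configurations in the order listed (sf:off/pf:on, sf:on/pf:on, sf:on/pf:off), and the instruction count and clock frequency are the same in all three runs, so execution time is proportional to CPI.

```python
# CPI per configuration, read from the slide in the order the columns are listed.
cpi = {
    "sf:off/pf:on": 2.76,
    "sf:on/pf:on": 2.60,
    "sf:on/pf:off": 2.42,
}


def speedup(baseline_cpi: float, new_cpi: float) -> float:
    """Speedup over the baseline, assuming equal instruction count and clock rate."""
    return baseline_cpi / new_cpi


baseline = cpi["sf:off/pf:on"]
for config, value in cpi.items():
    print(f"{config}: CPI {value:.2f}, speedup x{speedup(baseline, value):.3f}")
```

Under these assumptions, moving from sf:off/pf:on to sf:on/pf:off corresponds to 2.76 / 2.42, or roughly a 14% speedup.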
  • Summary
    • New Intel SSG initiative focused on Internet Portal Data Center (Baidu, Yahoo, Google, Amazon, Windows Live, etc.)
    • Looking to establish long term strategic partnership with China’s IPDC operators
    • Improve performance, efficiency & TCO on today’s platforms for IPDC’s apps
    • Deliver a platform suited to China’s data center for future deployments
  • backups