云计算核心技术架构分论坛 一石三鸟 性能 功耗及成本
Upcoming SlideShare
Loading in...5
×
 

云计算核心技术架构分论坛 一石三鸟 性能 功耗及成本

on

  • 1,136 views

云计算核心技术架构分论坛 一石三鸟:性能、功耗及成本

云计算核心技术架构分论坛 一石三鸟:性能、功耗及成本

Statistics

Views

Total Views
1,136
Views on SlideShare
1,061
Embed Views
75

Actions

Likes
1
Downloads
18
Comments
0

5 Embeds 75

http://cloud.csdn.net 51
http://www.csdn.net 13
http://info.53dns.com 6
http://www.53dns.com 4
http://articles.csdn.net 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

云计算核心技术架构分论坛 一石三鸟 性能 功耗及成本 云计算核心技术架构分论坛 一石三鸟 性能 功耗及成本 Presentation Transcript

  • One Stone, Three Birds:Performance, Power and Space Ihab Bishara Director, Cloud Computing Products May 21st 2010
  • What is cloud?Web WebSurfing SurfingClients Thousands and millions of servers for Clients your computing needs
  • Cloud is needs bigger, more cost efficient,and more power efficient datacenters Power 200KW/R-Cu/Ft 150KW/R-Cu/Ft 50KW/R-Cu/Ft Budget 10KW/R-Cu/Ft Datacenter Profit center Cost center Operation Enterprise scale Compute Cloud scale serving thousands Requirement serving 100s of millions <1998 – 2002 2003 – 2007 2008 2015� Compute requirements increasing to serve millions of users� Focus on maximizing the bottom line by reducing datacenter cost� Power allowances decreasing due to size and delivery complexities
  • Power and cooling are top cost issues� Power and cooling cost is growing faster than new server B investment� Only 3% of the power is used for computing� IT organization will spend almost $1 on P&C for every $1they spend on new servers� Developing a power efficient datacenter is more important than ever
  • Current solutions are not solving clouddatacenters issues“To build servers for companies like Facebook, and Amazon, and otherpeople who are operating fairly homogeneous applications, the servershave to be cheap, and they have to be super power-efficient. The latestgenerations of server processors from Intel and AMD dont deliver theperformance” Jonathan Heiliger, Facebooks VP of technical operations
  • “Problems cannot be solved by the same level ofthinking that created them.” Albert Einstein them.”• Current few cores technologies fail to deliver for cloud � Power too high � Performance not increasing fast enough � Integration too low � Cores are inefficient and continue to bloat• Manycore is the way to a new horizon � Higher performance at much lower power � SoC integration to reduce cost and real estate � Standard programming models and improving
  • Manycore provides performance, lowpower and low cost Current Technology Tilera Manycore workers C C S S P P Helpers and Management � More cores � more BOPS Madison Itanium2 � Lower frequency � low power� Less than 4% of area to ALUs Chandrakasan effect� High frequency, high power � Simple, integrated � Lower cost
  • Tilera: the only technology delivering on the promise of multicore cores 100 - Performance & Scalability 64 cores - Power, Price, Footprint - Same architecture 36 cores up and down 16 Outstanding Scalability coresScalability Up to 8 cores Up to The Other Solutions: 32 starved Quad core cores • Discontinuity in architectures Dual • Limited scalability core • Power inefficiency # of Cores
  • Proven best performance/Watt 2X Quad XEON x86 Server 1X TILEPro Server 300W under load 40W under load 30W target (optimized server)Measured by Tier 11Server OEM running MemcacheD on its own Tilera and x86-based servers Measured by Tier Server OEM running MemcacheD on its own Tilera and x86-based servers One TILEPro64 performance = Dual Quad XEON Much Lower Power 7X Compute/Watt advantage
  • An order of magnitude better processing ina standard 5Kwatt rack High efficiency x86 Tilera-based high density Server 2U production server � 12 2U servers per rack � 20 2U servers per rack � 48 processors � 160 processors � 196 cores � 10,240 cores � 1,944 BOPS � 20,000+ BOPS � 96 Gbps I/O � 3,500 Gbps I/O � 5K watt � 5K watts New Tilera server ideal for cloud throughput applications Best performance, I/O, power, and density Complete utilization of power and space of a standard rack
  • Slashing Total Cost of Ownership for agiven performance Tilera x86 Same Performance 60 dual socket servers 100 dual socket servers 4K watts 20K watts� Up to 40% CAPEX savings� Up to 80% OPEX savings� Slashing the TCO by up to 50%
  • Standard SW stack on Tier 1 servers Front end Markets Web Database Data-Mining Lighttpd Infrastructure Apps Memcached Language Perl Support Commercial Compiler, OS Linux gcc & g++ Distribution Hypervisor ~40 Low Power Watts Tilera-based Prototype Tier 1 Servers Server (now) ODM/OEM Production Server starting Q3 10
  • Focused on internet datacenters runningLAMP stack Web Apache/ Data- Surfing Mem- mining Clients PHP cached Web Servers Apps Servers Load Network Balancers Switches Database Servers
  • Scalable performance with completecores and on-chip iMesh network TILE ArchitectureComplete cores with integrated cache 2 Dimensional on-chip mesh network Processor Register file Three Execution Pipelines Cache 32K L1-I I-TLB 2D 32K L1-D D-TLB DMA 256K L2 Terabit Switch Tile = Core + Switch iMesh = 200 Tbps * on-chip bandwidth Tbps* *Based on TILE-Gx100
  • Tilera single core performance comparableto Atom & ARM Cortex-A9 cores Single-Core Single thread CoreMark™ Comparison 3,500 3,000 CoreMark Score 2,500 2,000 1,500 1,000 500 - Tilera Tilera ARM Intel TILEPro64 TILE-Gx36 Cortex-A9 Atom N270 866 MHz 1.25 GHz 1 GHz 1600 MHz- Data for TILEPro, ARM Cortex-A9, Atom N270 is available on the CoreMark website http://coremark.org/home.php- Telex and single thread Atom results were measured in Tilera labs- Single core, single thread result for ARM is calculated based on chip scores
  • Tilera offers standards-based tools andsoftware stack Multicore Development Environment Standards-based tools Standard application stackStandard programming Application layer� SMP Linux 2.6.26 � Open source apps Applications Applications� Java � Standard C/C++ libs libraries� ANSI C/C++ Operating System layer Operating SystemIntegrated tools � 64-way SMP Linux kernel drivers� SGI or GCC compiler � Zero Overhead Linux� Standard gdb gprof � Bare metal environment Hypervisor� Eclipse IDE Virtualization and high speed I/O drivers Hypervisor layerInnovative tools � Virtualizes hardware Tile Processor� Multicore debug � I/O device drivers Tile Tile Tile Tile …� Multicore profile
  • Example implementation on aManycore-based server� Complete functionality DDR2 Controller 0 DDR2 Controller 0 DDR2 Controller 1 DDR2 Controller 1 – Networking – Load balancer SerDes SerDes SerDes SerDes PCIe 0 PCIe 0 XAUI 0 XAUI 0 – Application Cloud applications Flexible Flexible� Manycore values I/O I/O UART UART GbE 0 GbE 0 JTAG JTAG – Granularity: Complete SPI, I2C SPI, I2C GbE 1 GbE 1 server on one chip – Integration: I/O and TCP termination SerDes SerDes SerDes PCIe 1 SerDes PCIe 1 XAUI 1 XAUI 1 memory controllers – Low power: 20watts mgmntLoad balance NIC – High performance DDR2 Controller 3 DDR2 Controller 3 DDR2 Controller 2 DDR2 Controller 2 – highly efficient communication on-chip One SMP OS
  • A complete LAMP server at 40watts� Complete LAMP stack running on server Client browser – Linux, Apache, PHP, MySQL Data and management� Serving standard cloud applications over network – SugarCRM for enterprise – Gallery2 for photo sharing ~40 Watts� Using standard Server management protocols – Standard SNMP – MRTG� At 40 watts – 40 watts power draw – Target 30 watts for optimized platform LAMP Server
  • This is just the beginning… beginning…Tilera’s portfolio demonstrates the scale of many-coreTilera’ Next generation Performance Next generation Performance Performance � >4X the performance of Pro64 � >4X the performance of Pro64 � Twice the power density of Pro64 � Twice the power density of Pro64 Gx 100 High Tier Gx 64 TILEPro64 TILEPro64 Mid Tier Gx TILE64 TILE64 36 Low Tier Gx 16 TILE-Gx Mid-Low Tier TILE-Gx Mid-Low Tier � Small footprint � Small footprint TILEPro36 � Low power � Low power TILEPro36 � Rich I/O � Rich I/O 2007 2008 2009 2010
  • TILE-Gx100: High end system-on-a-chipUnder 55 watts! 4x GbE Memory Controller (DDR3) Memory Controller (DDR3) SGMII � 1.25GHz – 1.5GHz SerDes SerDes Memory Controller (DDR3) Memory Controller (DDR3) SerDes SerDes MiCA MiCA 10 GbE 10 GbE XAUI XAUI � 32 MBytes total cache 4x GbE UART x2, UART x2, USB x2, SGMII � 200 Tbps iMesh BW USB x2, JTAG, 10 GbE 10 GbE Interlaken JTAG, Interlaken 2C, II2C,SPI SPI XAUI XAUI 4x GbE SGMII � >½ Tb/sec memory BW SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes PCIe 2.0 10 GbE 10 GbE � Addresses 1TByte of RAMSerDes PCIe 2.0 XAUISerDes XAUI 8-lane 8-lane 4x GbE � 80-120 Gbps packet I/O SGMII 10 GbE 10 GbE � 80 Gbps PCIe I/O XAUI mPIPE XAUI mPIPE PCIe 2.0 4x GbESerDes PCIe 2.0SerDes 8-lane 8-lane SGMII 10 GbE � Wire-speed packet engine 10 GbE XAUI XAUI – 120Mpps 4x GbE PCIe 2.0 SGMIISerDes PCIe 2.0SerDes 4-lane 10 GbE 4-lane 10 GbE MiCA™ engines: MiCA™ Interlaken � Interlaken XAUI XAUI Flexible Flexible 4x GbE – 40 Gbps crypto SGMII I/O I/O 10 GbE 10 GbE – 20 Gbps compress & XAUI XAUI 20 Gbps decompress 4x GbE MiCA MiCA SGMII SerDes SerDes Memory Controller (DDR3) Memory Controller (DDR3) Memory Controller (DDR3) Memory Controller (DDR3) 10 GbE 10 GbE XAUI XAUI
  • TILE-Gx36™: Midrange system-on-a-chipUnder 22 watts! � 1.0, 1.25, 1.5 GHz speeds MiCA MiCA Memory Controller (DDR3) Memory Controller (DDR3) 4x GbE � 12 MBytes total cache SGMII 66 Tbps iMesh BW SerDes SerDes � UARTx2, 10 GbE 10 GbE UARTx2, XAUI USBx2, USBx2, XAUI JTAG, JTAG, II C, SPI 22 C, SPI 4x GbE � 200 Gbps memory BW SGMII 40 Gbps total packet I/O SerDes � SerDes SerDes SerDes x4 10 GbE x4 48 Gbps PCIe I/O PCIe 2.0 8-lane 10 GbE PCIe 2.0 8-lane XAUI XAUI � mPIPE mPIPE 4x GbE � Wire-speed packet engine SerDes SerDes x4 x4 SGMII SerDes SerDes 10 GbE 10 GbE – 60Mpps XAUI XAUI PCIe 2.0 SerDes PCIe 2.0 SerDes 4-lane 4-lane 4x GbE SGMII � MiCA crypto engine SerDes SerDes – 20 Gbps crypto Flexible Flexible I/O 10 GbE 10 GbE I/O Memory Controller (DDR3) Memory Controller (DDR3) XAUI XAUI – 10 Gbps compress & 10 Gbps decompress
  • Summary� Cloud faces critical performance, power and cost issues� Tilera TILEPro processors deliver a proven solution to all three problems. Slashing TCO� Tilera roadmap continues to deliver the promise of many core for performance, power and cost
  • Thank you