Enterprise supercomputers ike nassi_v5


talk given at UC Santa Cruz, 2012

Published in: Technology, Business

  1. Enterprise Supercomputers
     Ike Nassi
     (Work done while at SAP. Joint work with Rainer Brendle, Keith Klemba, and David Reed.)
     UCSC, April 2012
  2. A Personal Reflection
     Let me start with a story.
     The setting: early 1982.
     • Digital Equipment Corporation (DEC) had introduced the VAX (a super-minicomputer) three years earlier
     • Sales were going well
     • The VAX was a popular product
     And then one day…
     January 17, 2012
  3. The Moral of the Story
     We see that…
     • Unmet Needs: powerful personal workstations
     • A Set of New Technologies: bitmapped displays, 10 MB Ethernet, powerful µprocessors, open source UNIX
     • Unmet needs + a set of new technologies: Inflection Point
     • A Large Market ($$$$): the market for workstations
  4. What are the big questions and issues (IMHO)
     • Data: Scale up vs. Scale out vs. Hybrid
     • Shared kernel… (or not)? (Shared Memory and/or Message Passing?)
     • Just what is the role for flash? Memory, SSD, other?
     • Can we innovate without disrupting? (HW moves faster than software)
     • How far can we push scalable coherent shared memory?
     • System Design: simplify complex problems and connect the dots
  5. Is “the Cloud” an inflection point?
     • Emergence of Cloud indicates that we have been missing something
     • Current systems represent the work of the last two decades
     [Diagram: a GAP? between CURRENT SYSTEMS and Emerging Needs. Labels: MOBILITY, CLOUD, BIG DATA, NEW UI, ELASTIC, SOCIAL, NEW BUSINESS, Drivers]
  6. Cloud computing has emerged as a viable mode for the enterprise
     • Ease of access
     • Elastic
     • Mobility and sensors
     • More pervasive data storage
     • Computing as a utility, devices as appliances
     • Expandable storage
     • New business models are emerging (SaaS, PaaS, IaaS)
  7. Today I’ll talk a bit about
     • Current commodity server design
     • How this creates a computing gap for the enterprise*
     • How businesses of tomorrow will function
     • And how enterprise supercomputers will support them
     • A different layer of cloud
     *A Closer Look
     [Diagram: “The Perception” shows Cloud Computing sitting directly on Current Systems; “My Reality” shows A Gap between Cloud Computing and Current Systems]
  8. The short story…
     – Current View: Virtual Machine 1, Virtual Machine 2, Virtual Machine 3, …, Virtual Machine N on one Physical Machine
     – Emerging View: Physical Machine 1, Physical Machine 2, Physical Machine 3, …, Physical Machine M under one Virtual Machine
  9. Enterprise HW Computing Landscape
     And what has emerged:
     – The Chip
     – The Board
     – Blade Server
     – The Cloud
  10. And this creates a perceived technology gap
      Problems with today’s servers:
      • Cores and memory scale in lock step
      • Access latencies to large memories from modern cores
      • Addressing those memories
      Current and emerging business needs:
      • Low latency decision support and analytics
      • “What if?” simulation
      • Market/Sentiment analysis
      • Pricing and inventory analysis
      In other words, we need “real” real time
      This creates a technology conflict: cores and memory need to scale more independently rather than in lock step. How can we resolve this conflict?
  11. Flattening the layers
      • We need to innovate without disrupting customers. (Like your multicore laptops.)
      • Seamless platform transitions are difficult, but doable (Digital, Apple, Intel, Microsoft)
      • Large memory technology is poised to take off, but it needs appropriate hardware
      • Early work in High Performance Computing (HPC)* has yielded results that can now be applied to Enterprise Software.
      • But today it’s different:
        – We have a lot of cores per chip now and better interconnects
        – We can “flatten” the layers and simplify
      • We can build systems that improve our software now without making any major modifications
      • Nail Soup argument: But, if we are willing to modify our software, we can win bigger. But we do this on our own schedule. (“Stone Soup” on Wikipedia)
      *Scientific/Technical Computing
  12. The Chip
      How do we leverage advances in:
      • Processor Architecture
      • Multiprocessing (Cores and threads)
      • NUMA (e.g. QPI, HyperTransport)
      • Connecting to the board (sockets)
      • Core to memory ratio
      We have 64 bits of address, but can we reach those addresses?
      [Image: The Chip]
  13. Typical Boards
      Highlights:
      • Sockets (2, 4, 8)
      • Speed versus Access
      How do we make these choices for the enterprise?
      [Image: An 8-socket board layout]
  14. Storage
      Considerations for storage
      • Cost per Terabyte (for the data explosion)
      • Speed of access (for real time access)
      • Storage network (for global business)
      • Disk, Flash, DRAM
      • How can applications be optimized for storage?
      [Image: Example: A 2TB OCZ Card]
  15. Memory Considerations
      Highlights
      • Today, cores and memory scale in lock step
      • Large memory computing (to handle modern business needs)
      • Collapse the multi-tiered architecture levels
      • NUMA support is built in to Linux
      • Physical limits (board design, connections, etc.)
      [Image: Memory Riser]
  16. The “Magic” Cable
  17. Reincarnating the “mainframe”
      • In-memory computing provides the real time performance we’re looking for
      • In-memory computing requires more memory per core in a symmetric way
      • I remember what Dave Cutler said about the VAX:
        • “There is nothing that helps virtual memory like real memory”
        • Ike’s corollary: “There is nothing that helps an in-memory database like real memory”
      • The speed of business should drive the need for more flexible server architectures
      • Current server architectures are limiting, and we must change the architecture and do so inexpensively
      • And how do we do this?
  18. Taking In-Memory Computing Seriously
      Basic assumptions and observations
      • Hard disks are for archives only. All active data must be in DRAM memory.
      • Good caching and data locality are essential.
        – Otherwise CPUs are stalled due to too many cache misses
      • There can be many levels of caches
      Problems and opportunities for in-memory computing
      • Addressable DRAM per box is limited due to processor physical address pins.
        – But we need to scale memory independently from physical boxes
      • Scaling Architecture
        – Arbitrary scaling of the amount of data stored in DRAM
        – Arbitrary and independent scaling of the number of active users and associated computing load
      • DRAM access times have been the limiting factor for remote communications
        – Adjust the architecture to DRAM latencies (~100 ns?)
        – InterProcess Communication is slow and hard to program (latencies are in the area of ~0.5-1 ms)
      We can do better.
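To put the two latency figures on this slide side by side, the short Python sketch below works out how many DRAM accesses fit into one inter-process round trip. The constants are the slide's own rough numbers (the slide itself marks the DRAM figure with a question mark); the function and variable names are illustrative only.

```python
# Rough figures quoted on the slide, converted to nanoseconds.
DRAM_LATENCY_NS = 100                          # "~100 ns?"
IPC_LATENCY_NS_RANGE = (500_000, 1_000_000)    # "~0.5-1 ms"


def slowdown(remote_ns, local_ns=DRAM_LATENCY_NS):
    """How many local DRAM accesses fit into one remote operation."""
    return remote_ns / local_ns


ratios = [slowdown(ns) for ns in IPC_LATENCY_NS_RANGE]
print(ratios)  # [5000.0, 10000.0]
```

On these numbers a single IPC exchange costs as much as several thousand DRAM accesses, which is the gap the rest of the talk sets out to close.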
  19. Putting it together – The “Big Iron” Initiative
  20. What is “Big Iron” Technology Today?
      Big Iron-1: Extreme Performance, Low Cost
      • Traditional Cluster
      • 10 x 4U Nodes (Intel Xeon X7560 2.26 GHz)
      • 320 cores (640 Hyper-threads), 32 cores per node
      • 5 TB memory (10 TB max), 20 TB solid state disk
      • 1 NAS (72 TB, expandable to 180)
      • System architecture: SAP
      Big Iron-2: Extreme Performance, Scalability, and much simpler system model
      • Enterprise Supercomputer
      • Developers can treat as a single large physical server
      • Large shared coherent memory across nodes
      • 5 x 4U Nodes (Intel Xeon X7560 2.26 GHz)
      • 160 cores (320 Hyper-threads), 32 cores per node
      • 5 TB memory total, 30 TB solid state disk
      • 160 GB InfiniBand interconnect per node
      • Scalable coherent shared memory (via ScaleMP)
      • System architecture: SAP
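Since the talk keeps returning to the core-to-memory ratio, the sketch below derives memory per core for the two configurations from the slide's figures. It assumes the 10-node, 320-core column is Big Iron-1 and the 5-node, 160-core column is Big Iron-2 (consistent with the node counts on the next slide), and it takes 1 TB as 1024 GB; the dictionary layout and function name are illustrative.

```python
# Assumption: 1 TB = 1024 GB. Node, core, and memory figures are from the slide.
GB_PER_TB = 1024

systems = {
    "Big Iron-1": {"nodes": 10, "cores_per_node": 32, "memory_tb": 5},
    "Big Iron-2": {"nodes": 5, "cores_per_node": 32, "memory_tb": 5},
}


def memory_per_core_gb(spec):
    """Installed memory divided by total core count, in GB per core."""
    cores = spec["nodes"] * spec["cores_per_node"]
    return spec["memory_tb"] * GB_PER_TB / cores


per_core = {name: memory_per_core_gb(spec) for name, spec in systems.items()}
print(per_core)  # {'Big Iron-1': 16.0, 'Big Iron-2': 32.0}
```

The per-node core count is the same in both systems, so the difference in memory per core comes entirely from spreading the same 5 TB over half as many nodes.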
  21. The Next Generation Servers a.k.a. “Big Iron” are a reality
      • Big Iron-1: fully operational September 30, 2010, 2:30pm. All 10 Big Iron-1 nodes up and running.
      • Big Iron-2: fully operational February 4, 2011. All 5 Big Iron-2 nodes up and running.
  22. At a Modern Data Center
      [Photo captions]
      • These bad boys are huge backup generators
      • Special cooling has won awards
      • Pod way down there for Big Iron
  23. Progress, Sapphire and others:
      Sapphire Type 1 Machine (Senior)
      • 32 Nodes (5 racks, 2.5 tons)
      • 1024 Cores
      • 16 TB of Memory
      • 64 TB of Solid State Storage
      • 108 TB of Network Attached Storage
      Sapphire Type 1 Machine (Junior)
      • 1 node (1 box, on stage)
      • 80 cores
      • 1 TB of Memory
      • 2 TB of SSD
      • No Attached Storage
  24. Coherent Shared Memory – A Much Simpler Alternative to Remote Communication
      Uses high-speed, low latency networks (optical fiber/copper with 40 Gb/s or above)
      • Typical latencies of this are in the area of 1-5 μsec
      • Aggregated throughput matches CPU consumption
      • L4 cache needed to balance the longer latency on non-local access (cache-coherent non-uniform memory access over different physical machines)
      Separate the data transport and cache layers into a separate tier below the operating system
      • (almost) never seen by the application or the operating system!
      Applications and database code can just reference data
      • The data is just “there”, i.e. it’s a load/store architecture, not network datagrams
      • Application level caches are possibly not necessary – the system does this for you (?)
      • Streaming of query results is simplified, L4 cache schedules the read operations for you.
      • Communication is much lighter weight. Data is accessed directly and thread calls are simple and fast (higher quality by less code)
      • Application designers do not confront communications protocol design issues
      • Parallelization of analytics and combining simulation with data are far simpler, enabling powerful new business capabilities of mixed analytics and decision support at scale
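The slide's point that an L4 cache is needed to balance the longer latency of non-local access can be made concrete with a simple weighted-average model. This is only a sketch: it assumes a cache hit costs about the same as the ~100 ns local DRAM access quoted earlier in the talk, that a miss costs the 1-5 μsec quoted here, and the hit rates are hypothetical values chosen for illustration.

```python
LOCAL_DRAM_NS = 100               # "~100 ns?" from the in-memory computing slide
REMOTE_NS_RANGE = (1_000, 5_000)  # "1-5 μsec" from this slide


def average_access_ns(hit_rate, remote_ns, local_ns=LOCAL_DRAM_NS):
    """Weighted average of local hits and remote misses (simplified model)."""
    return hit_rate * local_ns + (1.0 - hit_rate) * remote_ns


# Hypothetical hit rates; the talk does not report measured values.
for hit_rate in (0.90, 0.99, 0.999):
    low, high = (average_access_ns(hit_rate, r) for r in REMOTE_NS_RANGE)
    print(f"hit rate {hit_rate:.3f}: {low:.0f} to {high:.0f} ns")
```

Under this model the average only approaches local DRAM latency when nearly all accesses hit locally, which is consistent with the earlier slide's remark that good caching and data locality are essential.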
  25. Software implications of Coherent Shared Memory
      On Application Servers
      • Access to database can be “by reference”
      • No caches on application server side. Application can refer to database query results including metadata, master data etc. within the database process.
      • Caches are handled by “hardware” and are guaranteed to be coherent.
      • Lean and fast application server for in-memory computing
      On Database Code
      • A physically distributed database can approach DRAM-like latencies
      • Database programmers can focus on database problems
      • Data replication and sharding are handled by touching the data and L4 cache does the automatic distribution
      In fact, do we need to separate application servers from database servers at all?
      • No lock-in to fixed machine configurations or clock rates
      • No need to make app-level design tradeoffs between communications and memory access
      • One O/S instance, one security model, easier to manage
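The contrast between fetching results as messages and accessing them “by reference” can be sketched inside a single Python process. This is an analogy only, not the system described in the talk: `pickle` stands in for a wire protocol, a plain dictionary stands in for the database, and all names are hypothetical.

```python
import pickle

# Stand-in for data held by the database (hypothetical contents).
database = {"orders": list(range(1000))}


def fetch_by_message(db):
    """Message-passing style: serialize, ship, deserialize. The caller gets a private copy."""
    wire = pickle.dumps(db)
    return pickle.loads(wire), len(wire)


def fetch_by_reference(db):
    """Shared-memory style: the caller simply holds a reference to the same data."""
    return db


copied, wire_bytes = fetch_by_message(database)
shared = fetch_by_reference(database)

database["orders"].append(1000)  # the "database" changes after the fetch

print(len(copied["orders"]), len(shared["orders"]))  # 1000 1001
```

The copy goes stale as soon as the source changes, which is why an application-side cache needs its own invalidation logic; the reference always sees current data, which is the property the slide attributes to coherent caches handled below the application.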
  26. But what about…
      Distributed Transactions?
      • We don’t need no stinkin’ distributed transactions!
      What about traditional relational databases?
      • In the future, databases become data structures! (Do I really believe this? Well, not really. Just wanted to make the point.)
      Why not build a mainframe?
      • From a SW perspective, it is a mainframe (with superior price/performance for in-memory computing)
      Is it Virtualization?
      • In traditional virtualization, you take multiple virtual machines and multiplex them onto the same physical hardware. We’re taking physical hardware instances and running them on a single virtual instance.
  27. A simplified programming model and landscape
      – Current View: Virtual Machine 1, Virtual Machine 2, Virtual Machine 3, …, Virtual Machine N on one Physical Machine
      – Emerging View: Physical Machine 1, Physical Machine 2, Physical Machine 3, …, Physical Machine M under one Virtual Machine (Single OS instance)
  28. The business benefits of coherent shared memory
      Improved in-memory computing performance at dramatically lower cost
      • The ability to build high performance “mainframe-like” computing systems with commodity cluster server components
      • Ability to scale memory capacity in a more in-memory computing-friendly fashion
      • Simplified software system landscape using system architecture that can be made invisible to application software
      Minimize changes to applications
      • Enables applications to scale seamlessly without changes to the application code or additional programming effort
      • Traditional cluster based parallel processing requires specialized programming skills held only by top developers; this will constrain an organization’s ability to scale development for in-memory computing
      • With coherent shared memory, the bulk of developers can develop as they do today and let the underlying hardware and lower level software handle some of the resource allocation, unlike today. But, existing, standard NUMA capabilities in Linux remain available.
  29. Big Iron Research
      • Next generation server architectures for in-memory computing
        – Processor Architectures
        – Interconnect technology
        – Systems software for coherent memory
        – OS considerations
        – Storage
      • Fast data load utilities
      • Reliability and Failover
      • Performance engineering for enterprise computing loads
      • Price/performance analysis
      • Private and public cloud approaches
      • And more
  30. Some experiments that we have run
      • Industry standard system benchmarks
      • Evaluation of shared memory program scalability
      • Scalability of “Big Data” analytics – just beginning
  31. Industry Standard Benchmarks
      STREAM: large memory vector processing
      • BigIron would be 12th highest recorded test
      • Nearly linear in system size
      • Measures scalability of memory access
      SPECjbb: highly concurrent Java transactions
      • BigIron would be 13th highest recorded test
      • More computational than STREAM
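For readers unfamiliar with STREAM, the sketch below shows the shape of its Triad kernel, a[i] = b[i] + q*c[i], and the usual byte accounting of three arrays of traffic per pass. It is written in pure Python for readability, so the rate it prints reflects interpreter overhead rather than memory bandwidth; the real benchmark is compiled C or Fortran, and the array size and function name here are illustrative choices, not the configuration used in the talk.

```python
import time
from array import array


def stream_triad(n=100_000, q=3.0):
    """Run one Triad pass over n doubles; return the result array and a nominal MB/s."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    a = array("d", (b[i] + q * c[i] for i in range(n)))
    elapsed = time.perf_counter() - start
    # Triad accounting: two arrays read and one written, 8 bytes per double.
    bytes_moved = 3 * n * a.itemsize
    return a, bytes_moved / elapsed / 1e6


result, mb_per_s = stream_triad()
print(f"triad over {len(result)} elements: {mb_per_s:.1f} MB/s (interpreter-bound)")
```

Because each element is touched once with almost no arithmetic, a compiled version of this loop is limited by how fast memory can feed the cores, which is why the slide describes STREAM as measuring the scalability of memory access.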
  32. To sum up
      • Cloud computing has emerged
      • The businesses of tomorrow will function in real time. Really!
      • There is a growing computing technology gap
      • There is an unprecedented wave of advances in HW
      • The “mainframe” has been reincarnated with its attendant benefits, but can be made from commodity parts
      – Enterprise supercomputers leveraging coherent shared memory principles and the wave of SW+HW advances will enable this new reality
  33. Questions?
      Contact information: Ike Nassi
      Email: