Optimized Systems: Matching technologies for business success.

Tom Rosamilia, General Manager, Power and z Systems, IBM Corporation, outlines how businesses can optimize their systems to enhance performance, reduce cost per workload and drive innovation. Presented at the Smarter Computing Executive Summit, 25th May 2011.

  • <Refer back to Rhodin and that you will cover Optimized Systems.>
  • <This is a Smarter Planet example and the background to this presentation. It’s to prime the conversation and show that new workloads are coming up because of instrumentation, intelligence and interconnectedness. This example is not specific to Power and z.>
    This example features the Garden State Parkway and the New Jersey Turnpike:
    - Garden State Parkway: Raritan Toll Plaza to Exit 145/I-280 Southbound, comprising 30 links on the Parkway
    - New Jersey Turnpike: I-95 Northbound and Southbound, comprising 65 links on the Turnpike
    Smart Monitoring: Multimodal Data Streams
    - GPS
    - Counts, speeds, travel times
    - Public Transport
    - Pollution measurements
    - Weather Conditions
    Smart Analytics:
    - Real Time Traffic Monitoring
    - Real Time Traffic Information (Multimodal)
    - Travel Planner
    STOCKHOLM
    By the end of the trial, traffic was down nearly 25 percent. Public transport schedules had to be redesigned because of the increase in speed from reduced congestion. Even inner-city retailers saw a six percent boost in business. But the benefits go beyond fewer cars: during the spring of 2006, 40,000 more travelers used Stockholm transport on an ordinary weekday than the year before, an increase of six percent. The reduction in traffic during the Stockholm Trial has led to a drop in emissions from road traffic of eight to 14 percent in the inner city. Greenhouse gases such as carbon dioxide have fallen by 40 percent in the inner city and by two to three percent in Stockholm County.
    ABOUT THE SYSTEMS in STOCKHOLM
    The solution is based on SAP software running under IBM AIX on the IBM System p platform. Two p690 servers are partitioned into 20 logical partitions to handle the financial and CRM applications from SAP, as well as SAP NetWeaver Business Intelligence (SAP NetWeaver BI) and various IBM DB2 databases. Further processing power is provided by six IBM BladeCenter systems containing more than 60 IBM HS20 blade servers, powered by Intel Xeon EM64T processors, as well as 10 Intel-based IBM System x servers. These Intel platforms run WebSphere Application Server under Linux, supporting a Web portal, and Citrix under Windows, which provides desktop applications for the call centre. As Web and call centre traffic increases, processing capacity can be added simply by increasing the number of hot-pluggable blade servers in each BladeCenter chassis.
    ADDITIONAL BACKGROUND
    The Swedish National Road Administration and the Stockholm City Council selected IBM to develop a traffic-charging system, combined with park-and-ride services and improved transit options. Tolls vary in price by time and day to influence traffic patterns and congestion levels. Eighteen roadside control points located at Stockholm city entrances and exits were set up to identify and charge vehicles, with higher fees during peak times and lower fees during off-peak hours. Transponder tags installed in vehicles communicated with receivers at the control points and triggered automatic payment. Cars passing through these control points were photographed, and the license plate numbers were used to identify vehicles without tags and to provide evidence to support enforcement against non-payers. The information was sent to a computer system that matched the vehicle with its registration data, and a fee was charged to the owner.
    STOCKHOLM SOURCE: http://www.ibm.com/ibm100/us/en/icons/transportationflow/
  • <reference back to Rhodin’s deck> To generate new insights from the exploding volume, velocity and variety of data, IT departments can leverage systems that are specifically architected for that task. These workloads can place tremendous demands on the systems used to process the transactions and complete the computing tasks. To maximize performance and efficiency, systems can be optimized at every layer of the technology stack to exploit unique processor, memory and storage characteristics. An application for managing resource planning in a factory requires different computing capabilities than one used to analyze the usage patterns and turnover rate of mobile phone customers.
    The workloads generated by a smarter planet require optimized systems that leverage:
    - domain knowledge to understand specific workload characteristics
    - hardware that leverages multi-core architectures and advanced threading
    - software that is tuned from the operating system through the middleware stack
    In an optimized system, every layer is designed to yield improved overall performance (bottom up):
    - Semiconductor Technology: ability for memory scaling and caching
    - Microprocessor Design: intelligent thread control based on workload needs
    - Systems Design: enterprise Reliability, Availability & Serviceability
    - Virtualization & Operating Systems: scalable, granular, low-overhead virtualization
    - Compilers & Java Virtual Machine: compiler and JVM optimized to exploit hardware architecture
    - Optimized Middleware: middleware optimized to exploit hardware architecture
  • Here are some of the key things you need to consider as you think about leveraging Optimized Systems.
    Architecture: Create a strong architectural foundation to address business needs
    - Are you optimizing the deployment of different applications on your infrastructure?
    - Is your middleware exploiting your hardware to give you optimal performance?
    - Is your infrastructure appropriately supporting your security, reliability and availability needs?
    - Do you have opportunities to leverage your data to develop business insights? Does this create additional workloads for your infrastructure?
    - Do you have opportunities to consolidate and virtualize your infrastructure, leading to cloud-based delivery models?
    Economics: Leveraging Optimized Systems helps you optimize your total cost of ownership
    - Where are your current issues relating to IT economics?
    - What components of IT costs are you most concerned about: energy costs, facility costs, IT labor costs, HW/SW licensing costs, networking costs?
    Performance: Leveraging Optimized Systems helps you improve your IT performance
    - Is your IT performance aligned with your business needs?
    - Do you manage all your IT resources (servers, storage, networks) using an integrated approach?
  • IBM offers different approaches to optimize workloads, from multi-workload to single-workload systems. There are 3 types of Optimized Systems:
    1. Optimized by Design, e.g. WebSphere on Power 7. (The point is that the client builds them using optimized components from IBM, e.g. DB2 with POWER. These systems can be, and are, multi-purpose too.)
    2. Flexible Integrated, e.g. IBM Smart Analytics System
    3. Tightly integrated appliances, e.g. Netezza TwinFin
    Now let us discuss examples of optimized systems in more detail, focusing in particular on key differentiators and how they provide economic value.
    An example of a client-built system with optimized components is WebSphere on Power 7. Power is a versatile multi-workload system (including analysis) capable of running a wide variety of workloads. The point is that you can pick your level of sophistication, power, flexibility, and function depending upon your intended purpose and, as expected, a greater investment brings a greater return. The WebSphere portfolio has been tuned for optimization on Power 7. All products that run on the IBM JVM, or WAS, benefit from our explicit performance tuning of Java on P7 (e.g. WebSphere eXtreme Scale, WebSphere ESB, WebSphere Business Monitor, WebSphere Business Events, etc.). Some examples of Java 6 improvements on P7:
    • Large pages
    • Multi-core scalability
    • Hardware exploitation: floating decimal point, cache prefetching, customized instruction scheduling
    The net result is improved performance for products leveraging the IBM JDK on Power7. For example, WebSphere JRules on Power7 outperforms JRules on comparable Intel processors by 30 - 60% thanks to optimizations in data and XML processing. Another example is WebSphere Application Server: WAS 7.0.0.9 running Java 6 SR7 is optimized for Power 7, delivering >50% improvement in performance.
    An example of an integrated, optimized system is IBM Smart Analytics System 7600. Key features include:
    • Pre-packaged, with optimal IO subsystem, large IO bandwidth and modular storage scaling for analytics
    • Leverages DB2 and Cognos capabilities, tuned for Power Systems, System z and System x platforms
    • Delivers superior economics by enabling you to start with the right size and scale, deploying in days rather than months, and utilizing data compression technology to reduce storage costs 30% compared to Oracle
    An example of an appliance is IBM Netezza TwinFin. Netezza TwinFin appliances integrate database, server and storage, delivering superior price/performance, and are tailored for data warehousing and analysis at a cost-per-function ratio unmatched in the industry. Key attributes of Netezza include:
    • Designed for rapid analysis of data scaling to petabytes
    • Delivers 10x-100x performance improvements at a third of the cost of options available from traditional database vendors
    • Platform integrates business intelligence and advanced analytics
    • Fast time to value – meaningful POC in 2 weeks
    Netezza offers performance out of the box, without requiring any tuning, indexing, aggregations, etc. TwinFin’s integration with other products such as Cognos and Tivoli means data assets are under management for their entire lifecycle.
  • IBM’s Optimized Systems can help drive down cost per workload and improve time to value. Leading enterprises are evolving from deploying systems as fast as possible to matching the systems architecture to the specific workload requirements.
    System z is delivering unique hybrid technology with the versatility of multiple architectures combined with the management and control of a single system. The System z value proposition plays into the patterns of Smarter Computing by providing clients with the freedom to innovate, the freedom to save and the freedom to protect. zEnterprise brings the best of both worlds:
    - Lowest cost of acquisition per workload and lowest cost of operation per workload
    - Integrates through a centrally managed system to address the inefficiencies of today’s heterogeneous infrastructures
    - Provides the freedom to choose the best platform for deploying each workload
    - Embraces multiple technology platforms (mainframe, UNIX and x86) to maximize efficiency and transform IT economics
    The Power Systems brand identity – Performance Redefined – integrates and connects to the three themes of Smarter Computing, telling our story through client projects.
    Integrate: Clients want to consolidate and integrate large-scale workloads by deploying UNIX database apps on POWER7 with PowerVM and IBM Migration Factory services. They will see potential savings of 70% lower costs with DB2 on Power versus Oracle/Sun.
    Automate: Clients want to accelerate deployment with automated service management by implementing self-service provisioning with PowerVM and IBM Systems Director with services from IBM Business Partners. They want to reduce deployment times by up to 67%, based on using PowerVM and Tivoli Service Automation Manager to automatically deploy standardized services to IT customers via a self-service interface, without administrator intervention.
    Secure: Clients want to move x86 applications to Power to reduce business vulnerability, by exploiting PowerHA resilience, virtualizing with PowerVM and deploying with assistance from IBM Lab Services. They want to benefit from zero reported security vulnerabilities for PowerVM, according to the US National Vulnerability Database, compared to 119 reported on VMware.
    System x - Industry-leading Intel Xeon x86 performance delivering lower cost per processing capability. Utilize increased processing cores, higher local memory capacity and faster, reliable storage to increase application utilization for a lower cost per workload. Enterprise X-Architecture allows System x and BladeCenter systems to boost performance and minimize licensing costs by de-coupling memory from the processor. This allows businesses to add memory as needed without necessarily adding servers. IBM eX5 enterprise systems deliver extraordinary value and investment protection for enterprise-level virtualization, database, and transaction processing.
    MAX5 – Unprecedented memory expansion with the external MAX5 memory chassis, decoupling server memory from system processors to allow you to optimize your server performance
    eXFlash – Get dramatically faster I/O, with greater storage density and improved reliability
    FlexNode – Gain the ability to re-deploy your server on a project-by-project basis for superior asset utilization and workload management
    Scalability – Updating IBM’s innovative ‘pay-as-you-grow’ capability by enabling all eX5 servers to scale
    Integrate – The ability to integrate into an x86 infrastructure while providing enhanced performance, enabling consolidation & virtualization, helps customers dramatically improve the ROI on their IT investment. Support for the ecosystem defined by Windows, Linux, VMware, and Intel, as well as unique IBM designs like MAX5 and eXFlash, sets IBM System x designs ahead of the competition.
    Automate – System x is designed from the ground up with proactive management capabilities which directly address the high cost of normal x86 server ownership/management. This enables System x servers to be automated, freeing up valuable time/resources within the customer environment.
    Secure – System x offers leadership capabilities in supporting dense virtualization, helping eliminate server sprawl and reducing vulnerable access points to the customer’s IT infrastructure; supporting HA features which provide a reliable foundation for virtualization and critical workloads; and supporting partitioning and clustering, which can deliver higher availability for those workloads which are business critical.
    The Storage Systems brand integrates and connects to the three themes of Smarter Computing, telling our story through client projects. Companies can have a more efficient storage infrastructure by storing less data, moving data to the right type of storage, and better utilizing existing storage assets.
    Integrate: Clients want to virtualize their environment so they don’t have to manage heterogeneous islands of storage. Today their disk utilization is less than 50% and their administrators are overwhelmed. They need to increase their disk utilization to 70% or more and free up storage administrators from basic management tasks for more strategic, revenue-producing projects.
    Automate: Clients want to automate processes to focus on more strategic activities. Today they are spending too much time moving data from place to place to address performance issues and manage cost. Easy Tier can increase performance by 3 times with only 2% of data on solid state storage. SONAS and Information Archive allow policy-based automatic movement of data to lower-cost tiers of storage without administrator involvement.
    Secure: Data is the lifeblood of business. Data must be protected against any loss, whether from disaster, data corruption, or even theft. ProtecTIER virtual tape offerings with deduplication allow businesses to back up data quickly and restore it in minutes. Disk and tape encryption keep data secure if it falls into the wrong hands. Encryption keys have to be managed efficiently, which is done by Tivoli Key Lifecycle Manager.
  • Anatomy of an Optimized System: POWER7 The components of the system are illustrated as components of an upward-facing arrow, indicating that they build on each other to deliver the performance to customer applications. The relationship between these components is not, of course, uni-directional. Deep collaboration happens across the stack to provide insight into optimization opportunities and to drive design from workload characteristics. In general, the optimizations presented were introduced either specifically for Power 7 or were provided in software products available at the time of system launch. Some optimizations were unique to Power 7 while others were more platform neutral but provided an especially large benefit for Power 7. Optimizations for parallelism and memory affinity generally fall into the latter category. Semiconductor technology notes: The Power 7 chip is based on IBM 45nm SOI technology. Compared to 65nm technology that was the basis for Power 6, 45nm SOI provides approximately twice the transistor density and 28% greater switching speed. This technology also introduces embedded DRAM which provides for 4x greater density for large arrays (e.g. cache) than is possible within a similar energy envelope using SRAM technology.
  • Power 7 microprocessor notes: Compared to Power 6, Power 7 makes very different trade-offs in design, leveraging the 45nm SOI technology to deliver boosts in performance in all dimensions, including single thread performance. While the clock frequency is lower in Power 7 compared to Power 6, innovations in the core design and the availability of much faster caches allow code to execute faster. This is particularly true for floating point operations, where additional resources within the core and redesigned pipelines provide a big boost for codes which are floating point heavy, such as HPC or analytics. Where Power 7 really shines, however, is on parallel code. Compared to Power 6, Power 7 has 4x the number of cores per chip and 2x the number of supportable SMT threads per core. Such parallelism must of course be harnessed by software, but in real cases (e.g. with the Bloomberg ticker application), 7x performance can be achieved relative to Power 6.
  • Power 7 system notes: The Power 7 processor is socket compatible with Power 6, allowing an upgrade path and preservation of investment in Power 6 system structures. Due to the increased parallelism at a chip level, however, Power 7 systems can scale all the way up to 256 cores and 1,024 threads, which can either be allocated to a single LPAR for very demanding scale-up workloads or be used to achieve massive application consolidation. Memory capacity scales to 8TB on the largest systems and is connected to the Power 7 processors through a high bandwidth interface. Reliability of these systems is exceptionally high, due to a combination of hardware and software engineering, resulting in almost 5 nines of availability. Compared to the competition, this results in less than half the expected downtime of Solaris or HP-UX systems and less than one fifth the expected downtime of x86 systems. High availability and efficiency combine to allow these systems to be run reliably at utilization rates as high as 90%.
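As a rough illustration of what the availability figure in the system notes implies, the Python sketch below converts an availability percentage into expected downtime per year. The 99.999% input is an assumption standing in for "almost 5 nines", the function name is hypothetical, and the competitor multipliers are simply the "less than half" and "less than one fifth" claims from the notes turned into lower bounds; none of this is measured data.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected unavailable minutes per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


# Assumption: "almost 5 nines" is approximated here as exactly 99.999%.
power7_downtime = downtime_minutes_per_year(99.999)

# The notes claim less than half the downtime of Solaris or HP-UX systems and
# less than one fifth that of x86 systems, so those platforms would sit
# above these implied lower bounds.
unix_lower_bound = 2 * power7_downtime
x86_lower_bound = 5 * power7_downtime

# Scale figures quoted in the notes: 256 cores and 1,024 threads.
threads_per_core = 1024 // 256

print(f"Power 7 (assumed 99.999%): {power7_downtime:.2f} min/year")
print(f"Solaris / HP-UX: more than {unix_lower_bound:.2f} min/year")
print(f"x86: more than {x86_lower_bound:.2f} min/year")
print(f"Hardware threads per core at full scale: {threads_per_core}")
```

Because the source says "almost" five nines, the real figure would be somewhat higher than the value this sketch prints.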
  • Virtualization & operating systems notes: The Power hypervisor (aka PowerVM) is the foundation of system software support, drawing on extensive virtualization support in the hardware. Virtualized execution is intrinsic to the platform and incurs a minimal (<2%) runtime performance penalty on most code. PowerVM scales to the largest 256 core, 8TB RAM system. An LPAR can be allocated the entire 256 core system or as little as 1/10 of a core. Over 1,000 LPARs can be supported in the largest Power 7 systems. Memory is actively shared across guest operating systems, providing greater effective real memory by taking advantage of differential workload across the guests. Memory can also be selectively compressed on AIX guests to take advantage of natural locality of reference in many workloads. In such cases, the effective memory can be expanded by 2x or more, trading off CPU utilization but with little negative observable performance impact. AIX provides dynamic management of the simultaneous multi-threading on Power 7, boosting single thread performance by up to 15% at low to moderate system utilization by configuring cores in their fastest non-shared mode. AIX also manages memory affinity carefully to best exploit local memory references while minimizing references to slower, remote memory. Such affinity optimization results in 10% improvement in OLTP workloads. The system software layers also actively manage the power consumed by the system by adjusting the frequency across the entire range of 40% to 110% of nominal, according to varying workload.
  • Compilers and Java Virtual Machine: Since the creation of the first Power processor in the early 1990s, the compiler technology has been co-developed and co-optimized. With every new chip, the compilers are extensively customized to get the very best performance possible on a range of workloads. Power 7 is no exception, with single thread performance boosts observed to be up to 50% in some cases. On OLTP and ERP workloads, the compilers deliver between 10 and 15% overall performance boosts. On HPC and analytics codes, the compiler effect can be much more dramatic, boosting performance by up to 50% in real, complex applications. Like the compilers, the Java virtual machine, which is 100% IBM developed, is heavily customized to each new chip and system. In the case of Power 7, the “just in time” compiler delivered between 10-15% on a broad set of workloads and more spectacular results on selected workloads such as the 9-10x boost in performance for RSA cryptography operations. In addition, the Java virtual machine itself has adapted to the Power 7 system, delivering large improvements which optimize for memory affinity and a remarkable 15-25% on Java code which runs in 64-bit mode.
  • Middleware: IBM and IBM partners produce a broad range of middleware platforms. Two of the most important for IBM are the WebSphere Application Server and the DB2 database. Much of the rest of both IBM’s portfolio of middleware and our partners’ middleware is based at least in part on one or both of these and relies to a significant degree on them to deliver parallel performance and efficiency. For Power 7, WebSphere was aggressively optimized on many fronts, building on the big boosts in performance delivered by the Java virtual machine. Database connection pools have 4x more parallelism, caching code paths have 6x more parallelism and thread pools are 3x more effective at dispatching work. A distinct project known as “Mason” focused on the engineering of the full software stack to deliver massive scaling improvements, boosting peak single instance performance by 85%, utilizing 32 threads or an entire Power 7 chip. Beyond a single chip, WebSphere scales out by using multiple application server instances. In this scenario, Mason delivered up to 50% improvement on Power 7 systems. DB2 was also heavily optimized for Power 7, as it is for every chip and system generation. It scales up to the largest LPARs and now also scales out to take advantage of new clustering support available in AIX. DB2 also takes advantage of the protection keys built into the hardware and exposed by AIX to accelerate user code hosted in the database (UDFs) without sacrificing the integrity of the database in the event that user code fails.
    Benchmark details (all compare Power 570 P6 systems with Power 780 P7 systems):

                            ---------- Power 6 ----------   ---------- Power 7 ----------
    Benchmark               Result      Cores   Sockets     Result      Cores   Sockets
    SPECint_rate2006        478         16      8           652         16      2
    SPECfp_rate2006         426         16      8           586         16      2
    SPECjbb2005 (jops)      867,989     16      8           1,331,641   16      2
    SAP SD 2-tier (Users)   14,432      32      16          37,000      64      8
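To make the benchmark comparison easier to read, the following Python sketch derives per-core and per-socket gains from the figures quoted in the benchmark details. The numbers are copied from the notes; the tuple layout and the `gains` helper are illustrative choices, not part of the original presentation.

```python
# (benchmark, P6 result, P6 cores, P6 sockets, P7 result, P7 cores, P7 sockets)
BENCHMARKS = [
    ("SPECint_rate2006", 478, 16, 8, 652, 16, 2),
    ("SPECfp_rate2006", 426, 16, 8, 586, 16, 2),
    ("SPECjbb2005 (jops)", 867_989, 16, 8, 1_331_641, 16, 2),
    ("SAP SD 2-tier (users)", 14_432, 32, 16, 37_000, 64, 8),
]


def gains(row):
    """Return (name, per-core gain, per-socket gain) of Power 7 over Power 6."""
    name, r6, c6, s6, r7, c7, s7 = row
    per_core = (r7 / c7) / (r6 / c6)
    per_socket = (r7 / s7) / (r6 / s6)
    return name, per_core, per_socket


RESULTS = [gains(row) for row in BENCHMARKS]

for name, per_core, per_socket in RESULTS:
    print(f"{name:24s} per core: {per_core:.2f}x   per socket: {per_socket:.2f}x")
```

Normalizing per socket as well as per core shows the effect of the higher core count per chip, which a per-core comparison alone would hide.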
  • What is the result of this technology optimization? Here are some proof points that demonstrate how IBM Software integrated with and optimized on Power Systems delivers greater value.
    (3.9X more SAP users per core) Quite often our customers don’t give credence to industry benchmarks like TPC-C or TPC-H because those workloads don’t look like theirs. Let’s look at SAP 2-tier benchmarks, since many of our customers are running SAP. In this benchmark, a 64-core Power 780 running DB2 can support 37000 users. A 32-core SPARC T5440 running Oracle DB only supports 4720 users. On a per-core basis, Power running DB2 supports 578 users compared to 148 users on SPARC T2+. That’s almost 4 times more users on Power.
    (4X more throughput at 1/4th the cost per report of Exadata) Test conducted using the IBM Smart Analytics 7700 XS (extra small) configuration vs. an Exadata Quarter-rack running the CPO BI-Day benchmark. This benchmark is a Business Intelligence workload generated by Cognos and consisting of multiple different report types (complex: hours to run; intermediate: minutes to run; simple: measured in seconds). Tests were run using multiple users simultaneously submitting report requests, balanced according to ratios specified in a Forrester Report on typical BI workload composition. Both configurations used 16 cores on their database tiers. Exadata applied an extra 24 cores of processing power via the Exadata Storage Nodes, which include their costly Storage Software. For systems with comparable software, comparable database cores, comparable storage (# drives, amount SSD) and nearly identical cost, the IBM system was able to process 4x as many reports as Exadata. This results in a net cost per result that is ¼ the cost of running reports on Exadata.
    • IBM Smart Analytics provides better throughput for all categories of reports in aggregate
    • Better throughput and competitive cost provide better cost per report
    • 4x less expensive per report
    ISAS 7700 10TB Concurrent Timed Test Run 108. Competitor 10TB, TEST S/N: 022311_2156, TEST Description: 10TB Concurrent Throughput, undo_retention=345600
    (54% lower cost per workload) High-end Power Servers are a great platform for virtualization and consolidation. In an internal SWG study, we found that 99 8-core SPARC T3-1B blades running 50-50 WAS and DB2 workloads with a random load pattern can be consolidated into a single 256-core Power 795. Calculating 3-year TCA, which includes hardware and software cost and 3 years of 24x7 maintenance, we found that cost per workload is $146K for Power 795 vs $316K for SPARC T3-1B. That’s a 54% saving.
    (2.5X more performance) This proof point shows how WebSphere Application Server is tuned for Power. In an internal SWG benchmark test, we compared an online transaction processing (OLTP) workload for a banking application on a 12-core Power 730 vs a 12-core Oracle Sun Fire x4170 (based on the Intel Xeon 5500 processor) running WebLogic application server. Both systems were tuned for best performance. The Power server achieved 10415 transactions per second (tps) compared to 4035 tps. That’s 2.5 times better performance on Power.
    (1/4th the cost) We did the same comparison against a 16-core SPARC T3-1B blade running WebLogic application server. T3 delivered just 1901 transactions per second (TPS). When we priced it for 3-year TCA, Power came out at $33 per TPS while T3 came out at $134 per TPS. That’s a difference of 1:4.
    (12.8X improvement) When comparing it against other Enterprise Service Bus solutions, there’s no contest: an 8-core Power 750 running WebSphere Message Broker demonstrates 12.8 times more throughput at 76% lower cost per message than BizTalk running on a 32-core Dell 910 server!
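The headline ratios in these proof points can be re-derived from the raw figures quoted in the notes. The Python sketch below does that arithmetic; the variable names are illustrative, and the cost-per-workload line assumes both figures are in thousands of dollars, an interpretation that reproduces the stated 54% saving.

```python
# SAP SD 2-tier: users supported per core (figures from the notes).
power_users_per_core = 37_000 / 64   # 64-core Power 780 running DB2
sparc_users_per_core = 4_720 / 32    # 32-core SPARC T5440 running Oracle DB
sap_ratio = power_users_per_core / sparc_users_per_core

# OLTP banking workload: transactions per second on 12-core systems.
tps_ratio = 10_415 / 4_035

# 3-year TCA per TPS: Power vs 16-core SPARC T3-1B blade.
cost_per_tps_ratio = 134 / 33

# Cost per workload. Assumption: both values are in thousands of dollars.
workload_saving = 1 - 146 / 316

print(f"SAP users per core: {power_users_per_core:.0f} vs {sparc_users_per_core:.0f} "
      f"({sap_ratio:.1f}x)")
print(f"OLTP throughput ratio: {tps_ratio:.2f}x")
print(f"Cost per TPS ratio (T3 / Power): {cost_per_tps_ratio:.2f}x")
print(f"Cost per workload saving: {workload_saving:.0%}")
```

The computed throughput ratio comes out slightly above the 2.5 quoted in the notes, which suggests the slide figure was rounded down.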
  • <POINT TO MAKE IS PER WORKLOAD – NOT per core – change the conversation!> Now let’s look at how that integration and performance optimization translates into superior economics with Power Systems. Consider 3 deployment choices for a heavy UNIX online transaction banking workload, comparing 3 different server and operating environments. You can see that the workload optimization between POWER7 and WebSphere Network Deployment Edition, plus the virtualization benefits of PowerVM, delivers a strong cost advantage for running these workloads on Power 795. In this scenario, we tested a very heavy banking workload (1075 transactions per sec) that fits in a 16-core Oracle SPARC T3-1B blade. The CPU utilization pattern of this workload is very random and peak utilization is 2.5 times the mean. To scale up on SPARC blades to run 99 such workloads, we would need 99 T3-1B blades. We also looked at running the same workload on Oracle’s 64-core SPARC T3-4 servers running OracleVM for SPARC as the virtualization hypervisor. We expected to get some consolidation benefit, but the response time of the OracleVM hypervisor is so poor that it could not take advantage of combining these randomly varying workloads. Essentially, there was no consolidation benefit. As a result, we ended up requiring the same number of cores as the T3-1B blades (rounded up to the next 64 cores), or 25 SPARC T3-4 servers. Based on internal SWG studies, however, this workload can be consolidated efficiently into just one 256-core Power 795 running AIX and PowerVM. That’s a consolidation ratio of 6.2 : 1. We priced the Power 795 and IBM software, including WebSphere Network Deployment Edition, with 3 years of support to get the Total Cost of Acquisition (TCA). We divided that number by 99 to arrive at cost per workload. The cost for Power 795 was 54% less than Oracle’s T3-4 servers.
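The sizing arithmetic in this scenario can be sketched in a few lines of Python. The core counts and workload count come from the notes; the constant names and the `cost_per_workload` helper are hypothetical, and reading the 6.2 : 1 consolidation ratio as SPARC cores divided by Power 795 cores is an assumption that happens to match the quoted figure.

```python
import math

WORKLOADS = 99
CORES_PER_T3_1B = 16    # one workload fits in one 16-core T3-1B blade
CORES_PER_T3_4 = 64
CORES_POWER_795 = 256

# With no consolidation benefit, the T3-4 option needs the same core count,
# rounded up to whole 64-core servers.
sparc_cores = WORKLOADS * CORES_PER_T3_1B
t3_4_servers = math.ceil(sparc_cores / CORES_PER_T3_4)

# Assumed reading of the consolidation ratio: SPARC cores per Power 795 core.
consolidation_ratio = sparc_cores / CORES_POWER_795


def cost_per_workload(total_cost_of_acquisition: float,
                      workloads: int = WORKLOADS) -> float:
    """Divide a 3-year TCA figure evenly across the consolidated workloads."""
    return total_cost_of_acquisition / workloads


print(f"SPARC cores required: {sparc_cores}")
print(f"T3-4 servers required: {t3_4_servers}")
print(f"Consolidation ratio: {consolidation_ratio:.1f} : 1")
```

The notes do not quote the TCA totals themselves, so `cost_per_workload` is left uncalled here; it would be applied to whatever priced configuration is being compared.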
  • Now let me share another optimized system: the mainframe. The IBM mainframe is the ultimate example of a consolidated system. Why fragment computing onto thousands of footprints? It’s the ultimate in consolidation. That’s because of our hybrid computing capabilities, including Linux and Windows on System x later this year. zEnterprise is a true fit-for-purpose system. To take full advantage of emerging business opportunities, clients choose System z servers and software over x86 systems so they can:
    - Improve service delivery and enable innovation by creating an enterprise cloud through deep integration of IBM zEnterprise and IBM middleware
    - Deliver superior economics to the business by consolidating workloads and collapsing infrastructures with IBM zEnterprise and Linux on z
    - Create actionable insight within the transaction by integrating real-time data modeling and operational data via IBM zEnterprise and IBM middleware
  • This example shows exactly what a hybrid environment can do architecturally for clients. It simplifies, it integrates. And it reduces costs – in this case, up to 86 percent less cost. RELATED TO DEUTSCHE BANK POC 1) Competitive (Oracle RAC) infrastructure doesn't meet most clients' mission-critical business requirements 2) However, our competitors will say they are simpler/less complex and cheaper. The reality is that they are not simpler. Many vendors offerings need to be tied together to come close to an apples to apples comparison, and then it's apparent that they aren't cheaper. For example, one bank saw a 13 percent TCO savings even with ORACLE SW costs factored out ( to take into account ULA). < WITH the oracle costs in there it was over 50% savings. ( over 5 years) > [Additional information] Clients often deploy applications for better business outcomes ... does the infrastructure selected meet all of the clients business requirements on scalability, performance, availability , disaster recovery, operability and transactional integrity etc...? Most clients lay out these requirements for any mission-critical application, such as the core banking one tested in the POC. The answer in many cases, is that Oracle RAC cannot meet all of the requirements -- and, in this case, Oracle failed 3, and likely 4 out of the 6 requirements. (SIDENOTE: Disaster recovery could not be tested apples to apples.) The Oracle Rac environment LOST transactions ( can you imagine that in a banking environment?). It couldn't complete the nightly bank postings ( account balancing, calculation of interest and rates, etc) in the time alloted, and further, it couldn't demonstrate continuous availability and meet the availability times. And this is just a POC of one workload. Testing a mixed workload environment, which is real world, is much more complicated and resource intensive ... 
(imagine what that would do to an x86-Oracle environment). The client response in this POC was, initially, "since Oracle and x86 are simpler and cheaper, we can mitigate all those problems and make it work, right?" Wrong. We shared with them the true complexity of their chosen solution, and then shared a very simplified example of an SAP deployment (2 vendors in the IBM environment vs 9 vendors in the x86 environment, including more services and labor). Then they said, "OK, so maybe it's not simpler, but it's certainly cheaper, right?" Wrong. Using our hybrid innovation and best fit for optimization, we actually have lower TCO WITH or WITHOUT the Oracle software costs factored in. [Tom: this is the DB TCO work.] And this is just looking at one workload. Background on the benchmark: SAP provides a standardized SAP banking benchmark scenario, called "TRBK - transaction banking": 1) day workload: mostly mass postings; 2) night workload: account balancing, calculation of interest and rates.
  • <Quick look at entry points to optimizing workloads – to set up client charts for following 3 charts>
  • Slide theme: BCBSM case study. Important to mention: they invested in a POC, which limited the risk, and they made the decision to test it. Business Challenge: The Microsoft® Windows® and Intel® processor-based server landscape was inflexible and costly to operate and maintain. Five-year TCO Study: “Even without factoring in the maintenance and support costs, which would be considerable for a large estate of physical servers, we found that running a virtualized Linux environment on System z would be somewhere between 30 and 50 percent less expensive than a distributed architecture,” says Ted Mansk. “Suddenly, the choice of infrastructure had become an easy decision.” Solution: IBM helped BCBSM consolidate 140 servers to a single IBM System z with six Integrated Facility for Linux (IFL) engines. Key applications now run in Linux virtual servers, while IBM DB2® databases run on z/OS® on the same physical machine. Benefits: significant TCO reduction over five years; virtualization cuts server provisioning times by 99 percent; disaster recovery can be achieved 97 percent faster than before. Details available at: http://www.ibm.com/software/success/cssdb.nsf/cs/ARBN-7Z6KV9?OpenDocument&Site=corp&cty=en_us
  • UMass: Education, Midmarket, NA IOT - Massachusetts, PS701 Express blades running RHEL, Source: CRDB (Dec 2010). The University of Massachusetts (UMass) Dartmouth Physics Department strives to maintain excellence in three major areas: research and publication, teaching at all curricular levels, and public outreach. The Physics Department had been using Intel Xeon processor-based servers. However, the department found that it was taking too long to run some of its critical calculations. The UMass Dartmouth Physics Department deployed an IBM BladeCenter S chassis holding two IBM BladeCenter PS701 Express servers running the Red Hat Enterprise Linux operating system. The BladeCenter PS701 Express servers have eight cores each at 3.0GHz, for a total of 16 cores. The department then tested two separate applications on the BladeCenter PS701 Express servers and the Intel Xeon processor-based servers. The first application ran five times faster and the second application ran eight times faster. The UMass Dartmouth Physics Department can do more in less time. In fact, calculations that used to take an entire month now take less than one week to run.
  • Accessing and analysing information held in its archives was taking so long that the NYSE struggled to get any value from it. Loading data into the system was also problematic: it was taking as long as 12 hours to load the data. The organisation decided it had to have a new data warehouse, eventually opting for a solution from Netezza. Within weeks of signing a deal the system was up and running. According to Netezza's director of product marketing, Phil Francisco, the Performance Server can deliver 10 to 100 times the performance of traditional data warehouse systems thanks to an architecture that combines a relational database, server and storage with patented streaming technology. This technology puts processing power right at the source, next to the data, so it can be processed 'on stream.' Clarke said, "I expected it to work well, but I was surprised at how easy it was to integrate the system and get it to work with our data model. ... If we had had to load the same amount of data into our old system from a well known software giant, it would have taken many, many months." The new system is so 'data ready' that the company is finding it possible to use data in more useful ways. Data can now be used to perform a variety of functions that were highly problematic with the old system. These include tracking the value of a listed company, performing trends analysis and searching for evidence of fraudulent activity. With the old system, any sort of query would take upwards of six hours to fulfil, but with the Netezza Performance Server, the average time it takes to complete a request is 20 seconds. The new data warehouse has also had a huge impact on the NYSE's ability to comply with the numerous financial regulations. The data compression technique used in the firm's technology enables firms to manage their data much more efficiently, and it can decompress data as fast as it can compress it.
http://www.netezza.com/media/2008/itweek_051208.pdf
  • Now, I’d like to introduce you to one of our long-time clients, who is already on the smarter computing journey: David Wade, Chief Information Officer and Executive Vice President, Primerica Financial Services. BACKGROUND: z is the BRAINS of their business warehouse. Primerica has been running DB2 since 1984 and keeps doing more with it; they stretch it to the limits. They get fast response with DB2 running on the mainframe and have optimized DB2 for their workloads. They run WebSphere on z and Power, Tivoli Storage Manager on Power, OMEGAMON on z, and Cognos on System x. Scalable and secured. Power interfaces with the Web: agents send documents via mobile device to upload the insurance information. That feeds into the Power system through WebSphere and hits the mainframe through DB2; ECM takes an image of the document and files it, and the agent gets paid. They use Power and z to manage all their processes from front end to back end. Cognos moved to z196 to increase agents’ productivity.
  • <Summary/wrap-up>
  • Key Attributes of an Optimized System: We will explain this using Power 7 as a concrete example. The components of the system are illustrated as components of an upward-facing arrow, indicating that they build on each other to deliver the performance to customer applications. The relationship between these components is not, of course, uni-directional. Deep collaboration happens across the stack to provide insight into optimization opportunities and to drive design from workload characteristics. In general, the optimizations presented were introduced either specifically for Power 7 or were provided in software products available at the time of system launch. Some optimizations were unique to Power 7 while others were more platform neutral but provided an especially large benefit for Power 7. Optimizations for parallelism and memory affinity generally fall into the latter category. Semiconductor technology notes: The Power 7 chip is based on IBM 45nm SOI technology. Compared to 65nm technology that was the basis for Power 6, 45nm SOI provides approximately twice the transistor density and 28% greater switching speed. This technology also introduces embedded DRAM which provides for 4x greater density for large arrays (e.g. cache) than is possible within a similar energy envelope using SRAM technology. Power 7 microprocessor notes: Compared to Power 6, Power 7 makes very different trade-offs in design, leveraging the 45nm SOI technology to deliver boosts in performance in all dimensions including single thread performance. While the clock frequency is lower in Power 7 compared to Power 6, innovations in the core design and the availability of much faster caches allows code to execute faster. This is particularly true for floating point operations where additional resources within the core and redesigned pipelines provide a big boost for codes which are floating point heavy such as HPC or analytics. Power 7 really shines on parallel code however. 
Compared to Power 6, Power 7 has 4x the number of cores per chip and 2x the number of supportable SMT threads per core. Such parallelism must be harnessed by software, of course, but in real cases (e.g. with the Bloomberg ticker application), 7x performance can be achieved relative to Power 6. Power 7 system notes: The Power 7 processor is socket compatible with Power 6, allowing an upgrade path and preservation of investment in Power 6 system structures. Due to the increased parallelism at a chip level, however, Power 7 systems can scale all the way up to 256 cores and 1,024 threads, which can either be allocated to a single LPAR for very demanding scale-up workloads or be used to achieve massive application consolidation. Memory capacity scales to 8TB on the largest systems and is connected to the Power 7 processors through a high bandwidth interface. Reliability of these systems is exceptionally high, due to a combination of hardware and software engineering, resulting in almost 5 nines of availability. Compared to the competition, this results in less than half the expected downtime of Solaris or HP-UX systems and less than one fifth the expected downtime of x86 systems. High availability and efficiency combine to allow these systems to be run reliably at utilization rates as high as 90%. Virtualization & operating systems notes: The Power hypervisor (aka PowerVM) is the foundation of system software support, drawing on extensive virtualization support in the hardware. Virtualized execution is intrinsic to the platform and incurs a minimal (<2%) runtime performance penalty on most code. PowerVM scales to the largest 256 core, 8TB RAM system. An LPAR can be allocated the entire 256 core system or as little as 1/10 of a core. Over 1,000 LPARs can be supported in the largest Power 7 systems. Memory is actively shared across guest operating systems, providing greater effective real memory by taking advantage of differential workload across the guests. 
Memory can also be selectively compressed on AIX guests to take advantage of natural locality of reference in many workloads. In such cases, the effective memory can be expanded by 2x or more, trading off CPU utilization but with little negative observable performance impact. AIX provides dynamic management of the simultaneous multi-threading on Power 7, boosting single thread performance by up to 15% at low to moderate system utilization by configuring cores in their fastest non-shared mode. AIX also manages memory affinity carefully to best exploit local memory references while minimizing references to slower, remote memory. Such affinity optimization results in 10% improvement in OLTP workloads. The system software layers also actively manage the power consumed by the system by adjusting the frequency across the entire range of 40% to 110% of nominal, according to varying workload. Compilers and Java Virtual Machine: Since the creation of the first Power processor in the early 1990s, the compiler technology has been co-developed and co-optimized. With every new chip, the compilers are extensively customized to get the very best performance possible on a range of workloads. Power 7 is no exception, with single thread performance boosts observed to be up to 50% in some cases. On OLTP and ERP workloads, the compilers deliver between 10 and 15% overall performance boosts. On HPC and analytics codes, the compiler effect can be much more dramatic, boosting performance by up to 50% in real, complex applications. Like the compilers, the Java virtual machine, which is 100% IBM developed, is heavily customized to each new chip and system. In the case of Power 7, the “just in time” compiler delivered between 10-15% on a broad set of workloads and more spectacular results on selected workloads such as the 9-10x boost in performance for RSA cryptography operations. 
In addition, the Java virtual machine itself has adapted to the Power 7 system, delivering large improvements which optimize for memory affinity and a remarkable 15-25% on Java code which runs in 64-bit mode. Middleware: IBM and IBM partners produce a broad range of middleware platforms. Two of the most important for IBM are the WebSphere Application Server and the DB2 database. Much of the rest of both IBM’s portfolio of middleware and our partners’ middleware is based at least in part on one or both of these and rely to a significant degree on them to deliver parallel performance and efficiency. For Power 7, WebSphere was aggressively optimized on many fronts, building on the big boosts in performance delivered by the Java virtual machine. Database connection pools have 4x more parallelism, caching code paths have 6x more parallelism and thread pools are 3x more effective at dispatching work. A distinct project known as “Mason” focused on the engineering of the full software stack to deliver massive scaling improvements, boosting peak single instance performance by 85%, utilizing 32 threads or an entire Power 7 chip. Beyond a single chip, WebSphere scales out by using multiple application server instances. In this scenario, Mason delivered up to 50% improvement on Power 7 systems. DB2 was also heavily optimized for Power 7, as it is for every chip and system generation. It scales up to the largest LPARs and now also scales out to take advantage of new clustering support available in AIX. DB2 also takes advantage of the protection keys built into the hardware and exposed by AIX to accelerate user code hosted in the database (UDFs) without sacrificing the integrity of the database in the event that user code fails. 
Benchmark details (all compare Power 570 P6 systems with Power 780 P7 systems):

                          --------- Power 6 ---------    --------- Power 7 ---------
Benchmark                 Result      Cores   Sockets    Result      Cores   Sockets
SPECint_rate2006          478         16      8          652         16      2
SPECfp_rate2006           426         16      8          586         16      2
SPECjbb2005 (jops)        867,989     16      8          1,331,641   16      2
SAP SD 2-tier (Users)     14,432      32      16         37,000      64      8
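
The benchmark figures above mix different core and socket counts, so a per-core and per-socket comparison can make them easier to read. The following is a minimal sketch, not part of the original presentation: the `Result` class and `gains` helper are hypothetical names introduced for illustration, and the numbers are copied directly from the benchmark details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark result with the hardware it was measured on."""
    score: float
    cores: int
    sockets: int

    def per_core(self) -> float:
        return self.score / self.cores

    def per_socket(self) -> float:
        return self.score / self.sockets


# (Power 6 result, Power 7 result), copied from the benchmark details above.
BENCHMARKS = {
    "SPECint_rate2006": (Result(478, 16, 8), Result(652, 16, 2)),
    "SPECfp_rate2006": (Result(426, 16, 8), Result(586, 16, 2)),
    "SPECjbb2005 (jops)": (Result(867_989, 16, 8), Result(1_331_641, 16, 2)),
    "SAP SD 2-tier (users)": (Result(14_432, 32, 16), Result(37_000, 64, 8)),
}


def gains(p6: Result, p7: Result) -> tuple[float, float]:
    """Return (per-core gain, per-socket gain) of Power 7 over Power 6."""
    return p7.per_core() / p6.per_core(), p7.per_socket() / p6.per_socket()


if __name__ == "__main__":
    for name, (p6, p7) in BENCHMARKS.items():
        core_gain, socket_gain = gains(p6, p7)
        print(f"{name:<24} per core: {core_gain:.2f}x   per socket: {socket_gain:.2f}x")
```

On these figures the Power 7 systems come out at roughly 1.3x to 1.5x per core and roughly 5x to 6x per socket, which is consistent with the notes' point that most of the gain comes from packing more cores onto each chip.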

Transcript

  • 1. Tom Rosamilia General Manager, Power and z Systems IBM Corporation Optimized Systems. Matching technologies for business success.
  • 2. Designed for data Harness all available information including Big Data to unlock insights for better decision making. Managed in the Cloud Deliver new services with Cloud and reinvent business processes to drive innovation. Optimized systems play a critical role in Smarter Computing Tuned to the task Drive greater performance and efficiency using Optimized Systems for each workload for superior economics.
  • 3. On a smarter planet, different workloads have different characteristics Consider Smarter Traffic Smart Monitoring Monitoring workload to integrate multimodal information in real time Smart Analytics Analytic workload looking for patterns in traffic
    • Multiple data sources
    • Real-time collection of data
    • 24x7 operation
    • Real-time analysis of traffic problems
    • Queries of varying complexity
    • Forecasting traffic patterns
  • 4. Virtualization & Operating Systems Microprocessor Design Semiconductor Technology
    • Software
    • Stack integration
    • Middleware tuned for hardware
    • Integrated management across architectures
    • Hardware
    • Multi-core architectures
    • Advanced threading
    • Low latency
    Domain Knowledge Workload characteristics Interdependencies Architecture options Optimized Middleware Compilers & Java Virtual Machine Systems Design Optimized systems are tuned to help address the unique needs of any workload
  • 5.
    • Economics: Optimize total cost of ownership
    • Performance: Align performance requirements with business needs
    • Architecture: Create a strong architectural foundation to address business needs
    Key considerations for leveraging optimized systems
  • 6. Appliances Integrated, optimized systems Client-built with optimized components IBM Netezza TwinFin IBM Smart Analytics System 7600 IBM DB2 & WebSphere on IBM POWER System IBM offers a spectrum of workload optimized systems Need flexibility to deploy multiple workloads of different types—e.g. data management, messaging, web facing etc. Requires moderate flexibility to tune small number of workloads—for example, data management and analytics Flexibility not required—need high performance at low cost for a specific workload
  • 7. System z Source: Based on IBM internal studies; *Pricing comparison based on US list prices of IBM DB2 Advanced Enterprise Edition and the Oracle software with analogous capabilities: Oracle Database Enterprise Edition, Advanced Compression, Active Data Guard, Label Security, Partitioning, Oracle Enterprise Manager, Internet Developer Suite, Diagnostics Pack, Oracle-to-Oracle Federation, Golden Gate. All list prices based on US and valid as of 01/26/2011. Power Systems System x IBM Storage Freedom by Design Performance Redefined Defining the Next Generation of x86 Servers Storage Reinvented Achieve up to 55% lower TCO per workload Power Systems running DB2 as low as 1/3 the cost of Oracle Database* Industry-leading Intel performance and lower management cost by 50% Reduce power, operating and cooling costs by up to 60% IBM’s optimized systems offer architecture choices, help drive down cost per workload and improve performance
  • 8. Anatomy of an optimized system: POWER7 45 nm SOI Semiconductor Technology eDRAM cache integrated on processor chip
  • 9. Anatomy of an optimized system: POWER7 45 nm SOI Semiconductor Technology Turbo-core mode for thread performance & max core for throughput performance POWER7 Microprocessor Design
  • 10. Anatomy of an optimized system: POWER7 45 nm SOI Semiconductor Technology Power Systems Design Intelligent failure detection and recovery Large memory with high bandwidth POWER7 Microprocessor Design
  • 11. Anatomy of an optimized system: POWER7 45 nm SOI Semiconductor Technology Power Systems Design Virtualization & Operating Systems Granular, scalable virtualization of servers and I/O with low overhead POWER7 Microprocessor Design
  • 12. Anatomy of an optimized system: POWER7 45 nm SOI Semiconductor Technology Power Systems Design Virtualization & Operating Systems Compilers & Java Virtual Machine Joint optimization of compiler and microprocessor for optimum performance POWER7 Microprocessor Design
  • 13. Anatomy of an optimized system: POWER7 45 nm SOI Semiconductor Technology POWER7 Microprocessor Design Power Systems Design Virtualization & Operating Systems Compilers & Java Virtual Machine IBM Optimized Middleware Middleware optimized to exploit hardware architecture (workload management, storage keys, page sizes etc.)
  • 14.
    • 54% lower cost per workload than SPARC 3
    • 2.5X more performance than Sun Fire 4
    • 5X more transactions at 1/4 the cost of SPARC 5
    Web and Collaborative Messaging
    • 12.8X improvement over Microsoft BizTalk Server 6
    • 3.9X more SAP users per core than Oracle 1
    • 4X more throughput at 1/4 the cost per report of competitor’s database machine 2
    Database 1 SAP SD Standard Application Benchmark Results, Two-Tier Internet Configuration. http://www.sap.com/solutions/benchmark/sd2tier.epx; 2 IBM SWG Internal Study of Cognos analytics workload – concurrent execution of mixed (complex, intermediate, simple) reports; 3 IBM SWG Internal Study consolidating very heavy Web Facing workload; 4 IBM SWG Internal Study IBM WAS on Power vs Leading App Server on Oracle Sunfire for Web Facing Applications; 5 IBM SWG Internal Study IBM WAS on Power Provides Greater Throughput at a lower cost than leading competitor; 6 IBM SWG Internal Study IBM WebSphere Message Broker on Power gives 12.8X More Throughput than Microsoft BizTalk Server on Dell. Power Systems and IBM Software integration and optimization
  • 15. Heavy UNIX online banking workloads 1,075 trans/sec Delivering services with superior economics
  • 16. *Statement of Direction Achieves overall lowest cost per workload zEnterprise offers the industry’s broadest architecture
  • 17. zEnterprise reduces network cost and complexity
  • 18. Entry points to optimize workloads
    • Consolidate Workloads
    • How would I reduce the Total Cost of Ownership?
    • How would I better manage my data center complexity?
    • Re-deploy Existing Workloads
    • How would I improve the performance?
    • How can I improve my service levels?
    • Deploy New Workloads
    • How can I rapidly deploy a workload leveraging existing skills?
    • How would I scale my infrastructure with my business?
  • 19. Consolidate workloads to reduce costs
    • Example
    • Consolidate x86 server landscape to Linux on System z
    • Benefits
    • Consolidate hundreds of servers to one
    • Decrease costs 30-50 percent
    Performance, reliability, disaster recovery, server provisioning and cost efficiency have all seen dramatic improvements—helping BCBSM deliver better service and better value to its members across the state. — BCBSM
  • 20. Redeployed workload from Intel Xeon processor-based servers to IBM BladeCenter PS701 blades running Linux to reduce complex calculation times from one month to one week. — UMASS DARTMOUTH
    • Example
    • Redeploy x86 workloads to POWER7 with Linux via IBM Migration Factory
    • Benefits
    • Improve performance 10x or more
    • Improve service quality
    Re-deploy workloads to improve performance
  • 21. Deployed Netezza achieving ~100x improvements for query response times, while also improving ability to comply with regulatory obligations. — NYSE
    • Example
    • Deploy analytics workloads on Netezza leveraging IBM Information on Demand Infrastructure Services
    • Benefits
    • Realize up to 100x performance improvements at a third of the cost
    • Significantly reduce deployment times
    • Scale with demand growth, without requiring application changes
    Deploy new workloads on optimized systems
  • 22. Deploying new workloads on POWER7 and re-deploying existing Cognos workloads from x86 to z196 for scaling
    • Projects and capabilities
    • Deploy workloads on zEnterprise for security and simplified management
    • Redeploy workloads for scale and increased productivity
    Primerica is on the smarter computing journey
  • 23. Information Technology Overview Primerica David Wade Chief Information Officer Executive Vice President Primerica Financial Services  
  • 24. Primerica
    • Headquartered in Duluth, GA
    • Leading distributor of financial products to middle income households in North America
    • Underwrites term life insurance, mutual funds, variable annuities, loans and other financial products  
    • ~ 95,000 licensed sales representatives who assist clients
    • Insures 4.3 million lives and more than two million clients maintain investment accounts with the company
  • 25. Leading IT organizations are partnering with IBM to benefit from Optimized Systems. Why?
    • Experts and expertise to help any client craft a strategy to consolidate and optimize their workloads.
    • IBM offers systems that enable clients to optimize any workload
    • Ability to transform current environment— no “rip and replace” required.
  • 26.  
  • 27. Semiconductor Technology Microprocessor Design Systems Design Virtualization & Operating Systems Compilers & Java Virtual Machine Optimized Middleware Anatomy of an Optimized System: System z
    • World’s fastest chip at 5.2 GHz
    • More cache for superior data serving
    • On-chip crypto and data compression
    • Instruction co-optimized with compilers
    • Optimized I/O pathways & subsystems with massive scale
    • Best-of-breed reliability, availability & serviceability
    • RAIM memory for high availability
    • Integrated cryptographic coprocessor
    • Highest security certification in industry (EAL5 & FIPS)
    • Multi-platform design of mainframe and distributed technologies
    • Designed for highest utilization with heterogeneous workloads
    • SLA management of heterogeneous workloads based on business policies
    • Granular, scalable Virtualization of servers, memory and I/O with low overhead
    • PR/SM and z/VM offer two-tier approach for superior virtualization supporting native Linux
    • Dispatching on z/OS keeps software close to cache for optimized performance
    • Intelligent management of mainframe & distributed technologies
    • Java Compiler optimized to exploit hardware architecture
    • WebSphere, MQSeries, DB2 exploit multi-system workload management, scale and availability
    • Middleware optimized and tuned to scale up
    • Tivoli optimized for operations management and security
    • eDRAM cache integrated on processor chip
    • Efficient packaging
    • 2x transistor density*
    • 28% increased transistor speed*
    • 4x array density*
  • 28. Anatomy of an Optimized System: POWER7 45 nm SOI Semiconductor Technology POWER7 Microprocessor Design Power Systems Design Virtualization & Operating Systems Compilers & Java Virtual Machine IBM Optimized Middleware * relative to Power6
    • eDRAM cache integrated on processor chip
    • Efficient packaging
    • 2x transistor density*
    • 28% increased transistor speed*
    • 4x array density*
    • Multi-core chips (8 cores)
    • Simultaneous multi-threading
    • Turbo-core mode for thread performance & max core for throughput performance
    • Enterprise reliability, availability & serviceability
    • Intelligent failure detection and recovery
    • Large memory with high bandwidth
    • Designed for 90%+ utilization
    • Secure isolation of virtual machines
    • Balanced I/O & compute
    • Granular, scalable virtualization of servers and I/O with low overhead
    • Energy scale technology
    • Dynamic thread and memory optimization
    • Integrated, intelligent management
    • Joint optimization of compiler and microprocessor for optimum performance
    • Intelligent operating system task dispatching to minimize impact of non-uniform memory access
    • Middleware optimized to exploit hardware architecture (workload management, storage keys, page sizes etc.)
    • Scale out with no loss of performance
    • Clustered memory and lock management
    • Optimized threading and memory management
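
Several of the POWER7 figures quoted in the speaker notes and on this slide can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the four threads per core is an assumption inferred from the notes (32 threads on an 8-core chip, 1,024 threads on 256 cores), and the "5 nines" value is taken as exactly 99.999% even though the notes say "almost".

```python
CORES_PER_CHIP = 8          # "Multi-core chips (8 cores)" on the POWER7 slide
SMT_THREADS_PER_CORE = 4    # assumption inferred from 32 threads per chip in the notes
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def hardware_threads(cores: int, smt: int = SMT_THREADS_PER_CORE) -> int:
    """Total hardware threads exposed by a given number of cores."""
    return cores * smt


def chips_needed(cores: int, cores_per_chip: int = CORES_PER_CHIP) -> int:
    """Number of chips required to provide the given core count (rounded up)."""
    return -(-cores // cores_per_chip)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for a fractional availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print("threads per chip:", hardware_threads(CORES_PER_CHIP))
    print("threads on 256 cores:", hardware_threads(256))
    print("chips for 256 cores:", chips_needed(256))
    print(f"downtime at 99.999%: {downtime_minutes_per_year(0.99999):.1f} minutes/year")
```

Under these assumptions the largest system needs 32 chips to reach 256 cores and 1,024 threads, and five nines of availability corresponds to about 5.3 minutes of downtime per year.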