SlideShare a Scribd company logo
1 of 20
Download to read offline
Analyzing Business as It Happens
SAP In-Memory Appliance Software (SAP HANA™)
runs on the Intel® Xeon® processor to generate
superior, real-time business intelligence.


WHITE PAPER
Intel® Xeon® Processor E7 Family
SAP In-Memory Appliance Software (SAP HANA™)
April 2011 | Version 1.0



1 Introduction
Increasingly sophisticated business decision models              SAP In-Memory Appliance Software (SAP HANA™) delivers
depend on extremely fast access to and manipulation of           next-generation in-memory computing through an ongoing
massive data stores. Insight into business operations often      engineering collaboration between SAP and Intel to provide
demands data volumes that are beyond the capabilities of         optimized performance and reliability on Intel® architecture,
traditional disk-based systems to process them in real time,     including the Intel® Xeon® processor E7 family.
which can limit access to benefits such as the following:        This paper introduces SAP In-Memory Computing
• Efficiency provided by the ability to respond in               technology concepts, explains how SAP HANA uses them
  real time to the changing needs of the business                to help deliver real-time business insight, and describes
                                                                 how synergies with Intel architecture-based server
• Flexibility based on insight that accurately directs
                                                                 platforms drive solution value. The discussion begins with
  quick action
                                                                 an examination of the hardware and software innovations
• Empowerment of business users to make and
                                                                 that set the stage for SAP In-Memory Computing, including
  act on smart decisions
                                                                 the ways in which the architectures of Intel Xeon processors
This reality often causes a separation between ongoing           and SAP HANA complement one another.
business needs and the analytic applications that support
                                                                 Next, the paper describes specific optimizations made as
them. In such cases, the lag time between data gathering
                                                                 part of the ongoing, collaborative engineering relationship
and interpretation delays the ability to benefit from business
                                                                 between Intel and SAP. In the context of general database
information, thereby limiting its value.
                                                                 concepts, the paper illustrates how SAP HANA takes
                                                                 advantage of Intel architecture to deliver high performance
                                                                 and reliability, as well as excellent support for application
                                                                 developers working with SAP HANA. Next, the paper
                                                                 describes real-world customer benefits derived from SAP
SAP In-Memory Appliance Software (SAP
                                                                 HANA implementations. Finally, the paper explores how
HANA™) uses SAP In-Memory Computing                              future technology may drive even further the value of joint
technology and the performance and                               solutions based on Intel Xeon processors and SAP HANA.
reliability of the Intel Xeon processor E7
                           ®       ®


family to dramatically improve the speed of
data analysis, enabling business intelligence
to be generated in real time. That capability
increases the value of business data with more
sophisticated decision support than has been
possible with solutions based on traditional
disk-based systems.
1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 1   2 Intel and SAP Technology Advances Drive
    2 Intel and SAP Technology Advances                                         Database Evolution
      Drive Database Evolution . . . . . . . . . . . . . . . . . . . . 2
                                                                              The evolution of Intel architecture toward greater performance
     2.1 Intel Hardware Technology Innovations . . . . . . . . . . . 2
                                                                              at lower cost sets the stage for the advances of SAP HANA.
       2.1.1 The Rise of Mainstream Server Parallelism . . . . . . . . 2
                                                                              The SAP In-Memory Computing Database that lies at the heart
       2.1.2 The Drive to 64-Bit Processing . . . . . . . . . . . . . . 3
                                                                              of this technology is engineered to take advantage of each
       2.1.3 Other Architecture Features that Benefit SAP HANA . . . 4
                                                                              aspect of a high-performance, balanced compute platform:
     2.2 SAP Software Technology Innovations . . . . . . . . . . . 5
       2.2.1 Establishing a Solid Database                                    • Memory. Increasingly large amounts of addressable
             for In-Memory Computing . . . . . . . . . . . . . . . . . 5        memory, running at ever-higher speeds, combine with
       2.2.2 Delivering the SAP In-Memory Computing
                                                                                growing caches and other improvements to create
             Database into the Mainstream . . . . . . . . . . . . . . . 5
       2.2.3 SAP HANA: The Next Generation
                                                                                memory subsystems robust enough to easily support
             of Real-Time Business Analysis . . . . . . . . . . . . . . 5       the demands of SAP In-Memory Computing.
    3 Creating a Transition in Database Design . . . . . . . . . . . 6
                                                                              • Processing. Constant architectural improvements to
     3.1 Performance Implications of Column
         versus Row Storage . . . . . . . . . . . . . . . . . . . . . 7         complete more work per clock cycle extend the value
     3.2 Dictionary Compression with Column Storage . . . . . . . 8             of growing parallelism from multi-core designs, Intel®
     3.3 Efficiencies from Separation of Main and Delta Storage. . 8            Hyper-Threading Technology (Intel® HT Technology),
     3.4 Storage of and Queries against Historical Data . . . . . . 9           and increasingly scalable symmetric multi-processing.
     3.5 Boosting Performance by Moving                                       • I/O. Very fast on-board data transfer using Intel®
         Results Instead of Source Data . . . . . . . . . . . . . . . 9
                                                                                QuickPath Interconnect is complemented by advances
    4 Collaboration by SAP and Intel for Performance
      Optimization and Parallelization . . . . . . . . . . . . . . . . 9        in inter-node communication, to enable the low latency
     4.1 Benefits from Multi-Socket Topology and NUMA . . . . . . 9             in data transfer within and between nodes in the SAP
     4.2 Optimization for Multi-Core Processing. . . . . . . . . . 10           HANA appliance.
       4.2.1 Parallel Aggregation . . . . . . . . . . . . . . . . . . . 10
       4.2.2 Processor Cache Optimizations . . . . . . . . . . . . . 11       2.1 Intel Hardware Technology Innovations
     4.3 Optimization for Core-Level Features . . . . . . . . . . . 12        Intel has continually led the industry in driving advances
       4.3.1 Optimization for Intel HT Technology . . . . . . . . . . 12
                                   ®
                                                                              in commercial off-the-shelf server platforms that increase
       4.3.2 Standards Compliance and Accuracy                                performance, scalability, and value. These innovations
             for Decimal Floating-Point Math . . . . . . . . . . . . . 12
                                                                              directly enable SAP HANA to transcend barriers to what
       4.3.3 Benefits from Intel® Turbo Boost Technology . . . . . . 13
                                                                              businesses are able to do with their data, generating better
       4.3.4 Optimization of Bit Compression with
             Intel® SSE-Based Vectorization . . . . . . . . . . . . . 13      business decisions in response to real-time events.
     4.4 Performance Benefits from Optimization                               This section traces the history of Intel processor innovation,
         and Hardware Platform Features . . . . . . . . . . . . . 13
                                                                              setting the stage for an understanding of how SAP HANA
      4.4.1 Performance Comparison between Processor
            Generations and Impact of Hardware Features . . . . . 13          appliances based on Intel architecture produce synergies
      4.4.2 Performance Benefits of Intel® SSE                                between the hardware and software that benefit business
            Vectorization for Search . . . . . . . . . . . . . . . . . 14     customers.
      4.4.3 Performance Scaling with Increasing Core Count . . . . 15
    5 Enhancing Reliability for In-Memory
      Computing Systems . . . . . . . . . . . . . . . . . . . . . . 16
     5.1 In-Memory Recovery . . . . . . . . . . . . . . . . . . . . 16           “Intel and SAP engineering teams have
     5.2 ACID Properties . . . . . . . . . . . . . . . . . . . . . . 16           collaborated extensively on SAP HANA™
    6 Application Development and Tools                                           and SAP In-Memory Computing technology.
      for In-Memory Computing. . . . . . . . . . . . . . . . . . . 16
     6.1 Intel® Performance Counter Monitor for                                   Comparing the in-memory system performance
         Debugging and Optimization . . . . . . . . . . . . . . . . 17            with a classical relational database, the query
     6.2 Intel® Software Development Products . . . . . . . . . . 17              time was reduced from 77 minutes to 13 seconds
    7 Real Customer Example:
      Building Richer Sales Support at Hilti Corporation . . . . 17
                                                                                  when run on the Intel® Xeon® processor 7500
    8 Outlook and Future Research Topics . . . . . . . . . . . . 18               series. We are thrilled with this result, and it
     8.1 Evolution of Non-Volatile Memory . . . . . . . . . . . . . 18            showcases once again the powerful impact of
     8.2 Hardware Acceleration . . . . . . . . . . . . . . . . . . . 18           joint problem solving between our two companies.”
     8.3 New Instructions . . . . . . . . . . . . . . . . . . . . . . 18
     8.4 Many Cores . . . . . . . . . . . . . . . . . . . . . . . . . 18                            - K. Skaugen, Intel Vice President and
    9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 18                       General Manager of the Data Center Group
    10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . 19


2
2.1.1 The Rise of Mainstream Server Parallelism              One key solution has been to produce multi-core
At one time, boosts in processor performance were created    processors, so that each processor socket contains
largely by ongoing increases in clock speed. The simple      multiple, individual execution engines—a design concept
fact of providing more cycles per second retired more        taken from mainframe and other large-scale systems,
instructions per unit of time, allowing software to become   where it had been employed for some time. Even before
more capable and deliver faster response times. As clock     introducing its multi-core designs, Intel developed Intel HT
speeds grew, however, processor designs faced limitations    Technology, which allows a single processing core to handle
due to growing energy consumption and heat generation.       two strings of instructions (software threads) at a time,
                                                             making better use of on-die resources that would otherwise
                                                             be idle.
                                                             Together with symmetric multi-processing (multiple
                                                             processors and multiple servers working together), multi-
Intel and SAP: A History of Co-Innovation
                                                             core processing and Intel HT Technology have made parallel
For more than 10 years, Intel and SAP have worked            programming the de facto industry standard. Some of the
together to deliver industry-leading performance             techniques used by SAP HANA to take advantage of parallel
of SAP solutions on Intel® architecture, and a large         hardware resources are discussed later in this paper.
proportion of new SAP implementations are now
                                                             2.1.2 The Drive to 64-Bit Processing
deployed on Intel® platforms. The latest success
from that tradition of co-innovation is available to         Another limitation that enterprise computing systems have
                                                             faced is that the 32-bit systems in general enterprise use
customers of all sizes in SAP In-Memory Appliance
                                                             until the mid-2000s were limited by their ability to address
Software (SAP HANA™), which is delivered on the
                                                             no more than approximately four gigabytes (GB) of system
Intel® Xeon® processor.
                                                             memory per server. Beginning in 2004 with the Intel Xeon
The relationship between Intel and SAP has                   processor formerly code-named Nocona, Intel began
become even stronger over the years, growing to              shipping hardware capable of running 64-bit OSs, which
include a broad set of collaborations and initiatives.       can address far larger amounts of memory. The theoretical
Some of the most visible include                             limits of memory capacity are as follows:
the following:                                                32-bit systems: 232 = 4,294,967,296 bytes, or ~4 GB
• Joint roadmap enablement. Early in the design               64-bit systems: 264 = 18,446,744,073,709,551,616 bytes,
  process, Intel and SAP decision makers identify             or ~18 billion GB (18 Exabytes)
  complementary features and capabilities in their           In reality, the upper limits of memory addressability for
  upcoming products, and those insights help to              64-bit systems are governed by the physical capabilities
  direct the development cycle for maximum value.            of the platform and to some extent by the cost of the
                                                             memory itself. As platforms become more capable, system
• Collaborative product optimization. Intel
                                                             memory capacity continues to grow, and at the same time,
  engineers located on-site at SAP work with their
                                                             the cost of memory on a per-gigabyte basis continues to
  SAP counterparts to provide tuning expertise that
                                                             drop. Server memory capacities are now often measured in
  enables SAP HANA and other software solutions              terabytes (TB), where 1 TB = 1,000 GB.
  to take advantage of the latest hardware features.
                                                             In addition to being able to address more memory than their
• Combined research efforts. Together,                       32-bit ancestors, platforms based on 64-bit architectures
  researchers from Intel and SAP continually                 make more and wider registers available to the compiler
  explore and drive the future of business                   (theoretically enabling the compiler to generate faster code).
  computing. Belfast Collaboratory, an excellent             They also have the benefit of some larger data types, a
  example, opened in October 2009 to innovate                crucial factor for SAP HANA, where gigabytes of memory
  around cloud computing and the next generation             are handled within a single process. In fact, the port to
  of sustainable IT.                                         64-bit architecture was one of the first joint efforts by Intel
                                                             and SAP related to SAP NetWeaver ® Business Warehouse
As a result of these efforts, customer solutions             Accelerator (BWA, formerly known as SAP NetWeaver
achieve performance, scalability, reliability, and           Business Intelligence Accelerator), a precursor in many
energy efficiency that translate into favorable              ways to SAP HANA.
ROI and TCO, for increased business value.


                                                                                                                               3
2.1.3 Other Architecture Features that Benefit SAP HANA
    Successive generations of Intel® server platforms introduce                         Intel Xeon processor E7 family as an example, SAP HANA
    an enormous range of other architectural features that                              benefits from Double Device Data Correction (DDDC), which
    drive new platform capabilities. For example, the Intel Xeon                        allows recovery from two simultaneous memory errors,
    processor E7 family adds the following new features and                             supporting greater availability of SAP HANA systems.
    capabilities relative to its predecessor that enhance the
                                                                                        Platform innovation also includes the introduction of new
    scalable performance of SAP HANA implementations:
                                                                                        instructions such as the series of Intel® Streaming SIMD
    • Increased core count, from 8 cores (16 threads) to 10                             Extensions instruction sets (Intel® SSE, Intel® SSE2, Intel®
      cores (20 threads) per socket, provides more computation                          SSE3, Intel® SSSE3, Intel® SSE4.1, and Intel® SSE4.2). These
      resources for SAP HANA’s data manipulation demands.                               instructions are designed to accelerate specific types of
    • Larger memory capacity, from 16 GB DIMMs (1 TB for                                operations, and their impact on SAP HANA is discussed in
      a four-way server) to 32 GB DIMMs (2 TB for a four-way                            further detail later in this paper.
      server), increases the ability of SAP HANA to hold large
                                                                                        Finally, Intel continues to innovate around process-
      amounts of data for in-memory computation.
                                                                                        technology shrink in its manufacturing operations. Moving
    • Larger cache, from 24 megabytes (MB) to 30 MB,                                    to a smaller process technology allows the creation of
      increases the amount of data SAP HANA can hold for
                                                                                        smaller transistors and other on-chip features, which allows
      fast access without reading from DRAM.
                                                                                        more of those features to be placed in a given amount
    Likewise, evolving server platforms provide new features for                        of on-die area, for faster switching, lower per-transistor
    security, and reliability, availability, and serviceability (RAS)                   energy consumption, and lower per-transistor waste-heat
    that support business-critical SAP HANA deployments.                                production. Figure 1 shows some key advances in the
    Again using features and capabilities introduced with the                           development of the Intel Xeon processor E7 family.




                              2010:                                                   2011:
             Intel® Xeon® Processor 7500 Series                         Intel® Xeon® Processor E7 Family

        C       C     C      C    C       C      C       C   C      C      C   C    C    C      C    C      C   C    Figure 1. The Intel® Xeon® processor E7
        O       O     O      O    O       O      O       O   O      O      O   O    O    O      O    O      O   O    family adds features and capabilities that
        R       R     R      R    R       R      R       R   R      R      R   R    R    R      R    R      R   R    directly benefit SAP HANA deployments.
        E       E     E      E    E       E      E       E   E      E      E   E    E    E      E    E      E    E
        1       2     3      4    5       6      7       8   1      2      3   4    5    6      7    8      9   10

                24 MB Shared Last Level Cache                           30 MB Shared Last Level Cache

            Integrated Memory            QPI Link                Integrated Memory              QPI Link
                 Controller             Controller                    Controller               Controller



             Support for 16 GB           Four Intel®              Support for 32 GB             Four Intel®
                DDR3 DIMMs               QuickPath                   DDR3 DIMMs                 QuickPath
              (1 TB RAM for a         Interconnects at             (2 TB RAM for a           Interconnects at
            four-socket system)        up to 6.4 GT/s            four-socket system)          up to 6.4 GT/s


                          45nm process                                         32nm process
                           technology                                           technology

            • Intel® Turbo Boost Technology                      • Enhanced Intel® Turbo Boost Technology
            • Intel® Intelligent Power Technology                • Intel® Intelligent Power Technology
            • Intel® Hyper-Threading Technology                  • Intel® Hyper-Threading Technology
            • Machine Check Architecture Recovery                • Machine Check Architecture Recovery
            • Intel® Scalable Memory Interconnect Failover       • Intel® Scalable Memory Interconnect Failover
            • Intel® QuickPath Interconnect Self-Healing         • Intel® QuickPath Interconnect Self-Healing
                                                                 • Intel® Trusted Execution Technology
                                                                 • Intel® AES New Instructions
                                                                 • Double Device Data Correction (DDDC)
                                                                 • Partial Memory Mirroring




4
2.2 SAP Software Technology Innovations                        • A calculation and planning engine with a data
Around 2000, SAP recognized the trends toward                    repository, providing persistent views of business
rapidly-growing data sets owned by many companies                information and a unified information modeling
and increasing business needs for real-time analytics.           design environment
In response to those realities, SAP engineers began to         • A data management service that includes SQL
re-imagine how data warehouses could handle these                and MDX interfaces, data integration capabilities for
massive data stores to generate immediate, ad hoc analysis.      accessing SAP and non-SAP data sources, and
Disk-access times had become a primary bottleneck in             integrated life cycle management.
addressing those needs using conventional database
queries. Research showed promise for removing that factor
by conducting queries solely in memory, without having to                     SAP In-Memory Computing Database

access the physical disk.                                             SAP In-Memory
                                                                    Computing Technology
Two key factors helped make in-memory computing viable                                                Calculation and
for mainstream implementations. As discussed above, by                Row          Column             Planning Engine
                                                                      Store         Store
the middle of the first decade of the 21st century, system
memory had begun to fall sharply in cost, and 64-bit servers
had opened the door to commercial off-the-shelf hardware                           Data Management Service
that could address very large amounts of system memory.
Those advances set the stage for SAP’s implementation of
                                                               Figure 2. SAP In-Memory Computing supports complex
in-memory computing for general business use.
                                                               calculations on massive amounts of data.


                                                               When combined, these capabilities allow the SAP
“For over 10 years, SAP and Intel have                         In-Memory Computing Database to help decision makers
 collaborated to create visionary solutions that               explore and analyze vast amounts of information at
 enable customers to gain competitive advantage                incredible response times and with a high degree of
                                                               flexibility, without the need for IT involvement.
 and run better. The SAP In-Memory Appliance
 (SAP HANA™), delivered on the Intel® Xeon®                    2.2.2 Delivering the SAP In-Memory Computing
 processor, provides customers with real-                            Database into the Mainstream
 time results from analyses and transactions                   SAP first made SAP In-Memory Computing available in
 that enable better decisions and improved                     product form with the introduction of TREX (Text Retrieval
 operations. It’s just the latest example of how               and information EXtraction), a search engine component
 game-changing innovation between industry                     used as an integral part of various SAP software offerings.
                                                               Building on the early success of products such as SAP
 leaders can alter how enterprises do business
                                                               NetWeaver Enterprise Search, the company introduced
 and succeed.”
                                                               SAP BWA in 2005.
                    - Stefan Sigg, Senior Vice President,      The company’s ongoing focus on business intelligence (BI)
                               In-Memory Platform, SAP         solutions led it to integrate SAP In-Memory Computing into
                                                               SAP BusinessObjects to create an accelerated BI solution
                                                               called the SAP BusinessObjects Explorer, Accelerated
2.2.1 Establishing a Solid Database                            Version. SAP In-Memory Computing has also been used
      for In-Memory Computing
                                                               to enable real-time analysis in a variety of other products,
In-memory computing allows the processing of massive           including the following:
quantities of real-time data in the main memory of the
                                                               • SAP Advanced Planner and Optimizer with liveCache
server, providing immediate results from analyses and
transactions. The SAP In-Memory Computing Database             • SAP CRM Customer Segmentation
(Figure 2) delivers the following capabilities:                • SAP Business ByDesign™ Analytics
• In-memory computing functionality with native support
  for row and columnar data stores (discussed below),
  providing full ACID (atomicity, consistency, isolation,
  durability) transactional capabilities




                                                                                                                              5
2.2.3 SAP HANA: The Next Generation of Real-Time                       SAP HANA benefits dramatically from the design features
          Business Analysis                                                and benefits of the latest Intel Xeon processors, helping it
    SAP In-Memory Computing has evolved to create SAP                      deliver excellent value to end-customers. The performance,
    HANA, a multi-purpose appliance that combines SAP                      scalability, reliability, and energy efficiency of the hardware
    software components installed on servers based on the                  platform are particularly beneficial in this regard. High memory
    Intel Xeon processor. The integrated software stack is highly          capacity and innovations in the memory and cache subsystems,
    optimized for performance and reliability on the hardware,             as described above, support excellent results from the demands
    and it includes the SAP In-Memory Computing Database,                  of in-memory computing. The sophisticated parallel
    real-time replication service, data modeling, and data                 software design, as described below, takes advantage
    services, as illustrated in Figure 3.                                  of the platform’s large-scale hardware parallelism.
                                                                           SAP HANA is engineered specifically to be data-agnostic,
                             Other Applications     SAP Business Objects
                                                                           and the appliance is designed to be implemented without
                                                                           affecting existing applications or systems. The solution
                                 MDK          SQL              BICS        provides a flexible, cost-effective, real-time approach for
                                                                           managing large data volumes, allowing organizations to
       SAP NetWeaver®          SAP In-Memory Computing Database
     Business Warehouse                                                    dramatically reduce the hardware and maintenance costs
                                  In-Memory         Calculation and        associated with running multiple data warehouses and
         Third-Party              Computing         Planning Engine        analytical systems.
        Data Sources
                                                                           In the future, SAP HANA will act as the technology foundation
      SAP Business Suite             Data Management Service
                                                                           for new, innovative business applications for planning,
                                                                           forecasting, operational performance, and simulation. The
                                   Admin and Data Modeling                 first business application running on SAP HANA is the SAP
                                                                           BusinessObjects Strategic Workforce Planning, version
    Figure 3. SAP HANA          Real-Time Replication Services
    provides data-agnostic
                                                                           for in-memory computing, a solution SAP developed that
    integration with other                Data Services                    dramatically benefits from high-performance and real-time
    business systems.                                                      access to large volumes of transactional information.

                                                                           3 Creating a Transition in Database Design
                                                                           With the growth of data stores and business application
                                                                           complexity, solution providers have steadily developed
    The Intel and SAP HANA™ Value Proposition                              novel approaches to optimizing performance. One key
    Across a Wide Variety of Business Units                                development has been the separation of online transactional
    SAP In-Memory Appliance Software (SAP HANA™)                           processing (OLTP) and online analytical processing (OLAP)
                                                                           systems because both these types of relational databases
    and the Intel® Xeon® processor enable a broad set
                                                                           have somewhat divergent design goals. In the context of BI
    of scenarios within various parts of the business,
                                                                           solutions, they can be considered as follows:
    above and beyond transactional excellence and
                                                                           • OLTP systems instantly record business events as they
    traditional analytics, such as the following:
                                                                             happen, such as the sale of a piece of inventory. As such,
    • Order management. Available to promise,                                system designs focus on quickly handling large numbers
      price optimization                                                     of small, simultaneous transactions.
    • Supply chain. Transportation planning, inventory                     • OLAP systems provide analysis of the data provided by
      management, demand and                                                 OLTP systems to support business decisions. They are
      supply planning                                                        designed to handle a relatively small number of often-
    • Manufacturing. Production planning, performance                        complex transactions.
      management, asset utilization                                        In his paper, “A Common Database Approach for OLTP
    • Sales and marketing. Analytics, customer                             and OLAP Using an In-Memory Column Database,”1 SAP
      segmentation, trade promotion management                             co-founder Hasso Plattner asserts the value of unifying
                                                                           both OLTP and OLAP in a single system based on column
    • Corporate finance. Planning and                                      storage and in-memory computation. While the concept of
      budgeting, consolidation                                             combining the two systems is beyond the scope of this paper,
    • Human resources. Workforce analytics                                 Plattner’s argument for the value of column storage and
    • Information technology. Landscape optimization                       in-memory computing to both OLTP and OLAP has direct
                                                                           bearing on the capabilities of SAP HANA in those areas.


6
3.1 Performance Implications of Column versus                        Columnar storage may be more efficient in the common
    Row Storage                                                      case where a given column includes only a relatively
Whereas relational databases typically use row-based                 small number of distinct values. For example, if the table
storage, column-based storage may be more suitable                   shown in Figure 4 includes a very large number of rows,
in some cases. As shown in Figure 4, a database table                one would expect that each entry in the Country column
is conceptually a two-dimensional structure made up of               would nevertheless consist of one of perhaps a few dozen
cells arranged in rows and columns. Because computer                 possible entries. That factor allows for very efficient data
memory is structured linearly, there are two options for             compression, which produces a number of advantages.
the sequences of cell values that are stored in contiguous
                                                                     For example, data compression allows for faster loading
memory locations:
                                                                     from system memory into the processor cache. Since
• In row storage, the data sequence consists                         that transport tends to be the main factor limiting system
  of the data fields in one table row.                               performance, speeding it up may be expected to produce
• In column storage, the data sequence consists                      a performance advantage, even considering the added
  of the entries in one table column.                                overhead of decompressing the data. Another example
                                                                     of the advantages of columnar storage is related to
Figure 4 illustrates both options.                                   compression: Computing the sum of values in a column
                                                                     is much faster if many additions of the same value can
  Table                                                              be replaced by a single multiplication. Similarly, holding
                                                                     values from a column together in cache is crucial for bulk
       Country                Product                 Sales
                                                                     processing using Intel SSE instructions.
          US                    Alpha                 3.000
                                                                     Columnar storage also helps reduce the number of
          US                    Beta                  1.250          performance-limiting cache misses. Recognizing that
          JP                    Alpha                  700           data is transferred from main memory to the processor in
          UK                    Alpha                  450           blocks of fixed size called cache lines, consider the case of
                                                                     aggregating the sum of all of the values in the Sales column
  Row Store                             Column Store                 of a large version of the table in Figure 4. With row storage,
                                                                     only a minority of the data in any given cache line would
                    US                                   US
                                                                     be from the Sales column and therefore be relevant to the
    Row 1          Alpha                                 US
                                         Country                     calculation. The large proportion of irrelevant data would
                   3,000                                  JP
                                                                     result in cache misses, where the processor would have
                    US                                   UK
                                                                     to discard the irrelevant data and wait for useful data in its
    Row 2          Beta                                 Alpha
                                                                     place. Conversely, with columnar storage, every piece of
                   1.250                                 Beta
                                         Product                     data in each cache line would be relevant to the calculation,
                    JP                                  Alpha        increasing efficiency by keeping the processor supplied
    Row 3          Alpha                                Alpha        with useful data.
                    700                                 3.000
                    UK                                  1.250
                                          Sales
    Row 4          Alpha                                 700
                    450                                  450

Figure 4. Unlike traditional databases, SAP HANA offers the choice
between row-based and column-based data storage for each
table.




                                                                                                                                      7
3.2 Dictionary Compression with Column Storage                          3.3 Efficiencies from Separation of Main
    Dictionary coding is a significant benefit when compressing                 and Delta Storage
    column-stored data, as represented in Figure 5. Each                    As changes are committed into the database, they are
    distinct column value is assigned a sequential integer                  stored in delta storage, which is distinct from main storage,
    (beginning with zero) as a Value ID. An alphanumerically                and if new data contains column values that are not
    sorted dictionary stores all of the distinct column values,             included in the main dictionary, they are stored in a delta
    and the Value IDs are implicitly given by the sequential                dictionary. Because the delta dictionary is created without
    positions of each distinct column value. The actual column              recoding the Value IDs in the main dictionary, there is no
    data is stored as a sequence of Value IDs, which are bit                global order that encompasses the Value IDs in both the
    coded, meaning that they are stored using the minimum                   delta dictionary and the main dictionary.
    possible number of bits.




             Column ”Name“                                               Column ”Name“ (dictionary compressed)
             (uncompressed)


                                              Value-ID sequence
                                                                                                                       Dictionary
                                              One element for each row in column

               Miller                                       4                                                           0    Baker
               Jones                                        1                                                           1    Jones
               Millman                                      5                 Point into dictionary                     2    John




                                                                                                                                                     Sorted
               Zsuwalski                                    N                                                           3    Johnson
                                               Value IDs




               Baker                                        0                                                           4    Miller
               Miller                                       4                                                           5    Millman
               John                                         2                                                                            .
               Miller                                       4                                                                       ..
               Johnson                                      3                                                           N    Zsuwalski
               Jones                                        1

                      .
                 ..
                                                           ...




                                                                                                  Value ID implicitly
                                                                                                  given by sequence in                       Value
                                                                                                  which values are stored



    Figure 5. Dictionary coding can enable significant compression advantages in column-stored data sets.



    Dictionary compression of columnar data is most efficient               The delta store supports “greater than” and “less than”
    when the number of distinct column values is relatively                 queries by adding a CSB+-Tree to the delta dictionary. This
    small, compared to the size of the column (that is, when                tree holds the values of the delta store ordered as in the
    a relatively low number of Value IDs exists relative to the             main store, only the amount of memory needed is higher
    number of column values).                                               than in the main store case. To establish a single, sorted
                                                                            dictionary that includes all Value IDs when a delta dictionary
    Because the dictionary is sorted, search operations and
                                                                            has been established, the delta dictionary and main
    comparisons such as “greater than” and “less than” can
                                                                            dictionary must be merged using a delta merge operation.
    be performed against the value IDs to return results that
                                                                            To enable even higher compression of main storage, SAP
    are meaningful with regard to the actual column values.
                                                                            HANA selects and implements the optimal compression
    Because those operations are conducted on integer values,
                                                                            method for the dictionary itself during delta merge.
    they can be much faster than, for example, performing them
    against long strings.




8
3.4 Storage of and Queries against Historical Data                  4 Collaboration by SAP and Intel for Performance
As an optional feature, SAP HANA can use temporal tables              Optimization and Parallelization
to query against an historical state of the data. Write             Co-innovation and collaboration between SAP and Intel is
operations on temporal tables do not physically overwrite           an important prerequisite for the success of the SAP HANA
existing records. Instead, write operations always insert new       software components on Intel architecture-based hardware.
versions of the data record into the database. The different        The optimization draws both from industry best practices
versions of a data record have time stamps indicating their         and from solution-specific engineering, drawing broadly
validity, with current data being simply the most recent version.   on the expertise of performance engineers from both
                                                                    companies. As mentioned previously in this paper, these
Historical (non-current) data stored in a temporal table
                                                                    individuals have a long history of working together, and their
is still considered to be operational data, meaning that
                                                                    work on SAP HANA represents the culmination of many
it is expected to be used in current operations, such as
                                                                    years’ experience.
transactions or reporting. By contrast, future versions of
SAP HANA plan to support a category specifically for data           This four-part section identifies some of the specific
that is not expected to be used anymore, known as aged              optimizations the team has made to the SAP HANA platform
data, which no longer need to be stored in memory. Instead,         and the resulting performance benefits on Intel Xeon
aged data will be eligible to be written to disk and released       processor-based systems. The first three subsections
from memory.                                                        describe the optimizations, organized for convenience at
                                                                    decreasing scale within the system:
Note that while aged data would not be accessed during
normal operations, it would still be available if needed. To        • Among sockets. The Benefits from Multi-Socket
reduce access time, systems could use solid-state drives              Topology and NUMA subsection describes the
for this purpose. Alternatively, aged data could be written to        collaboration around features and capabilities at the inter-
flash memory, which can be accessed even more quickly.                processor level.
For each individual type of data, application developers            • Among cores. The Optimization for Multi-Core Processing
will need to create rules that govern when data should                subsection describes the collaboration around features
be considered “aged.” For example, sales data might be                and capabilities at the intra-processor, inter-core level.
considered aged once the order it is associated with has
                                                                    • Within cores. The Optimization for Core-Level Features
been closed for at least 90 days.
                                                                      subsection describes the collaboration around features
3.5 Boosting Performance by Moving Results                            and capabilities at the intra-core level.
    Instead of Source Data                                          The fourth subsection, Performance Benefits from
The data-intensive business applications in common use              Optimization and Hardware Platform Features, presents
often move large, detailed data sets from the data layer            some key performance results associated with the
to the application layer, where it performs operations on           optimizations described in the previous subsections.
that data. This arrangement is not optimal in many cases,
since the result of the calculation is typically smaller than       4.1 Benefits from Multi-Socket Topology and NUMA
the source data needed to derive that result. Therefore, it         Because SAP HANA is a large-scale multi-socket
is generally more efficient to perform calculations before          environment, it benefits substantially from the NUMA
moving the data and then move only the results.                     (non-uniform memory access) shared-memory architecture.
To benefit from that scheme, SAP HANA incorporates                  In multi-processor systems that are not NUMA capable (or
a calculation engine into the data layer. The calculation           where NUMA is not enabled), each processor is connected
engine can be manipulated programmatically by application           directly to all available system memory through the
developers. In this way, high-performance applications based        system bus. The access path between any combination of
on the SAP HANA architecture can delegate data-intense              processor and memory is direct and electrically equivalent
operations to the data layer, achieving higher efficiency and       to any other combination. This arrangement is therefore
performance than would otherwise be possible.                       known as “uniform memory access” (UMA), as shown in
                                                                    Figure 6a. Note that all the processors must contend for
                                                                    bandwidth on the system bus.




                                                                                                                                     9
In NUMA systems (Figure 6b), each processor is connected                          In multi-threaded application software, the OS migrates
     directly to its own local memory, and all of the memory                           threads between processor cores as needed to match
     banks are connected together by the system bus.                                   workloads to available execution resources. Assuming that
     Therefore, any processor has direct, relatively high-speed                        the data associated with a specific thread is already resident
     access to local memory, as well as access to all other                            in memory, migrating a thread to a core in a different
     memory through the system bus. Because bus-bandwidth                              processor package carries with it a performance penalty.
     contention only occurs on non-local memory accesses,                              Therefore, NUMA-capable software must be capable of
     there is a performance advantage to holding the data                              working with the OS to manage the associated balance in
     needed by a given processor in its own local memory; that                         performance implications. For example, there may be an
     placement is known as “processor affinity.”                                       advantage from moving a thread to a processor package
                                                                                       where more execution resources are available that must be
                                                                                       weighed against the disadvantage of moving data across
                                                                                       the system bus.
                 Processor      Processor             Processor       Processor
                                                                                       4.2 Optimization for Multi-Core Processing
                                                                                       To take advantage of the increasing core count in modern
                                                                                       processors (as many as 10 cores supporting 20 threads
                                                                                       in the Intel Xeon processor E7 family), SAPA HANA is
        Memory
                                                                                       engineered for parallel execution that scales well with the
                                            Chipset                                    number of available cores. Specifically, optimization for
       Interface
                                                                                       multi-core platforms accounts for the following two key
                                                                                       considerations:
                                              I/O
                                                                                       • Data is partitioned wherever possible in sections that
                                              (a)
                                                                                         allow calculations to be executed in parallel.
                                                                                       • Sequential processing is avoided, which includes
                                                                                         finding alternatives to approaches such as thread locking.
                                             I/O
                                                                                       4.2.1 Parallel Aggregation
                                      Chipset                                          In the shared-memory architecture within a node, SAP
                                                                                       HANA performs aggregation operations by spawning a
                                                                                       number of threads that act in parallel, each of which has
                                                                                       equal access to the data resident on the memory on that
      Memory                                                                Memory
     Interface          Processor                         Processor        Interface   node. As illustrated at the top of Figure 7, each aggregation
                                                                                       functions in a loop-wise fashion as follows:
                                                                                        1. Fetch a small partition of the input relation.
                                                                                        2. Aggregate that partition.
      Memory                                                                Memory
     Interface          Processor                         Processor        Interface    3. Return to step 1 until the complete input
                                                                                           relation is processed.


                                      Chipset



                                             I/O

                                             (b)

     Figure 6. In NUMA systems, each processor has fast, direct
     access to its own local memory module, reducing the latency
     that arises due to bus-bandwidth contention.




10
Table

                         Aggregation
                          Thread 1


                                                                                                    Aggregation
                                                                                                     Thread 2


                         Aggregation
                          Thread 1
           Local Hash Table 1                                                                              Local Hash Table 1

                                                        Cache-Sized Hash Tables




            Buffer



               Merger                                                                                           Merger
              Thread 1                                                                                         Thread 2




                 Part Hash Table 1                                                                 Part Hash Table 2



Figure 7. SAP HANA employs parallel aggregation to divide among threads the work of fetching, aggregating, and merging data.


Note that the approach of using small chunks and                     As described above in the Performance Implications of
dynamically assigning them to threads distributes the                Column versus Row Storage section, determining what data
computation costs over all threads relatively evenly, even if        will be fetched together in a given cache line is an important
requirements differ substantially among the chunks.                  aspect of optimizing performance within the memory/cache
                                                                     subsystem. Block-wise decompression of data allows
Each thread has a private hash table where it writes its
                                                                     decompressed data to reside in L1 cache for access by
aggregation results. When the aggregation threads are
                                                                     the processor.
finished, the buffered hash tables are merged by merger
threads using range partitioning. Each merger thread                 The columnar design results in consecutive access of large
is responsible for a certain range, which is indicated in            amounts of data, which is an excellent fit for the hardware
Figure 7 by a different shade of grey. The merger threads            prefetchers built into the Intel Xeon processors. The SAP
aggregate their partition into part hash tables, and the final       HANA development team added software prefetching
result is obtained by concatenating the part hash tables.            hints to improve performance in the limited number of
                                                                     circumstances where data was not automatically prefetched
4.2.2 Processor Cache Optimizations                                  (for example, when crossing page boundaries). Similarly, in
Taking best advantage of processor cache resources is                Figure 7, note that the hash tables to which worker threads
another aspect of performance optimization that plays                write aggregation results are sized to the cache, helping to
an important role in the development of SAP HANA.                    reduce the incidence of cache misses in handling that data.
Because accessing data from cache is so much faster than
accessing data in main memory, special care must be taken
to improve the odds that the processor can access the data
it needs at any given time from cache, rather than being
forced to access memory.




                                                                                                                                      11
4.3 Optimization for Core-Level Features                                In large part, achieving the optimal performance benefit
     In addition to the advantages available to SAP HANA at the              from Intel HT Technology is simply an extension of the
     levels of interactions among processors and interactions                optimization that must be done to take advantage of
     among cores, a number of features are also relevant at                  hardware parallelism generally. An in-depth discussion of
     the level of operation within individual cores. Some of                 issues specific to Intel HT Technology is beyond the scope
     those advantages and considerations are discussed in this               of this paper. For more information, see the paper,
     section.                                                                “Performance Insights to Intel® Hyper-Threading Technology.”2
                                                                             To help ensure optimal benefit from Intel HT Technology, the
     4.3.1 Optimization for Intel® HT Technology
                                                                             development team addressed the following considerations:
     Intel HT Technology helps improves performance through
     increased instruction level parallelism by enabling each                • Parallel scaling. Intel HT Technology effectively doubles
     processor core to accommodate two threads with                            the number of software threads required to take full
     independent instruction streams. While the two threads                    advantage of the execution hardware, making SAP
     necessarily share execution resources within the core,                    HANA’s ability to identify and spawn the correct number
     the presence of two threads can increase utilization of the               of threads even more important.
     available execution units. This effect typically increases the          • Thread balance. Obtaining the optimal performance
     number of instructions executed in a given amount of time                 benefit from Intel HT Technology requires the work to be
     within a core, as shown in Figure 8.                                      divided as evenly as possible among software threads.
                                                                               One approach taken with SAP HANA is to assign work in
                               Without       With                              very small chunks, as described above in the discussion
                              Intel® HT    Intel® HT                           of parallel aggregation.
                             Technology   Technology
                                                                             • Avoiding threading bottlenecks. The increased
                                                                               hardware parallelism associated with Intel HT Technology
                                                       Processor Execution
                                                                               carries with it an increased danger from threading
                                                       Units Working on
                                                       Thread 1                bottlenecks, so the team refined SAP HANA’s threading
                                                                               model to avoid issues such as false sharing and excessive
                                                       Processor Execution     locks and synchronization.
       Time (proc. cycles)




                                                       Units Working on
                                                       Thread 2
                                                                             4.3.2 Standards Compliance and Accuracy
                                                                                   for Decimal Floating-Point Math
                                                       Processor Execution
                                                       Units Sitting Idle    The IEEE Standard 754-2008 for Binary Floating-Point
                                                                             Arithmetic defines functions that overcome important
                                                                             limitations in previous versions, most recently Standard
                                                                             754-1985. Specifically, the standard describes multiple
                                                                             encoding methods for the numbers used in the calculations.
                                                                             Under the previous versions of the standard, certain
                                                                             applications for use in regulated industries such as finance
                                                                             were forced to use more resource-intensive encoding
                                                                             to meet accuracy requirements. The accuracy issues
                                                                             were related to complexities in representing decimals
     Figure 8. Intel® Hyper-Threading Technology (Intel® HT Technology)      precisely in binary format. For example, the following
     gives the processor access to two threads in the same time slice,       calculation calculated in single-precision yields the result
     reducing the level of idle hardware resources.
                                                                             6.9999997504, rather than 7.00:3
                                                                               (7.00 / 10000.0) * 10000.0
     This greater efficiency enables higher throughput (since
     more work gets completed per clock cycle) and higher                    The Intel® Decimal Floating-Point Math Library provides an
     performance per watt (since fewer idle execution units                  efficient implementation of binary integer decimal encoding
     consume power without contributing to performance).                     that complies with Standard 754-2008, which enhances
     In addition, when one thread has a cache miss, branch                   the suitability of SAP HANA for use in the full spectrum of
     mispredict, or any other pipeline stall, the other thread               industry settings.
     continues processing instructions at nearly the same
     rate as a single thread running on the core.



12
4.3.3 Benefits from Intel® Turbo Boost Technology
Intel® Turbo Boost Technology4 automatically allows processor cores to run faster than the base operating frequency
when the OS requests the highest processor performance state (P0), if the processor is operating below power, current,
and temperature specification limits. The extent of this capability is also dependent on the number of active cores in the
processor package, as illustrated in Figure 9.

                                                                                                                               10 Cores Operating Under                                                            Fewer than 10 Cores Operating Under
                                      Normal Operation                                                                       Inte® Turbo Boost Technology                                                             Inte Turbo Boost Technology
  Frequency

                CORE 0
                         CORE 1
                                  CORE 2
                                           CORE 3
                                                     CORE 4
                                                              CORE 5
                                                                       CORE 6
                                                                                CORE 7
                                                                                         CORE 8
                                                                                                  CORE 9


                                                                                                           CORE 0
                                                                                                                    CORE 1
                                                                                                                              CORE 2
                                                                                                                                        CORE 3
                                                                                                                                                 CORE 4
                                                                                                                                                          CORE 5
                                                                                                                                                                   CORE 6
                                                                                                                                                                            CORE 7
                                                                                                                                                                                      CORE 8
                                                                                                                                                                                               CORE 9


                                                                                                                                                                                                         CORE 0
                                                                                                                                                                                                                  CORE 1
                         All cores operate at rated frequency                                                                All cores operate at higher frequency                                      Fewer cores may operate at even higher frequency

Figure 9. Intel® Turbo Boost Technology allows individual cores to operate briefly above the thermal design point, under certain conditions.




                                                                                                                    Intel® Instruction Sets

              SSE (1999)                               SSE2 (2000)                                 SSE3 (2004)                         SSSE3 (2006)                                  SSE4.1 (2007)                    SSE4.2 (2008)       Intel® AES-NI
                                                                                                                                                                                                                                              (2010)
         70 Instructions:                           144 Instructions:                             13 Instructions:                     32 Instructions:                              47 Instructions:                  7 Instructions:
    • Single-precision                              • Double-precision                            • Complex                            • Decode                                 • Video                             • String              6 Instructions:
      vectors                                         vectors                                       arithmetic                           operations                               accelerators                        manipulation       • Encryption and
    • Streaming                                     • 128-bit vector                                                                                                            • Graphics                          • Cyclic               decryption
      operations                                      integer                                                                                                                   • Co-processor                        Redundancy
                                                                                                                                                                                  accelerators                        Check (CRC32)


Figure 10. Successive generations of instructions, including Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Encryption
Standard New Instructions (Intel® AES-NI) accelerate various functions on Intel® architecture.



4.3.4 Optimization of Bit Compression with                                                                                                                4.4 Performance Benefits from Optimization
      Intel® SSE-Based Vectorization                                                                                                                          and Hardware Platform Features
Intel periodically introduces sets of instructions that extend                                                                                            This section covers the performance increases enabled by
platform capabilities with certain types of operations, as                                                                                                the optimization work on SAP HANA for the Intel processor
illustrated in Figure 10. The value of these instructions is                                                                                              E7 family conducted jointly by SAP and Intel. It also
primarily to accelerate those operations and secondarily to                                                                                               describes the performance benefits that are available from
reduce the complexity of the implementation.                                                                                                              some key hardware-platform features.
Intel SSE instructions provide the ability to support vector
                                                                                                                                                          4.4.1 Performance Comparison between Processor
operations on 128-bit wide registers. Theoretically, these                                                                                                      Generations and Impact of Hardware Features
operations should result in a 4x speed up on 32-bit
                                                                                                                                                          To demonstrate the performance benefits of implementing
data types and a 2x speed up on 64-bit data types.
                                                                                                                                                          SAP HANA on the latest Intel Xeon processors, a test lab
SAP HANA uses this advantage for decompression
                                                                                                                                                          at Intel compared results against a reference workload
and search on its base data structures. In cooperation
                                                                                                                                                          between the Intel® Xeon® processor 7500 series and the
with Intel, SAP engineers have developed a highly
                                                                                                                                                          Intel Xeon processor E7 family, which is summarized
optimized set of algorithms. By default, SAP HANA
                                                                                                                                                          in Figure 11.5 The workload consisted of a single-user
checks on startup to determine which sets of Intel SSE
                                                                                                                                                          scenario that was provided by an SAP customer and used
instructions are supported by the processor and chooses
                                                                                                                                                          as the internal reference scenario for SAP. It processed 3.2
the implementation that is best suited to maximize
                                                                                                                                                          billion records of point-of-sale data in a complex fashion.
decompression and search performance.


                                                                                                                                                                                                                                                            13
SAP HANA™ 1.0 Performance Comparison:
                                                          Intel® Xeon® Processor E7 Family versus Intel® Xeon® Processor 7500 Series
                                                                          and Hardware Feature Performance Impacts


                                                                              97%                                                97%               98%
                                                           3,500,000                           95%
                                                                                                                                                            100%
                                                                                                               92%
                                                                                                                                                            90%
                                                           3,000,000        288,873
                                                                                                                                                            80%
                                                                                             255,658
                                      Running time [ms]    2,500,000                                                                                        70%




                                                                                                                                                                   CPU Utilization
                                                                                                              225,587
                                                                                                                                218,114
                                                                                                                                                  210,471
                                                                                                                                                            60%
                                                           2,000,000                                          1.37x
                                                                                                                                                            50%
                                                           1,500,000
                                                                                                                        1.21x                               40%

                                                           1,000,000                                                                                        30%
                                                                                                                                1.07x
                                                                                                                                                            20%
                                                             500,000
                                                                                                                                          1.04x             10%

                 Running time [ms]                                 0                                                                                        0%
                 CPU Utilization                                            Intel Xeon
                                                                            Processor                       Intel Xeon Processor E7 Family
                                                                           7500 Series

           Intel® Hyper-Treading Technology                                   ON               OFF              ON                ON                ON

           Non-Uniform Memory Access (NUMA)                                   ON               ON               OFF               ON                ON

           Intel® Turbo Boost Technology                                      ON               ON               ON               OFF                ON

     Figure 11. Testing demonstrated a 1.37x performance advantage by the Intel® Xeon® processor E7 family over the Intel® Xeon® processor
     7500 series and benefits from key platform features.5



     The testing summarized in Figure 11 consisted of two major                                      As reported in Figure 11, the impact of the processor
     aspects: testing the performance differential between                                           generation alone was that the Intel Xeon processor E7
     the two processor generations and testing the distinct                                          family delivered a 1.37x performance increase, relative to
     performance impacts of individual hardware-platform                                             the Intel Xeon processor 7500 series.5 On the system based
     features (that is, Intel HT Technology, NUMA, and Intel Turbo                                   on the Intel Xeon processor E7 family, Intel HT Technology
     Boost Technology. The testing approach consisted of two                                         produced a 1.21x performance increase,5 NUMA technology
     primary parts:                                                                                  produced a 1.07x performance increase,5 and Intel Turbo
     • Performance impact of processor generation. A direct                                          Boost Technology produced a 1.04x performance increase.5
       performance comparison was made between the two                                               Notably, these performance increases were measured
       processor generations under test, with all three of the                                       under relatively high levels of processor utilization,
       platform features enabled. This result intended to show                                       demonstrating their value in adding to system-level
       the effect of the processor generation in isolation from                                      performance headroom.
       other aspects of the platform.
                                                                                                     4.4.2 Performance Benefits of Intel® SSE
     • Performance impact of platform features. Each of the                                                Vectorization for Search
       three platform features was disabled on the Intel Xeon                                        SAP HANA ensures flexibility and makes the most of
       processor E7 family-based system, one at a time with                                          installed system memory by compressing all base data
       the other two features enabled, and each of those results                                     structures using a packed bitfields approach for each of the
       was compared to the case in which all three features were                                     bit cases 1 to 32 bits. (For a discussion of compression in
       enabled. This result was used to show the isolated effect                                     the SAP In-Memory Computing Database, see the Creating
       of the platform feature.                                                                      a Transition in Database Design section.) Using the packed
                                                                                                     bitfields approach enables the system to store integer
                                                                                                     values more efficiently than as standard STL vectors.


14
Instead of using 32 bits per integer, SAP HANA uses only as many bits as are needed to represent the highest distinct value
in a vector as a bit string. A separate implementation is required for each bit case, and over the average for all bit cases,
using Intel SSE with TREX accelerates search on the compressed data by a factor of 1.6x relative to scalar code on the
Intel® Xeon® processor 5500 series, as shown in Figure 12.6



                                             2.50
 Speed-up of Intel® SSE versus Scalar Code




                                             2.00



                                             1.50



                                             1.00



                                             0.50



                                             0.00
                                                    1   2   3   4   5   6   7   8   9   10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

                                                                                                     Bit Case

Figure 12. On average over all bit cases, Intel Streaming SIMD Extensions (Intel® SSE) accelerates search on compressed data using
SAP In-Memory Computing Database technology by a factor of 1.6x on the Intel® Xeon® processor 5500 series.6



                                                                                                                                                    SAP Business Warehouse Accelerator
                                                                                                                                                          Core-Frequency Scaling


4.4.3 Performance Scaling with Increasing Core Count                                                                                    0.68
                                                                                                                                                                           Higher is better
To check performance scalability relative to increases in the
                                                                                                                                        0.67
number of processor cores, the team began by calculating
the “ideal” case, in which the number of queries per                                                                                    0.66
                                                                                                        Queries per Second (Compiled)




second would increase linearly on a one-to-one basis with
processor frequency (represented by the upper, dashed                                                                                   0.65
line in Figure 13). Next, by adjusting the core frequency
using BIOS settings, they measured the actual increase                                                                                  0.64
                                                                                                                                                                                         Ideal
(represented by the lower, solid line).                                                                                                                                                  Measured
                                                                                                                                        0.63
The results show that performance increases relative to
increases in processor frequency at an efficiency of 91 to                                                                              0.62
94 percent.7 While these results were obtained using SAP
Business Warehouse Accelerator, a predecessor technology                                                                                0.61
to SAP HANA, note that both software environments are
based on similar in-memory computing technology.                                                                                         0.6
                                                                                                                                            2.65   2.70     2.75    2.80       2.85      2.90   2.95

                                                                                                                                                             Core Frequency (GHz)
                                                                                                                                                          Changed through Frequency
                                                                                                                                                               Multiplier in BIOS


                                                                                                        Figure 13. SAP Business Warehouse Accelerator workloads scale
                                                                                                        almost at the ideal rate as processor core frequency increases.7




                                                                                                                                                                                                       15
5 Enhancing Reliability for In-Memory
       Computing Systems

     5.1 In-Memory Recovery
     SAP HANA includes software capabilities that take                  • Isolation is the requirement that all data associated with
     advantage of the hardware features of Intel Xeon processors         a transaction be inaccessible by any other operation for
     to dramatically reduce the incidence of system failures.            the duration of the transaction. This property is necessary
     In addition, the SAP HANA appliance can fail over from              to prevent multiple writes to a single data location during
     one compute node to another in the event of a processor             transactions, which could lead to data inconsistency
     failure and to handle memory errors with as little impact to        and unpredictable results. Because Isolation requires
     workloads as possible.                                              data to be locked while it is in use for a transaction, SAP
     A growing scope of RAS capabilities have been introduced            HANA uses sophisticated algorithms to minimize the
     in the Intel Xeon processor, reaching a suitability for             performance impact of that locking.
     business-critical implementations that was previously              • Durability means that once a transaction has been
     associated with only more expensive RISC and other                  committed to the database it can always be recovered in
     proprietary solutions. For example, Machine Check
                                                                         the event of a subsequent hardware or software failure.
     Architecture Recovery enables SAP HANA to work with the
                                                                         This property prevents completed transactions from
     OS to recover from many otherwise-uncorrectable memory
                                                                         having to be reversed. Because in-memory databases
     errors, recovering data without downtime in many cases.
                                                                         use volatile memory as their main data store, SAP HANA
     5.2 ACID Properties                                                 relies on the use of flash memory to provide Durability
     SAP HANA has been designed and refined to provide                   by retaining a combination of snapshots and logs that
     excellent compliance with the so-called ACID set of                 together assure the recoverability of committed data in
     properties designed to ensure reliable processing by                the event of a system failure.
     a database system. In general, those properties are
     as follows:                                                        6 Application Development and Tools
                                                                          for In-Memory Computing
     • Atomicity is that property of a database transaction
                                                                        SAP HANA provides a powerful application-development
       that guarantees that if the entire transaction cannot be
       completed, no change is made to the database, meaning            environment specifically tailored to the needs of
       that “partial” transactions will not take place. Atomicity is    organizations in building business applications. It also
       intimately related to the reliability features of the platform   provides SQL Script, a language for processing application-
       described above, in the sense that those characteristics         specific code in the database layer, which has the following
       help prevent the system from failing in mid-transaction          two advantages:
       or help it recover if a failure does occur. In addition to       • Transfer efficiency. Moving calculations to the database
      those capabilities, SAP HANA includes programmatic                 layer eliminates the need to transfer large amounts of data
      safeguards to ensure Atomicity.                                    from the database to the application.
     • Consistency refers to the assurance that all transactions        • Feature utilization. Calculations need to be executed
      performed by the database will be taken from one                   in the database layer to get the maximum benefit
      consistent state to another, which includes the                    from features such as fast column operations, query
      requirement that only valid data will be written to the            optimization, and parallel execution.
      database. In the event that invalid data is written to the
      database in error, the Consistency property requires that
      the database can be rolled back to a last known good
      state. This capability is provided for by the presence of
      temporal tables described previously in this paper, and
      it will be enhanced by the future implementation of aged
      data features also discussed.




16
Hana Intel SAP Whitepaper
Hana Intel SAP Whitepaper
Hana Intel SAP Whitepaper
Hana Intel SAP Whitepaper

More Related Content

What's hot

Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...
Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...
Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...Jaleel Ahmed Gulammohiddin
 
Cloud advantage for sap hana
Cloud advantage for sap hanaCloud advantage for sap hana
Cloud advantage for sap hanapatmataz
 
Getting Started with the IBM PowerLinux Solution Edition for SAP Applications
Getting Started with the IBM PowerLinux Solution Edition for SAP ApplicationsGetting Started with the IBM PowerLinux Solution Edition for SAP Applications
Getting Started with the IBM PowerLinux Solution Edition for SAP ApplicationsIBM India Smarter Computing
 
Bi sample sap learn book
Bi sample sap learn bookBi sample sap learn book
Bi sample sap learn bookcj12chi
 
Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...
Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...
Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...James Price
 
Intel® Xeon® Processor 5500 Series
Intel® Xeon® Processor 5500 SeriesIntel® Xeon® Processor 5500 Series
Intel® Xeon® Processor 5500 SeriesJames Price
 
The Next Generation of Intel: The Dawn of Nehalem
The Next Generation of Intel: The Dawn of NehalemThe Next Generation of Intel: The Dawn of Nehalem
The Next Generation of Intel: The Dawn of NehalemJames Price
 
Performance tuning in sap bi 7.0
Performance tuning in sap bi 7.0Performance tuning in sap bi 7.0
Performance tuning in sap bi 7.0gireesho
 
Rise with sap s 4 hana cloud, private edition service description guide
Rise with sap s 4 hana cloud, private edition service description guideRise with sap s 4 hana cloud, private edition service description guide
Rise with sap s 4 hana cloud, private edition service description guideDharma Atluri
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesFinalyear Projects
 
C8 Whats New In Versions 3 And 4
C8   Whats New In Versions 3 And 4C8   Whats New In Versions 3 And 4
C8 Whats New In Versions 3 And 4dfwcug
 
Database Comparison & Synch | Change Manager Success Story
Database Comparison & Synch | Change Manager Success StoryDatabase Comparison & Synch | Change Manager Success Story
Database Comparison & Synch | Change Manager Success StoryEmbarcadero Technologies
 
Disaster Recovery, Local Operational Recovery, and High Availability
Disaster Recovery, Local Operational Recovery, and High AvailabilityDisaster Recovery, Local Operational Recovery, and High Availability
Disaster Recovery, Local Operational Recovery, and High Availabilityxmeteorite
 
Printing Press Benefits From Windows 7 Professional upgrade - Case Study
Printing Press Benefits From Windows 7 Professional upgrade - Case StudyPrinting Press Benefits From Windows 7 Professional upgrade - Case Study
Printing Press Benefits From Windows 7 Professional upgrade - Case StudyWindows 7 Professional
 
Sap hana master_guide_en
Sap hana master_guide_enSap hana master_guide_en
Sap hana master_guide_enFarrukh Yusupov
 

What's hot (18)

Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...
Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...
Suse linux enterprise_server_12_x_for_sap_applications_configuration_guide_fo...
 
Cloud advantage for sap hana
Cloud advantage for sap hanaCloud advantage for sap hana
Cloud advantage for sap hana
 
Getting Started with the IBM PowerLinux Solution Edition for SAP Applications
Getting Started with the IBM PowerLinux Solution Edition for SAP ApplicationsGetting Started with the IBM PowerLinux Solution Edition for SAP Applications
Getting Started with the IBM PowerLinux Solution Edition for SAP Applications
 
Bi sample sap learn book
Bi sample sap learn bookBi sample sap learn book
Bi sample sap learn book
 
Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...
Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...
Intel® Virtualization Technology & Parallels Bring Native Graphics Innovation...
 
Intel® Xeon® Processor 5500 Series
Intel® Xeon® Processor 5500 SeriesIntel® Xeon® Processor 5500 Series
Intel® Xeon® Processor 5500 Series
 
The Next Generation of Intel: The Dawn of Nehalem
The Next Generation of Intel: The Dawn of NehalemThe Next Generation of Intel: The Dawn of Nehalem
The Next Generation of Intel: The Dawn of Nehalem
 
Performance tuning in sap bi 7.0
Performance tuning in sap bi 7.0Performance tuning in sap bi 7.0
Performance tuning in sap bi 7.0
 
Rise with sap s 4 hana cloud, private edition service description guide
Rise with sap s 4 hana cloud, private edition service description guideRise with sap s 4 hana cloud, private edition service description guide
Rise with sap s 4 hana cloud, private edition service description guide
 
Conv op2020
Conv op2020Conv op2020
Conv op2020
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
C8 Whats New In Versions 3 And 4
C8   Whats New In Versions 3 And 4C8   Whats New In Versions 3 And 4
C8 Whats New In Versions 3 And 4
 
Determine your sizing requirements
Determine your sizing requirementsDetermine your sizing requirements
Determine your sizing requirements
 
Database Comparison & Synch | Change Manager Success Story
Database Comparison & Synch | Change Manager Success StoryDatabase Comparison & Synch | Change Manager Success Story
Database Comparison & Synch | Change Manager Success Story
 
Disaster Recovery, Local Operational Recovery, and High Availability
Disaster Recovery, Local Operational Recovery, and High AvailabilityDisaster Recovery, Local Operational Recovery, and High Availability
Disaster Recovery, Local Operational Recovery, and High Availability
 
Printing Press Benefits From Windows 7 Professional upgrade - Case Study
Printing Press Benefits From Windows 7 Professional upgrade - Case StudyPrinting Press Benefits From Windows 7 Professional upgrade - Case Study
Printing Press Benefits From Windows 7 Professional upgrade - Case Study
 
Sap hana master_guide_en
Sap hana master_guide_enSap hana master_guide_en
Sap hana master_guide_en
 
Sap hana master_guide_en
Sap hana master_guide_enSap hana master_guide_en
Sap hana master_guide_en
 

Similar to Hana Intel SAP Whitepaper

Transforming Business with Intel and SAP HANA 2
Transforming Business with Intel and SAP HANA 2 Transforming Business with Intel and SAP HANA 2
Transforming Business with Intel and SAP HANA 2 PT Datacomm Diangraha
 
Fujitsu World Tour 2017 - Digital Business with SAP & Fujitsu
Fujitsu World Tour 2017 - Digital Business with SAP & FujitsuFujitsu World Tour 2017 - Digital Business with SAP & Fujitsu
Fujitsu World Tour 2017 - Digital Business with SAP & FujitsuFujitsu India
 
Bringing The Memory Revolution to Your Intelligent Enterpise
Bringing The Memory Revolution to Your Intelligent EnterpiseBringing The Memory Revolution to Your Intelligent Enterpise
Bringing The Memory Revolution to Your Intelligent EnterpisePT Datacomm Diangraha
 
Hana Training Day 1
Hana Training Day 1Hana Training Day 1
Hana Training Day 1mishra4927
 
HANA overview
HANA overviewHANA overview
HANA overviewjenkin
 
Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...
Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...
Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...Krystel Hery
 
SAP Strategie und Innovation
SAP Strategie und InnovationSAP Strategie und Innovation
SAP Strategie und InnovationIBM Switzerland
 
SAP HANA – A Technical Snapshot
SAP HANA – A Technical SnapshotSAP HANA – A Technical Snapshot
SAP HANA – A Technical SnapshotDebajit Banerjee
 
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application ShowcaseIntel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application ShowcaseIntel IT Center
 
INTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSINTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSTyrone Systems
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Intel® Software
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANASAP Technology
 
SAP HANA on IBM Power Systems
SAP HANA on IBM Power SystemsSAP HANA on IBM Power Systems
SAP HANA on IBM Power SystemsthinkASG
 
Hana To Go Presentation Final With Demo Screen Shots Nov8
Hana To Go Presentation Final With Demo Screen Shots Nov8Hana To Go Presentation Final With Demo Screen Shots Nov8
Hana To Go Presentation Final With Demo Screen Shots Nov8Doug Berry
 

Similar to Hana Intel SAP Whitepaper (20)

Transforming Business with Intel and SAP HANA 2
Transforming Business with Intel and SAP HANA 2 Transforming Business with Intel and SAP HANA 2
Transforming Business with Intel and SAP HANA 2
 
Fujitsu World Tour 2017 - Digital Business with SAP & Fujitsu
Fujitsu World Tour 2017 - Digital Business with SAP & FujitsuFujitsu World Tour 2017 - Digital Business with SAP & Fujitsu
Fujitsu World Tour 2017 - Digital Business with SAP & Fujitsu
 
Bringing The Memory Revolution to Your Intelligent Enterpise
Bringing The Memory Revolution to Your Intelligent EnterpiseBringing The Memory Revolution to Your Intelligent Enterpise
Bringing The Memory Revolution to Your Intelligent Enterpise
 
Hana Training Day 1
Hana Training Day 1Hana Training Day 1
Hana Training Day 1
 
HANA overview
HANA overviewHANA overview
HANA overview
 
Realtech case_study
Realtech case_studyRealtech case_study
Realtech case_study
 
SAP HANA - Understanding the Basics
SAP HANA - Understanding the Basics SAP HANA - Understanding the Basics
SAP HANA - Understanding the Basics
 
Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...
Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...
Technical white paper--Optimizing Quality of Service with SAP HANAon Power Ra...
 
Pow03190 usen
Pow03190 usenPow03190 usen
Pow03190 usen
 
SAP Strategie und Innovation
SAP Strategie und InnovationSAP Strategie und Innovation
SAP Strategie und Innovation
 
SAP HANA – A Technical Snapshot
SAP HANA – A Technical SnapshotSAP HANA – A Technical Snapshot
SAP HANA – A Technical Snapshot
 
SAP HANA Overview
SAP HANA OverviewSAP HANA Overview
SAP HANA Overview
 
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application ShowcaseIntel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
 
INTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSINTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
 
Hana Offerings Engl
Hana Offerings EnglHana Offerings Engl
Hana Offerings Engl
 
HANA SITSP 2011
HANA SITSP 2011HANA SITSP 2011
HANA SITSP 2011
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
SAP HANA on IBM Power Systems
SAP HANA on IBM Power SystemsSAP HANA on IBM Power Systems
SAP HANA on IBM Power Systems
 
Hana To Go Presentation Final With Demo Screen Shots Nov8
Hana To Go Presentation Final With Demo Screen Shots Nov8Hana To Go Presentation Final With Demo Screen Shots Nov8
Hana To Go Presentation Final With Demo Screen Shots Nov8
 

More from ugur candan

SAP AI What are examples Oct2022
SAP AI  What are examples Oct2022SAP AI  What are examples Oct2022
SAP AI What are examples Oct2022ugur candan
 
CEO Agenda 2019 by Ugur Candan
CEO Agenda 2019 by Ugur CandanCEO Agenda 2019 by Ugur Candan
CEO Agenda 2019 by Ugur Candanugur candan
 
Digital transformation and SAP
Digital transformation and SAPDigital transformation and SAP
Digital transformation and SAPugur candan
 
Digital Enterprise Transformsation and SAP
Digital Enterprise Transformsation and SAPDigital Enterprise Transformsation and SAP
Digital Enterprise Transformsation and SAPugur candan
 
MOONSHOTS for in-memory computing
MOONSHOTS for in-memory computingMOONSHOTS for in-memory computing
MOONSHOTS for in-memory computingugur candan
 
WHY SAP Real Time Data Platform - RTDP
WHY SAP Real Time Data Platform - RTDPWHY SAP Real Time Data Platform - RTDP
WHY SAP Real Time Data Platform - RTDPugur candan
 
Opening Analytics Networking Event
Opening Analytics Networking EventOpening Analytics Networking Event
Opening Analytics Networking Eventugur candan
 
Sap innovation forum istanbul 2012
Sap innovation forum istanbul 2012Sap innovation forum istanbul 2012
Sap innovation forum istanbul 2012ugur candan
 
İş Zekasının Değişen Kuralları
İş Zekasının Değişen Kurallarıİş Zekasının Değişen Kuralları
İş Zekasının Değişen Kurallarıugur candan
 
Gamification of eEducation
Gamification of eEducationGamification of eEducation
Gamification of eEducationugur candan
 
The End of an Architectural Era Michael Stonebraker
The End of an Architectural Era Michael StonebrakerThe End of an Architectural Era Michael Stonebraker
The End of an Architectural Era Michael Stonebrakerugur candan
 
The Berkeley View on the Parallel Computing Landscape
The Berkeley View on the Parallel Computing LandscapeThe Berkeley View on the Parallel Computing Landscape
The Berkeley View on the Parallel Computing Landscapeugur candan
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wallugur candan
 
Exadata is still oracle
Exadata is still oracleExadata is still oracle
Exadata is still oracleugur candan
 
Gerçek Gerçek Zamanlı Mimari
Gerçek Gerçek Zamanlı MimariGerçek Gerçek Zamanlı Mimari
Gerçek Gerçek Zamanlı Mimariugur candan
 
Michael stonebraker mit session
Michael stonebraker mit sessionMichael stonebraker mit session
Michael stonebraker mit sessionugur candan
 
Introduction to HANA in-memory from SAP
Introduction to HANA in-memory from SAPIntroduction to HANA in-memory from SAP
Introduction to HANA in-memory from SAPugur candan
 
Complex Event Prosessing
Complex Event ProsessingComplex Event Prosessing
Complex Event Prosessingugur candan
 

More from ugur candan (20)

SAP AI What are examples Oct2022
SAP AI  What are examples Oct2022SAP AI  What are examples Oct2022
SAP AI What are examples Oct2022
 
CEO Agenda 2019 by Ugur Candan
CEO Agenda 2019 by Ugur CandanCEO Agenda 2019 by Ugur Candan
CEO Agenda 2019 by Ugur Candan
 
Digital transformation and SAP
Digital transformation and SAPDigital transformation and SAP
Digital transformation and SAP
 
Digital Enterprise Transformsation and SAP
Digital Enterprise Transformsation and SAPDigital Enterprise Transformsation and SAP
Digital Enterprise Transformsation and SAP
 
MOONSHOTS for in-memory computing
MOONSHOTS for in-memory computingMOONSHOTS for in-memory computing
MOONSHOTS for in-memory computing
 
WHY SAP Real Time Data Platform - RTDP
WHY SAP Real Time Data Platform - RTDPWHY SAP Real Time Data Platform - RTDP
WHY SAP Real Time Data Platform - RTDP
 
Opening Analytics Networking Event
Opening Analytics Networking EventOpening Analytics Networking Event
Opening Analytics Networking Event
 
Sap innovation forum istanbul 2012
Sap innovation forum istanbul 2012Sap innovation forum istanbul 2012
Sap innovation forum istanbul 2012
 
İş Zekasının Değişen Kuralları
İş Zekasının Değişen Kurallarıİş Zekasının Değişen Kuralları
İş Zekasının Değişen Kuralları
 
Gamification of eEducation
Gamification of eEducationGamification of eEducation
Gamification of eEducation
 
Why sap hana
Why sap hanaWhy sap hana
Why sap hana
 
The End of an Architectural Era Michael Stonebraker
The End of an Architectural Era Michael StonebrakerThe End of an Architectural Era Michael Stonebraker
The End of an Architectural Era Michael Stonebraker
 
Ramcloud
RamcloudRamcloud
Ramcloud
 
The Berkeley View on the Parallel Computing Landscape
The Berkeley View on the Parallel Computing LandscapeThe Berkeley View on the Parallel Computing Landscape
The Berkeley View on the Parallel Computing Landscape
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
 
Exadata is still oracle
Exadata is still oracleExadata is still oracle
Exadata is still oracle
 
Gerçek Gerçek Zamanlı Mimari
Gerçek Gerçek Zamanlı MimariGerçek Gerçek Zamanlı Mimari
Gerçek Gerçek Zamanlı Mimari
 
Michael stonebraker mit session
Michael stonebraker mit sessionMichael stonebraker mit session
Michael stonebraker mit session
 
Introduction to HANA in-memory from SAP
Introduction to HANA in-memory from SAPIntroduction to HANA in-memory from SAP
Introduction to HANA in-memory from SAP
 
Complex Event Prosessing
Complex Event ProsessingComplex Event Prosessing
Complex Event Prosessing
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Hana Intel SAP Whitepaper

  • 1. Analyzing Business as It Happens SAP In-Memory Appliance Software (SAP HANA™) runs on the Intel® Xeon® processor to generate superior, real-time business intelligence. WHITE PAPER Intel® Xeon® Processor E7 Family SAP In-Memory Appliance Software (SAP HANA™) April 2011 | Version 1.0 1 Introduction Increasingly sophisticated business decision models SAP In-Memory Appliance Software (SAP HANA™) delivers depend on extremely fast access to and manipulation of next-generation in-memory computing through an ongoing massive data stores. Insight into business operations often engineering collaboration between SAP and Intel to provide demands data volumes that are beyond the capabilities of optimized performance and reliability on Intel® architecture, traditional disk-based systems to process them in real time, including the Intel® Xeon® processor E7 family. which can limit access to benefits such as the following: This paper introduces SAP In-Memory Computing • Efficiency provided by the ability to respond in technology concepts, explains how SAP HANA uses them real time to the changing needs of the business to help deliver real-time business insight, and describes how synergies with Intel architecture-based server • Flexibility based on insight that accurately directs platforms drive solution value. The discussion begins with quick action an examination of the hardware and software innovations • Empowerment of business users to make and that set the stage for SAP In-Memory Computing, including act on smart decisions the ways in which the architectures of Intel Xeon processors This reality often causes a separation between ongoing and SAP HANA complement one another. business needs and the analytic applications that support Next, the paper describes specific optimizations made as them. In such cases, the lag time between data gathering part of the ongoing, collaborative engineering relationship and interpretation delays the ability to benefit from business between Intel and SAP. In the context of general database information, thereby limiting its value. concepts, the paper illustrates how SAP HANA takes advantage of Intel architecture to deliver high performance and reliability, as well as excellent support for application developers working with SAP HANA. Next, the paper describes real-world customer benefits derived from SAP SAP In-Memory Appliance Software (SAP HANA implementations. Finally, the paper explores how HANA™) uses SAP In-Memory Computing future technology may drive even further the value of joint technology and the performance and solutions based on Intel Xeon processors and SAP HANA. reliability of the Intel Xeon processor E7 ® ® family to dramatically improve the speed of data analysis, enabling business intelligence to be generated in real time. That capability increases the value of business data with more sophisticated decision support than has been possible with solutions based on traditional disk-based systems.
  • 2. 1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Intel and SAP Technology Advances Drive 2 Intel and SAP Technology Advances Database Evolution Drive Database Evolution . . . . . . . . . . . . . . . . . . . . 2 The evolution of Intel architecture toward greater performance 2.1 Intel Hardware Technology Innovations . . . . . . . . . . . 2 at lower cost sets the stage for the advances of SAP HANA. 2.1.1 The Rise of Mainstream Server Parallelism . . . . . . . . 2 The SAP In-Memory Computing Database that lies at the heart 2.1.2 The Drive to 64-Bit Processing . . . . . . . . . . . . . . 3 of this technology is engineered to take advantage of each 2.1.3 Other Architecture Features that Benefit SAP HANA . . . 4 aspect of a high-performance, balanced compute platform: 2.2 SAP Software Technology Innovations . . . . . . . . . . . 5 2.2.1 Establishing a Solid Database • Memory. Increasingly large amounts of addressable for In-Memory Computing . . . . . . . . . . . . . . . . . 5 memory, running at ever-higher speeds, combine with 2.2.2 Delivering the SAP In-Memory Computing growing caches and other improvements to create Database into the Mainstream . . . . . . . . . . . . . . . 5 2.2.3 SAP HANA: The Next Generation memory subsystems robust enough to easily support of Real-Time Business Analysis . . . . . . . . . . . . . . 5 the demands of SAP In-Memory Computing. 3 Creating a Transition in Database Design . . . . . . . . . . . 6 • Processing. Constant architectural improvements to 3.1 Performance Implications of Column versus Row Storage . . . . . . . . . . . . . . . . . . . . . 7 complete more work per clock cycle extend the value 3.2 Dictionary Compression with Column Storage . . . . . . . 8 of growing parallelism from multi-core designs, Intel® 3.3 Efficiencies from Separation of Main and Delta Storage. . 8 Hyper-Threading Technology (Intel® HT Technology), 3.4 Storage of and Queries against Historical Data . . . . . . 9 and increasingly scalable symmetric multi-processing. 3.5 Boosting Performance by Moving • I/O. Very fast on-board data transfer using Intel® Results Instead of Source Data . . . . . . . . . . . . . . . 9 QuickPath Interconnect is complemented by advances 4 Collaboration by SAP and Intel for Performance Optimization and Parallelization . . . . . . . . . . . . . . . . 9 in inter-node communication, to enable the low latency 4.1 Benefits from Multi-Socket Topology and NUMA . . . . . . 9 in data transfer within and between nodes in the SAP 4.2 Optimization for Multi-Core Processing. . . . . . . . . . 10 HANA appliance. 4.2.1 Parallel Aggregation . . . . . . . . . . . . . . . . . . . 10 4.2.2 Processor Cache Optimizations . . . . . . . . . . . . . 11 2.1 Intel Hardware Technology Innovations 4.3 Optimization for Core-Level Features . . . . . . . . . . . 12 Intel has continually led the industry in driving advances 4.3.1 Optimization for Intel HT Technology . . . . . . . . . . 12 ® in commercial off-the-shelf server platforms that increase 4.3.2 Standards Compliance and Accuracy performance, scalability, and value. These innovations for Decimal Floating-Point Math . . . . . . . . . . . . . 12 directly enable SAP HANA to transcend barriers to what 4.3.3 Benefits from Intel® Turbo Boost Technology . . . . . . 13 businesses are able to do with their data, generating better 4.3.4 Optimization of Bit Compression with Intel® SSE-Based Vectorization . . . . . . . . . . . . . 13 business decisions in response to real-time events. 4.4 Performance Benefits from Optimization This section traces the history of Intel processor innovation, and Hardware Platform Features . . . . . . . . . . . . . 13 setting the stage for an understanding of how SAP HANA 4.4.1 Performance Comparison between Processor Generations and Impact of Hardware Features . . . . . 13 appliances based on Intel architecture produce synergies 4.4.2 Performance Benefits of Intel® SSE between the hardware and software that benefit business Vectorization for Search . . . . . . . . . . . . . . . . . 14 customers. 4.4.3 Performance Scaling with Increasing Core Count . . . . 15 5 Enhancing Reliability for In-Memory Computing Systems . . . . . . . . . . . . . . . . . . . . . . 16 5.1 In-Memory Recovery . . . . . . . . . . . . . . . . . . . . 16 “Intel and SAP engineering teams have 5.2 ACID Properties . . . . . . . . . . . . . . . . . . . . . . 16 collaborated extensively on SAP HANA™ 6 Application Development and Tools and SAP In-Memory Computing technology. for In-Memory Computing. . . . . . . . . . . . . . . . . . . 16 6.1 Intel® Performance Counter Monitor for Comparing the in-memory system performance Debugging and Optimization . . . . . . . . . . . . . . . . 17 with a classical relational database, the query 6.2 Intel® Software Development Products . . . . . . . . . . 17 time was reduced from 77 minutes to 13 seconds 7 Real Customer Example: Building Richer Sales Support at Hilti Corporation . . . . 17 when run on the Intel® Xeon® processor 7500 8 Outlook and Future Research Topics . . . . . . . . . . . . 18 series. We are thrilled with this result, and it 8.1 Evolution of Non-Volatile Memory . . . . . . . . . . . . . 18 showcases once again the powerful impact of 8.2 Hardware Acceleration . . . . . . . . . . . . . . . . . . . 18 joint problem solving between our two companies.” 8.3 New Instructions . . . . . . . . . . . . . . . . . . . . . . 18 8.4 Many Cores . . . . . . . . . . . . . . . . . . . . . . . . . 18 - K. Skaugen, Intel Vice President and 9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 General Manager of the Data Center Group 10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . 19 2
  • 3. 2.1.1 The Rise of Mainstream Server Parallelism One key solution has been to produce multi-core At one time, boosts in processor performance were created processors, so that each processor socket contains largely by ongoing increases in clock speed. The simple multiple, individual execution engines—a design concept fact of providing more cycles per second retired more taken from mainframe and other large-scale systems, instructions per unit of time, allowing software to become where it had been employed for some time. Even before more capable and deliver faster response times. As clock introducing its multi-core designs, Intel developed Intel HT speeds grew, however, processor designs faced limitations Technology, which allows a single processing core to handle due to growing energy consumption and heat generation. two strings of instructions (software threads) at a time, making better use of on-die resources that would otherwise be idle. Together with symmetric multi-processing (multiple processors and multiple servers working together), multi- Intel and SAP: A History of Co-Innovation core processing and Intel HT Technology have made parallel For more than 10 years, Intel and SAP have worked programming the de facto industry standard. Some of the together to deliver industry-leading performance techniques used by SAP HANA to take advantage of parallel of SAP solutions on Intel® architecture, and a large hardware resources are discussed later in this paper. proportion of new SAP implementations are now 2.1.2 The Drive to 64-Bit Processing deployed on Intel® platforms. The latest success from that tradition of co-innovation is available to Another limitation that enterprise computing systems have faced is that the 32-bit systems in general enterprise use customers of all sizes in SAP In-Memory Appliance until the mid-2000s were limited by their ability to address Software (SAP HANA™), which is delivered on the no more than approximately four gigabytes (GB) of system Intel® Xeon® processor. memory per server. Beginning in 2004 with the Intel Xeon The relationship between Intel and SAP has processor formerly code-named Nocona, Intel began become even stronger over the years, growing to shipping hardware capable of running 64-bit OSs, which include a broad set of collaborations and initiatives. can address far larger amounts of memory. The theoretical Some of the most visible include limits of memory capacity are as follows: the following: 32-bit systems: 232 = 4,294,967,296 bytes, or ~4 GB • Joint roadmap enablement. Early in the design 64-bit systems: 264 = 18,446,744,073,709,551,616 bytes, process, Intel and SAP decision makers identify or ~18 billion GB (18 Exabytes) complementary features and capabilities in their In reality, the upper limits of memory addressability for upcoming products, and those insights help to 64-bit systems are governed by the physical capabilities direct the development cycle for maximum value. of the platform and to some extent by the cost of the memory itself. As platforms become more capable, system • Collaborative product optimization. Intel memory capacity continues to grow, and at the same time, engineers located on-site at SAP work with their the cost of memory on a per-gigabyte basis continues to SAP counterparts to provide tuning expertise that drop. Server memory capacities are now often measured in enables SAP HANA and other software solutions terabytes (TB), where 1 TB = 1,000 GB. to take advantage of the latest hardware features. In addition to being able to address more memory than their • Combined research efforts. Together, 32-bit ancestors, platforms based on 64-bit architectures researchers from Intel and SAP continually make more and wider registers available to the compiler explore and drive the future of business (theoretically enabling the compiler to generate faster code). computing. Belfast Collaboratory, an excellent They also have the benefit of some larger data types, a example, opened in October 2009 to innovate crucial factor for SAP HANA, where gigabytes of memory around cloud computing and the next generation are handled within a single process. In fact, the port to of sustainable IT. 64-bit architecture was one of the first joint efforts by Intel and SAP related to SAP NetWeaver ® Business Warehouse As a result of these efforts, customer solutions Accelerator (BWA, formerly known as SAP NetWeaver achieve performance, scalability, reliability, and Business Intelligence Accelerator), a precursor in many energy efficiency that translate into favorable ways to SAP HANA. ROI and TCO, for increased business value. 3
  • 4. 2.1.3 Other Architecture Features that Benefit SAP HANA Successive generations of Intel® server platforms introduce Intel Xeon processor E7 family as an example, SAP HANA an enormous range of other architectural features that benefits from Double Device Data Correction (DDDC), which drive new platform capabilities. For example, the Intel Xeon allows recovery from two simultaneous memory errors, processor E7 family adds the following new features and supporting greater availability of SAP HANA systems. capabilities relative to its predecessor that enhance the Platform innovation also includes the introduction of new scalable performance of SAP HANA implementations: instructions such as the series of Intel® Streaming SIMD • Increased core count, from 8 cores (16 threads) to 10 Extensions instruction sets (Intel® SSE, Intel® SSE2, Intel® cores (20 threads) per socket, provides more computation SSE3, Intel® SSSE3, Intel® SSE4.1, and Intel® SSE4.2). These resources for SAP HANA’s data manipulation demands. instructions are designed to accelerate specific types of • Larger memory capacity, from 16 GB DIMMs (1 TB for operations, and their impact on SAP HANA is discussed in a four-way server) to 32 GB DIMMs (2 TB for a four-way further detail later in this paper. server), increases the ability of SAP HANA to hold large Finally, Intel continues to innovate around process- amounts of data for in-memory computation. technology shrink in its manufacturing operations. Moving • Larger cache, from 24 megabytes (MB) to 30 MB, to a smaller process technology allows the creation of increases the amount of data SAP HANA can hold for smaller transistors and other on-chip features, which allows fast access without reading from DRAM. more of those features to be placed in a given amount Likewise, evolving server platforms provide new features for of on-die area, for faster switching, lower per-transistor security, and reliability, availability, and serviceability (RAS) energy consumption, and lower per-transistor waste-heat that support business-critical SAP HANA deployments. production. Figure 1 shows some key advances in the Again using features and capabilities introduced with the development of the Intel Xeon processor E7 family. 2010: 2011: Intel® Xeon® Processor 7500 Series Intel® Xeon® Processor E7 Family C C C C C C C C C C C C C C C C C C Figure 1. The Intel® Xeon® processor E7 O O O O O O O O O O O O O O O O O O family adds features and capabilities that R R R R R R R R R R R R R R R R R R directly benefit SAP HANA deployments. E E E E E E E E E E E E E E E E E E 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 10 24 MB Shared Last Level Cache 30 MB Shared Last Level Cache Integrated Memory QPI Link Integrated Memory QPI Link Controller Controller Controller Controller Support for 16 GB Four Intel® Support for 32 GB Four Intel® DDR3 DIMMs QuickPath DDR3 DIMMs QuickPath (1 TB RAM for a Interconnects at (2 TB RAM for a Interconnects at four-socket system) up to 6.4 GT/s four-socket system) up to 6.4 GT/s 45nm process 32nm process technology technology • Intel® Turbo Boost Technology • Enhanced Intel® Turbo Boost Technology • Intel® Intelligent Power Technology • Intel® Intelligent Power Technology • Intel® Hyper-Threading Technology • Intel® Hyper-Threading Technology • Machine Check Architecture Recovery • Machine Check Architecture Recovery • Intel® Scalable Memory Interconnect Failover • Intel® Scalable Memory Interconnect Failover • Intel® QuickPath Interconnect Self-Healing • Intel® QuickPath Interconnect Self-Healing • Intel® Trusted Execution Technology • Intel® AES New Instructions • Double Device Data Correction (DDDC) • Partial Memory Mirroring 4
  • 5. 2.2 SAP Software Technology Innovations • A calculation and planning engine with a data Around 2000, SAP recognized the trends toward repository, providing persistent views of business rapidly-growing data sets owned by many companies information and a unified information modeling and increasing business needs for real-time analytics. design environment In response to those realities, SAP engineers began to • A data management service that includes SQL re-imagine how data warehouses could handle these and MDX interfaces, data integration capabilities for massive data stores to generate immediate, ad hoc analysis. accessing SAP and non-SAP data sources, and Disk-access times had become a primary bottleneck in integrated life cycle management. addressing those needs using conventional database queries. Research showed promise for removing that factor by conducting queries solely in memory, without having to SAP In-Memory Computing Database access the physical disk. SAP In-Memory Computing Technology Two key factors helped make in-memory computing viable Calculation and for mainstream implementations. As discussed above, by Row Column Planning Engine Store Store the middle of the first decade of the 21st century, system memory had begun to fall sharply in cost, and 64-bit servers had opened the door to commercial off-the-shelf hardware Data Management Service that could address very large amounts of system memory. Those advances set the stage for SAP’s implementation of Figure 2. SAP In-Memory Computing supports complex in-memory computing for general business use. calculations on massive amounts of data. When combined, these capabilities allow the SAP “For over 10 years, SAP and Intel have In-Memory Computing Database to help decision makers collaborated to create visionary solutions that explore and analyze vast amounts of information at enable customers to gain competitive advantage incredible response times and with a high degree of flexibility, without the need for IT involvement. and run better. The SAP In-Memory Appliance (SAP HANA™), delivered on the Intel® Xeon® 2.2.2 Delivering the SAP In-Memory Computing processor, provides customers with real- Database into the Mainstream time results from analyses and transactions SAP first made SAP In-Memory Computing available in that enable better decisions and improved product form with the introduction of TREX (Text Retrieval operations. It’s just the latest example of how and information EXtraction), a search engine component game-changing innovation between industry used as an integral part of various SAP software offerings. Building on the early success of products such as SAP leaders can alter how enterprises do business NetWeaver Enterprise Search, the company introduced and succeed.” SAP BWA in 2005. - Stefan Sigg, Senior Vice President, The company’s ongoing focus on business intelligence (BI) In-Memory Platform, SAP solutions led it to integrate SAP In-Memory Computing into SAP BusinessObjects to create an accelerated BI solution called the SAP BusinessObjects Explorer, Accelerated 2.2.1 Establishing a Solid Database Version. SAP In-Memory Computing has also been used for In-Memory Computing to enable real-time analysis in a variety of other products, In-memory computing allows the processing of massive including the following: quantities of real-time data in the main memory of the • SAP Advanced Planner and Optimizer with liveCache server, providing immediate results from analyses and transactions. The SAP In-Memory Computing Database • SAP CRM Customer Segmentation (Figure 2) delivers the following capabilities: • SAP Business ByDesign™ Analytics • In-memory computing functionality with native support for row and columnar data stores (discussed below), providing full ACID (atomicity, consistency, isolation, durability) transactional capabilities 5
  • 6. 2.2.3 SAP HANA: The Next Generation of Real-Time SAP HANA benefits dramatically from the design features Business Analysis and benefits of the latest Intel Xeon processors, helping it SAP In-Memory Computing has evolved to create SAP deliver excellent value to end-customers. The performance, HANA, a multi-purpose appliance that combines SAP scalability, reliability, and energy efficiency of the hardware software components installed on servers based on the platform are particularly beneficial in this regard. High memory Intel Xeon processor. The integrated software stack is highly capacity and innovations in the memory and cache subsystems, optimized for performance and reliability on the hardware, as described above, support excellent results from the demands and it includes the SAP In-Memory Computing Database, of in-memory computing. The sophisticated parallel real-time replication service, data modeling, and data software design, as described below, takes advantage services, as illustrated in Figure 3. of the platform’s large-scale hardware parallelism. SAP HANA is engineered specifically to be data-agnostic, Other Applications SAP Business Objects and the appliance is designed to be implemented without affecting existing applications or systems. The solution MDK SQL BICS provides a flexible, cost-effective, real-time approach for managing large data volumes, allowing organizations to SAP NetWeaver® SAP In-Memory Computing Database Business Warehouse dramatically reduce the hardware and maintenance costs In-Memory Calculation and associated with running multiple data warehouses and Third-Party Computing Planning Engine analytical systems. Data Sources In the future, SAP HANA will act as the technology foundation SAP Business Suite Data Management Service for new, innovative business applications for planning, forecasting, operational performance, and simulation. The Admin and Data Modeling first business application running on SAP HANA is the SAP BusinessObjects Strategic Workforce Planning, version Figure 3. SAP HANA Real-Time Replication Services provides data-agnostic for in-memory computing, a solution SAP developed that integration with other Data Services dramatically benefits from high-performance and real-time business systems. access to large volumes of transactional information. 3 Creating a Transition in Database Design With the growth of data stores and business application complexity, solution providers have steadily developed The Intel and SAP HANA™ Value Proposition novel approaches to optimizing performance. One key Across a Wide Variety of Business Units development has been the separation of online transactional SAP In-Memory Appliance Software (SAP HANA™) processing (OLTP) and online analytical processing (OLAP) systems because both these types of relational databases and the Intel® Xeon® processor enable a broad set have somewhat divergent design goals. In the context of BI of scenarios within various parts of the business, solutions, they can be considered as follows: above and beyond transactional excellence and • OLTP systems instantly record business events as they traditional analytics, such as the following: happen, such as the sale of a piece of inventory. As such, • Order management. Available to promise, system designs focus on quickly handling large numbers price optimization of small, simultaneous transactions. • Supply chain. Transportation planning, inventory • OLAP systems provide analysis of the data provided by management, demand and OLTP systems to support business decisions. They are supply planning designed to handle a relatively small number of often- • Manufacturing. Production planning, performance complex transactions. management, asset utilization In his paper, “A Common Database Approach for OLTP • Sales and marketing. Analytics, customer and OLAP Using an In-Memory Column Database,”1 SAP segmentation, trade promotion management co-founder Hasso Plattner asserts the value of unifying both OLTP and OLAP in a single system based on column • Corporate finance. Planning and storage and in-memory computation. While the concept of budgeting, consolidation combining the two systems is beyond the scope of this paper, • Human resources. Workforce analytics Plattner’s argument for the value of column storage and • Information technology. Landscape optimization in-memory computing to both OLTP and OLAP has direct bearing on the capabilities of SAP HANA in those areas. 6
  • 7. 3.1 Performance Implications of Column versus Columnar storage may be more efficient in the common Row Storage case where a given column includes only a relatively Whereas relational databases typically use row-based small number of distinct values. For example, if the table storage, column-based storage may be more suitable shown in Figure 4 includes a very large number of rows, in some cases. As shown in Figure 4, a database table one would expect that each entry in the Country column is conceptually a two-dimensional structure made up of would nevertheless consist of one of perhaps a few dozen cells arranged in rows and columns. Because computer possible entries. That factor allows for very efficient data memory is structured linearly, there are two options for compression, which produces a number of advantages. the sequences of cell values that are stored in contiguous For example, data compression allows for faster loading memory locations: from system memory into the processor cache. Since • In row storage, the data sequence consists that transport tends to be the main factor limiting system of the data fields in one table row. performance, speeding it up may be expected to produce • In column storage, the data sequence consists a performance advantage, even considering the added of the entries in one table column. overhead of decompressing the data. Another example of the advantages of columnar storage is related to Figure 4 illustrates both options. compression: Computing the sum of values in a column is much faster if many additions of the same value can Table be replaced by a single multiplication. Similarly, holding values from a column together in cache is crucial for bulk Country Product Sales processing using Intel SSE instructions. US Alpha 3.000 Columnar storage also helps reduce the number of US Beta 1.250 performance-limiting cache misses. Recognizing that JP Alpha 700 data is transferred from main memory to the processor in UK Alpha 450 blocks of fixed size called cache lines, consider the case of aggregating the sum of all of the values in the Sales column Row Store Column Store of a large version of the table in Figure 4. With row storage, only a minority of the data in any given cache line would US US be from the Sales column and therefore be relevant to the Row 1 Alpha US Country calculation. The large proportion of irrelevant data would 3,000 JP result in cache misses, where the processor would have US UK to discard the irrelevant data and wait for useful data in its Row 2 Beta Alpha place. Conversely, with columnar storage, every piece of 1.250 Beta Product data in each cache line would be relevant to the calculation, JP Alpha increasing efficiency by keeping the processor supplied Row 3 Alpha Alpha with useful data. 700 3.000 UK 1.250 Sales Row 4 Alpha 700 450 450 Figure 4. Unlike traditional databases, SAP HANA offers the choice between row-based and column-based data storage for each table. 7
  • 8. 3.2 Dictionary Compression with Column Storage 3.3 Efficiencies from Separation of Main Dictionary coding is a significant benefit when compressing and Delta Storage column-stored data, as represented in Figure 5. Each As changes are committed into the database, they are distinct column value is assigned a sequential integer stored in delta storage, which is distinct from main storage, (beginning with zero) as a Value ID. An alphanumerically and if new data contains column values that are not sorted dictionary stores all of the distinct column values, included in the main dictionary, they are stored in a delta and the Value IDs are implicitly given by the sequential dictionary. Because the delta dictionary is created without positions of each distinct column value. The actual column recoding the Value IDs in the main dictionary, there is no data is stored as a sequence of Value IDs, which are bit global order that encompasses the Value IDs in both the coded, meaning that they are stored using the minimum delta dictionary and the main dictionary. possible number of bits. Column ”Name“ Column ”Name“ (dictionary compressed) (uncompressed) Value-ID sequence Dictionary One element for each row in column Miller 4 0 Baker Jones 1 1 Jones Millman 5 Point into dictionary 2 John Sorted Zsuwalski N 3 Johnson Value IDs Baker 0 4 Miller Miller 4 5 Millman John 2 . Miller 4 .. Johnson 3 N Zsuwalski Jones 1 . .. ... Value ID implicitly given by sequence in Value which values are stored Figure 5. Dictionary coding can enable significant compression advantages in column-stored data sets. Dictionary compression of columnar data is most efficient The delta store supports “greater than” and “less than” when the number of distinct column values is relatively queries by adding a CSB+-Tree to the delta dictionary. This small, compared to the size of the column (that is, when tree holds the values of the delta store ordered as in the a relatively low number of Value IDs exists relative to the main store, only the amount of memory needed is higher number of column values). than in the main store case. To establish a single, sorted dictionary that includes all Value IDs when a delta dictionary Because the dictionary is sorted, search operations and has been established, the delta dictionary and main comparisons such as “greater than” and “less than” can dictionary must be merged using a delta merge operation. be performed against the value IDs to return results that To enable even higher compression of main storage, SAP are meaningful with regard to the actual column values. HANA selects and implements the optimal compression Because those operations are conducted on integer values, method for the dictionary itself during delta merge. they can be much faster than, for example, performing them against long strings. 8
  • 9. 3.4 Storage of and Queries against Historical Data 4 Collaboration by SAP and Intel for Performance As an optional feature, SAP HANA can use temporal tables Optimization and Parallelization to query against an historical state of the data. Write Co-innovation and collaboration between SAP and Intel is operations on temporal tables do not physically overwrite an important prerequisite for the success of the SAP HANA existing records. Instead, write operations always insert new software components on Intel architecture-based hardware. versions of the data record into the database. The different The optimization draws both from industry best practices versions of a data record have time stamps indicating their and from solution-specific engineering, drawing broadly validity, with current data being simply the most recent version. on the expertise of performance engineers from both companies. As mentioned previously in this paper, these Historical (non-current) data stored in a temporal table individuals have a long history of working together, and their is still considered to be operational data, meaning that work on SAP HANA represents the culmination of many it is expected to be used in current operations, such as years’ experience. transactions or reporting. By contrast, future versions of SAP HANA plan to support a category specifically for data This four-part section identifies some of the specific that is not expected to be used anymore, known as aged optimizations the team has made to the SAP HANA platform data, which no longer need to be stored in memory. Instead, and the resulting performance benefits on Intel Xeon aged data will be eligible to be written to disk and released processor-based systems. The first three subsections from memory. describe the optimizations, organized for convenience at decreasing scale within the system: Note that while aged data would not be accessed during normal operations, it would still be available if needed. To • Among sockets. The Benefits from Multi-Socket reduce access time, systems could use solid-state drives Topology and NUMA subsection describes the for this purpose. Alternatively, aged data could be written to collaboration around features and capabilities at the inter- flash memory, which can be accessed even more quickly. processor level. For each individual type of data, application developers • Among cores. The Optimization for Multi-Core Processing will need to create rules that govern when data should subsection describes the collaboration around features be considered “aged.” For example, sales data might be and capabilities at the intra-processor, inter-core level. considered aged once the order it is associated with has • Within cores. The Optimization for Core-Level Features been closed for at least 90 days. subsection describes the collaboration around features 3.5 Boosting Performance by Moving Results and capabilities at the intra-core level. Instead of Source Data The fourth subsection, Performance Benefits from The data-intensive business applications in common use Optimization and Hardware Platform Features, presents often move large, detailed data sets from the data layer some key performance results associated with the to the application layer, where it performs operations on optimizations described in the previous subsections. that data. This arrangement is not optimal in many cases, since the result of the calculation is typically smaller than 4.1 Benefits from Multi-Socket Topology and NUMA the source data needed to derive that result. Therefore, it Because SAP HANA is a large-scale multi-socket is generally more efficient to perform calculations before environment, it benefits substantially from the NUMA moving the data and then move only the results. (non-uniform memory access) shared-memory architecture. To benefit from that scheme, SAP HANA incorporates In multi-processor systems that are not NUMA capable (or a calculation engine into the data layer. The calculation where NUMA is not enabled), each processor is connected engine can be manipulated programmatically by application directly to all available system memory through the developers. In this way, high-performance applications based system bus. The access path between any combination of on the SAP HANA architecture can delegate data-intense processor and memory is direct and electrically equivalent operations to the data layer, achieving higher efficiency and to any other combination. This arrangement is therefore performance than would otherwise be possible. known as “uniform memory access” (UMA), as shown in Figure 6a. Note that all the processors must contend for bandwidth on the system bus. 9
  • 10. In NUMA systems (Figure 6b), each processor is connected In multi-threaded application software, the OS migrates directly to its own local memory, and all of the memory threads between processor cores as needed to match banks are connected together by the system bus. workloads to available execution resources. Assuming that Therefore, any processor has direct, relatively high-speed the data associated with a specific thread is already resident access to local memory, as well as access to all other in memory, migrating a thread to a core in a different memory through the system bus. Because bus-bandwidth processor package carries with it a performance penalty. contention only occurs on non-local memory accesses, Therefore, NUMA-capable software must be capable of there is a performance advantage to holding the data working with the OS to manage the associated balance in needed by a given processor in its own local memory; that performance implications. For example, there may be an placement is known as “processor affinity.” advantage from moving a thread to a processor package where more execution resources are available that must be weighed against the disadvantage of moving data across the system bus. Processor Processor Processor Processor 4.2 Optimization for Multi-Core Processing To take advantage of the increasing core count in modern processors (as many as 10 cores supporting 20 threads in the Intel Xeon processor E7 family), SAPA HANA is Memory engineered for parallel execution that scales well with the Chipset number of available cores. Specifically, optimization for Interface multi-core platforms accounts for the following two key considerations: I/O • Data is partitioned wherever possible in sections that (a) allow calculations to be executed in parallel. • Sequential processing is avoided, which includes finding alternatives to approaches such as thread locking. I/O 4.2.1 Parallel Aggregation Chipset In the shared-memory architecture within a node, SAP HANA performs aggregation operations by spawning a number of threads that act in parallel, each of which has equal access to the data resident on the memory on that Memory Memory Interface Processor Processor Interface node. As illustrated at the top of Figure 7, each aggregation functions in a loop-wise fashion as follows: 1. Fetch a small partition of the input relation. 2. Aggregate that partition. Memory Memory Interface Processor Processor Interface 3. Return to step 1 until the complete input relation is processed. Chipset I/O (b) Figure 6. In NUMA systems, each processor has fast, direct access to its own local memory module, reducing the latency that arises due to bus-bandwidth contention. 10
  • 11. Table Aggregation Thread 1 Aggregation Thread 2 Aggregation Thread 1 Local Hash Table 1 Local Hash Table 1 Cache-Sized Hash Tables Buffer Merger Merger Thread 1 Thread 2 Part Hash Table 1 Part Hash Table 2 Figure 7. SAP HANA employs parallel aggregation to divide among threads the work of fetching, aggregating, and merging data. Note that the approach of using small chunks and As described above in the Performance Implications of dynamically assigning them to threads distributes the Column versus Row Storage section, determining what data computation costs over all threads relatively evenly, even if will be fetched together in a given cache line is an important requirements differ substantially among the chunks. aspect of optimizing performance within the memory/cache subsystem. Block-wise decompression of data allows Each thread has a private hash table where it writes its decompressed data to reside in L1 cache for access by aggregation results. When the aggregation threads are the processor. finished, the buffered hash tables are merged by merger threads using range partitioning. Each merger thread The columnar design results in consecutive access of large is responsible for a certain range, which is indicated in amounts of data, which is an excellent fit for the hardware Figure 7 by a different shade of grey. The merger threads prefetchers built into the Intel Xeon processors. The SAP aggregate their partition into part hash tables, and the final HANA development team added software prefetching result is obtained by concatenating the part hash tables. hints to improve performance in the limited number of circumstances where data was not automatically prefetched 4.2.2 Processor Cache Optimizations (for example, when crossing page boundaries). Similarly, in Taking best advantage of processor cache resources is Figure 7, note that the hash tables to which worker threads another aspect of performance optimization that plays write aggregation results are sized to the cache, helping to an important role in the development of SAP HANA. reduce the incidence of cache misses in handling that data. Because accessing data from cache is so much faster than accessing data in main memory, special care must be taken to improve the odds that the processor can access the data it needs at any given time from cache, rather than being forced to access memory. 11
  • 12. 4.3 Optimization for Core-Level Features In large part, achieving the optimal performance benefit In addition to the advantages available to SAP HANA at the from Intel HT Technology is simply an extension of the levels of interactions among processors and interactions optimization that must be done to take advantage of among cores, a number of features are also relevant at hardware parallelism generally. An in-depth discussion of the level of operation within individual cores. Some of issues specific to Intel HT Technology is beyond the scope those advantages and considerations are discussed in this of this paper. For more information, see the paper, section. “Performance Insights to Intel® Hyper-Threading Technology.”2 To help ensure optimal benefit from Intel HT Technology, the 4.3.1 Optimization for Intel® HT Technology development team addressed the following considerations: Intel HT Technology helps improves performance through increased instruction level parallelism by enabling each • Parallel scaling. Intel HT Technology effectively doubles processor core to accommodate two threads with the number of software threads required to take full independent instruction streams. While the two threads advantage of the execution hardware, making SAP necessarily share execution resources within the core, HANA’s ability to identify and spawn the correct number the presence of two threads can increase utilization of the of threads even more important. available execution units. This effect typically increases the • Thread balance. Obtaining the optimal performance number of instructions executed in a given amount of time benefit from Intel HT Technology requires the work to be within a core, as shown in Figure 8. divided as evenly as possible among software threads. One approach taken with SAP HANA is to assign work in Without With very small chunks, as described above in the discussion Intel® HT Intel® HT of parallel aggregation. Technology Technology • Avoiding threading bottlenecks. The increased hardware parallelism associated with Intel HT Technology Processor Execution carries with it an increased danger from threading Units Working on Thread 1 bottlenecks, so the team refined SAP HANA’s threading model to avoid issues such as false sharing and excessive Processor Execution locks and synchronization. Time (proc. cycles) Units Working on Thread 2 4.3.2 Standards Compliance and Accuracy for Decimal Floating-Point Math Processor Execution Units Sitting Idle The IEEE Standard 754-2008 for Binary Floating-Point Arithmetic defines functions that overcome important limitations in previous versions, most recently Standard 754-1985. Specifically, the standard describes multiple encoding methods for the numbers used in the calculations. Under the previous versions of the standard, certain applications for use in regulated industries such as finance were forced to use more resource-intensive encoding to meet accuracy requirements. The accuracy issues were related to complexities in representing decimals Figure 8. Intel® Hyper-Threading Technology (Intel® HT Technology) precisely in binary format. For example, the following gives the processor access to two threads in the same time slice, calculation calculated in single-precision yields the result reducing the level of idle hardware resources. 6.9999997504, rather than 7.00:3 (7.00 / 10000.0) * 10000.0 This greater efficiency enables higher throughput (since more work gets completed per clock cycle) and higher The Intel® Decimal Floating-Point Math Library provides an performance per watt (since fewer idle execution units efficient implementation of binary integer decimal encoding consume power without contributing to performance). that complies with Standard 754-2008, which enhances In addition, when one thread has a cache miss, branch the suitability of SAP HANA for use in the full spectrum of mispredict, or any other pipeline stall, the other thread industry settings. continues processing instructions at nearly the same rate as a single thread running on the core. 12
  • 13. 4.3.3 Benefits from Intel® Turbo Boost Technology Intel® Turbo Boost Technology4 automatically allows processor cores to run faster than the base operating frequency when the OS requests the highest processor performance state (P0), if the processor is operating below power, current, and temperature specification limits. The extent of this capability is also dependent on the number of active cores in the processor package, as illustrated in Figure 9. 10 Cores Operating Under Fewer than 10 Cores Operating Under Normal Operation Inte® Turbo Boost Technology Inte Turbo Boost Technology Frequency CORE 0 CORE 1 CORE 2 CORE 3 CORE 4 CORE 5 CORE 6 CORE 7 CORE 8 CORE 9 CORE 0 CORE 1 CORE 2 CORE 3 CORE 4 CORE 5 CORE 6 CORE 7 CORE 8 CORE 9 CORE 0 CORE 1 All cores operate at rated frequency All cores operate at higher frequency Fewer cores may operate at even higher frequency Figure 9. Intel® Turbo Boost Technology allows individual cores to operate briefly above the thermal design point, under certain conditions. Intel® Instruction Sets SSE (1999) SSE2 (2000) SSE3 (2004) SSSE3 (2006) SSE4.1 (2007) SSE4.2 (2008) Intel® AES-NI (2010) 70 Instructions: 144 Instructions: 13 Instructions: 32 Instructions: 47 Instructions: 7 Instructions: • Single-precision • Double-precision • Complex • Decode • Video • String 6 Instructions: vectors vectors arithmetic operations accelerators manipulation • Encryption and • Streaming • 128-bit vector • Graphics • Cyclic decryption operations integer • Co-processor Redundancy accelerators Check (CRC32) Figure 10. Successive generations of instructions, including Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) accelerate various functions on Intel® architecture. 4.3.4 Optimization of Bit Compression with 4.4 Performance Benefits from Optimization Intel® SSE-Based Vectorization and Hardware Platform Features Intel periodically introduces sets of instructions that extend This section covers the performance increases enabled by platform capabilities with certain types of operations, as the optimization work on SAP HANA for the Intel processor illustrated in Figure 10. The value of these instructions is E7 family conducted jointly by SAP and Intel. It also primarily to accelerate those operations and secondarily to describes the performance benefits that are available from reduce the complexity of the implementation. some key hardware-platform features. Intel SSE instructions provide the ability to support vector 4.4.1 Performance Comparison between Processor operations on 128-bit wide registers. Theoretically, these Generations and Impact of Hardware Features operations should result in a 4x speed up on 32-bit To demonstrate the performance benefits of implementing data types and a 2x speed up on 64-bit data types. SAP HANA on the latest Intel Xeon processors, a test lab SAP HANA uses this advantage for decompression at Intel compared results against a reference workload and search on its base data structures. In cooperation between the Intel® Xeon® processor 7500 series and the with Intel, SAP engineers have developed a highly Intel Xeon processor E7 family, which is summarized optimized set of algorithms. By default, SAP HANA in Figure 11.5 The workload consisted of a single-user checks on startup to determine which sets of Intel SSE scenario that was provided by an SAP customer and used instructions are supported by the processor and chooses as the internal reference scenario for SAP. It processed 3.2 the implementation that is best suited to maximize billion records of point-of-sale data in a complex fashion. decompression and search performance. 13
  • 14. SAP HANA™ 1.0 Performance Comparison: Intel® Xeon® Processor E7 Family versus Intel® Xeon® Processor 7500 Series and Hardware Feature Performance Impacts 97% 97% 98% 3,500,000 95% 100% 92% 90% 3,000,000 288,873 80% 255,658 Running time [ms] 2,500,000 70% CPU Utilization 225,587 218,114 210,471 60% 2,000,000 1.37x 50% 1,500,000 1.21x 40% 1,000,000 30% 1.07x 20% 500,000 1.04x 10% Running time [ms] 0 0% CPU Utilization Intel Xeon Processor Intel Xeon Processor E7 Family 7500 Series Intel® Hyper-Treading Technology ON OFF ON ON ON Non-Uniform Memory Access (NUMA) ON ON OFF ON ON Intel® Turbo Boost Technology ON ON ON OFF ON Figure 11. Testing demonstrated a 1.37x performance advantage by the Intel® Xeon® processor E7 family over the Intel® Xeon® processor 7500 series and benefits from key platform features.5 The testing summarized in Figure 11 consisted of two major As reported in Figure 11, the impact of the processor aspects: testing the performance differential between generation alone was that the Intel Xeon processor E7 the two processor generations and testing the distinct family delivered a 1.37x performance increase, relative to performance impacts of individual hardware-platform the Intel Xeon processor 7500 series.5 On the system based features (that is, Intel HT Technology, NUMA, and Intel Turbo on the Intel Xeon processor E7 family, Intel HT Technology Boost Technology. The testing approach consisted of two produced a 1.21x performance increase,5 NUMA technology primary parts: produced a 1.07x performance increase,5 and Intel Turbo • Performance impact of processor generation. A direct Boost Technology produced a 1.04x performance increase.5 performance comparison was made between the two Notably, these performance increases were measured processor generations under test, with all three of the under relatively high levels of processor utilization, platform features enabled. This result intended to show demonstrating their value in adding to system-level the effect of the processor generation in isolation from performance headroom. other aspects of the platform. 4.4.2 Performance Benefits of Intel® SSE • Performance impact of platform features. Each of the Vectorization for Search three platform features was disabled on the Intel Xeon SAP HANA ensures flexibility and makes the most of processor E7 family-based system, one at a time with installed system memory by compressing all base data the other two features enabled, and each of those results structures using a packed bitfields approach for each of the was compared to the case in which all three features were bit cases 1 to 32 bits. (For a discussion of compression in enabled. This result was used to show the isolated effect the SAP In-Memory Computing Database, see the Creating of the platform feature. a Transition in Database Design section.) Using the packed bitfields approach enables the system to store integer values more efficiently than as standard STL vectors. 14
  • 15. Instead of using 32 bits per integer, SAP HANA uses only as many bits as are needed to represent the highest distinct value in a vector as a bit string. A separate implementation is required for each bit case, and over the average for all bit cases, using Intel SSE with TREX accelerates search on the compressed data by a factor of 1.6x relative to scalar code on the Intel® Xeon® processor 5500 series, as shown in Figure 12.6 2.50 Speed-up of Intel® SSE versus Scalar Code 2.00 1.50 1.00 0.50 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Bit Case Figure 12. On average over all bit cases, Intel Streaming SIMD Extensions (Intel® SSE) accelerates search on compressed data using SAP In-Memory Computing Database technology by a factor of 1.6x on the Intel® Xeon® processor 5500 series.6 SAP Business Warehouse Accelerator Core-Frequency Scaling 4.4.3 Performance Scaling with Increasing Core Count 0.68 Higher is better To check performance scalability relative to increases in the 0.67 number of processor cores, the team began by calculating the “ideal” case, in which the number of queries per 0.66 Queries per Second (Compiled) second would increase linearly on a one-to-one basis with processor frequency (represented by the upper, dashed 0.65 line in Figure 13). Next, by adjusting the core frequency using BIOS settings, they measured the actual increase 0.64 Ideal (represented by the lower, solid line). Measured 0.63 The results show that performance increases relative to increases in processor frequency at an efficiency of 91 to 0.62 94 percent.7 While these results were obtained using SAP Business Warehouse Accelerator, a predecessor technology 0.61 to SAP HANA, note that both software environments are based on similar in-memory computing technology. 0.6 2.65 2.70 2.75 2.80 2.85 2.90 2.95 Core Frequency (GHz) Changed through Frequency Multiplier in BIOS Figure 13. SAP Business Warehouse Accelerator workloads scale almost at the ideal rate as processor core frequency increases.7 15
  • 16. 5 Enhancing Reliability for In-Memory Computing Systems 5.1 In-Memory Recovery SAP HANA includes software capabilities that take • Isolation is the requirement that all data associated with advantage of the hardware features of Intel Xeon processors a transaction be inaccessible by any other operation for to dramatically reduce the incidence of system failures. the duration of the transaction. This property is necessary In addition, the SAP HANA appliance can fail over from to prevent multiple writes to a single data location during one compute node to another in the event of a processor transactions, which could lead to data inconsistency failure and to handle memory errors with as little impact to and unpredictable results. Because Isolation requires workloads as possible. data to be locked while it is in use for a transaction, SAP A growing scope of RAS capabilities have been introduced HANA uses sophisticated algorithms to minimize the in the Intel Xeon processor, reaching a suitability for performance impact of that locking. business-critical implementations that was previously • Durability means that once a transaction has been associated with only more expensive RISC and other committed to the database it can always be recovered in proprietary solutions. For example, Machine Check the event of a subsequent hardware or software failure. Architecture Recovery enables SAP HANA to work with the This property prevents completed transactions from OS to recover from many otherwise-uncorrectable memory having to be reversed. Because in-memory databases errors, recovering data without downtime in many cases. use volatile memory as their main data store, SAP HANA 5.2 ACID Properties relies on the use of flash memory to provide Durability SAP HANA has been designed and refined to provide by retaining a combination of snapshots and logs that excellent compliance with the so-called ACID set of together assure the recoverability of committed data in properties designed to ensure reliable processing by the event of a system failure. a database system. In general, those properties are as follows: 6 Application Development and Tools for In-Memory Computing • Atomicity is that property of a database transaction SAP HANA provides a powerful application-development that guarantees that if the entire transaction cannot be completed, no change is made to the database, meaning environment specifically tailored to the needs of that “partial” transactions will not take place. Atomicity is organizations in building business applications. It also intimately related to the reliability features of the platform provides SQL Script, a language for processing application- described above, in the sense that those characteristics specific code in the database layer, which has the following help prevent the system from failing in mid-transaction two advantages: or help it recover if a failure does occur. In addition to • Transfer efficiency. Moving calculations to the database those capabilities, SAP HANA includes programmatic layer eliminates the need to transfer large amounts of data safeguards to ensure Atomicity. from the database to the application. • Consistency refers to the assurance that all transactions • Feature utilization. Calculations need to be executed performed by the database will be taken from one in the database layer to get the maximum benefit consistent state to another, which includes the from features such as fast column operations, query requirement that only valid data will be written to the optimization, and parallel execution. database. In the event that invalid data is written to the database in error, the Consistency property requires that the database can be rolled back to a last known good state. This capability is provided for by the presence of temporal tables described previously in this paper, and it will be enhanced by the future implementation of aged data features also discussed. 16