HANA Performance Efficient Speed and Scale-out for Real-time BI


Published on

Learn about the HANA Performance Efficient Speed and Scale-out for Real-time BI. This paper will describe the test environment and present and analyze the test results. SAP HANA enables organizations to optimize their business operations by analyzing large amounts of data in real time. HANA runs on inexpensive, commodity hardware and requires no proprietary add-on components. It achieves very high performance without requiring any tuning. For more information on SAP HANA, visit http://ibm.co/1brCGOt.

Visit the official Scribd Channel of IBM India Smarter Computing at http://bit.ly/VwO86R to get access to more documents.

Published in: Technology, Business
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

HANA Performance Efficient Speed and Scale-out for Real-time BI

  1. 1. 1HANA PerformanceEfficient Speed and Scale-out for Real-time BI
  2. 2. 2HANA Performance:Efficient Speed and Scale-out for Real-time BIIntroductionSAP HANA enables organizations to optimize their business operations by analyzinglarge amounts of data in real time. HANA runs on inexpensive, commodity hardwareand requires no proprietary add-on components. It achieves very high performancewithout requiring any tuning.A 100TB performance test was developed to demonstrate that HANA is extremelyefficient and scalable and can very simply deliver break-through analytic performancefor real-time business intelligence (BI) on a very large database that is representative ofthe data that businesses use to analyze their operations. A 100TB1data set wasgenerated in a format as would be extracted from an SAP ERP system (e.g., datarecords with multiple fields) for analysis in an SAP NetWeaver BW system.2This paper will describe the test environment and present and analyze the test results.The Test EnvironmentThe performance test environment was developed to represent a Sales & Distribution(SD) BI query environment that supports a broad range of SQL queries and businessusers.Database SchemaThe star-schema design in Figure 1 shows the SD test data environment and eachtable’s cardinality. No manual tuning structures were used in this design; there were noindexes, materialized views, summary tables or other redundant structures added forthe purposes of achieving faster query performance.1100TB = raw data set size before compression.2These tests did not involve BW.
  3. 3. 3Figure 1 - SD SchemaTest DataThe test data consists of one large fact table and several smaller dimension tables asseen in Figure 1. There are 100 billion records in the fact table alone, representing 5years’ worth of SD data. The data is hash-partitioned equally across the 16 nodesusing Customer_ID. Within a node, the data is further partitioned into one-monthintervals. This results in 60 partitions per node and approximately 104 million recordsper partition. (See also Figure 2 and “Loading”.)System ConfigurationThe test system configuration (Figure 2) is a 16-node cluster of IBM X5 servers with 8TBof total RAM priced at approximately $500,000. Each server has: 4 CPUs with 10 cores and 2 hyper-threads per core, totaling- 40 cores- 80 hyper-threads 512 GB of RAM 3.3 TB of disk storage
  4. 4. 4Figure 2 - Test ConfigurationSetupThese performance tests did not use pre-caching of results or manual tuning structuresof any kind and therefore validated HANA’s load-then-query ability. A design that iscompletely free of tuning structures (internal or external) is important for building asustainable real-time BI environment; it not only speeds implementation, but it alsoprovides ongoing flexibility for ad hoc querying, while eliminating the maintenance costthat tuning structures require.LoadingThe loading was done in parallel using HANA’s IMPORT Command which is a singleSQL statement that names the file to load. Loading is automatically parallelized acrossall of the nodes and uses the distribution and partitioning scheme defined for eachtable3(in this case, hash distribution and monthly range partitions). The load rate wasmeasured at 16 million records per minute, or one million records per minute per node.This load performance is sufficient to load 100 million records (representing onebusiness day’s activity) in just six minutes.CompressionData compression occurs during the data loading process. HANA demonstrated agreater than 20X compression rate4; the 100TB SD data set was reduced to a trim3As defined with the CREATE TABLE command.4Compression rates depend heavily on characteristics of the actual data to be compressed. Individualresults may vary.
  5. 5. 53.78TB HANA database, consuming only 236GBs of RAM on each node in the cluster(see Figure 2). The large fact table accounts for 85 of the 100TBs in the data set, andwas compressed to occupy less than half of the available RAM per node.QueriesThe query suite totals 20 distinct SQL queries including 11 base queries plus a numberof variations for time interval (month, quarter, etc.). The queries were chosen torepresent a mixed workload environment running on data that is representative of itsnative form (i.e., not indexed, tuned or otherwise de-normalized to avoid joins, as wouldbe customary in a conventional database system). They cover a range of BI activitiesfrom departmental to enterprise-level including general reporting, iterative querying (drill downs), ranking, and year-over-year analysis,The resulting queries range from moderately complex to very complex, including SQLconstructs such as multiple joins, in-list, fact-to-fact joins, sub queries, correlated sub queries (CSQ), and union all.The queries are grouped into three general categories of BI usage: Reporting – calculations on business performance measures for ranges ofmaterials and/or customers for a time period Drilldown – iterative user-initiated queries to gather details for a given individualor group of materials and/or customers Analytical – periodic deep historical analysis across customers and/or materialsTable 1 documents each query in terms of its business description, SQL constructs,ranges and qualifiers, and time period variations. If a query is described with multipletime periods, each time period is run as a distinct query; for instance, query R1 is run asfour distinct queries, covering monthly, quarterly, semi-annual, and annual time periods,to show its performance relative to changes in data volume (annual data is12X thevolume of monthly data). Items in bold denote query complexity factors.
  6. 6. 6BIQueryBusiness Description SQL Constructs Ranges & Qualifiers Time Period(s)R1Sales and distribution report by monthfor:a) given range of materialsb) given range of customersStar-schema query with multiplejoins to dimensions, with groupby and order by Range of materials Range of customersFour variations:1. One month;2. Three months;3. Six months;4. Twelve monthsR2Sales and distribution report by month forgiven list of customersStar-schema query with multiplejoins to dimensions, with groupby and order byRange of customers definedusing “in list”One monthR3Sales and distribution report by monthfor:a) given range of materialsb) given customer countryStar-schema query with multiplejoins to dimensions, group byand order by Range of materials Customer countryOne monthR4Top 100 customers report for:a) one or several materials groupsb) one or several customer countriesStar-schema query with multiplejoins to dimensions, group byand order by One or several materialgroups One or several customercountriesOne month of data for twovariations:1. One country and oneproduct group;2. Three countries andtwo product groupsR5Top 100 customers report for:a) given material groupb) given customer countryStar-schema query with multiplejoins to dimensions, group byand order by Material group Customer countryTwo variations:1. Three months;2. Six monthsD1Sales and distribution report for:a) single customer;b) given material groupStar-schema query with multiplejoins to dimensions, group byand order by Single customer Material groupOne monthD2Sales and distribution report for:a) single customer;b) single materialStar-schema query with multiplejoins to dimensions, group byand order by Single material Single customerOne monthD3Sales and distribution report for:a) single materialb) given customer countryStar-schema query with multiplejoins to dimensions, group byand order by Single material Customer countryOne monthD4Sales and distribution report for:a) single customer;b) given range of materialsStar-schema query with multiplejoins to dimensions, group byand order by Range of materials Single customerOne monthA1Year Over Year (YOY) Top 100customers analytical report:a) Top 100 customers identified for acurrent time periodb) Business measures calculated forpreviously identified top 100customers in a given historicalperiodc) A final result set calculated thatcombines current and historicalbusiness measures for top 100customersStar-schema query with fact-to-fact join as a correlated sub-query that includes multiplejoins to dimensions, and thengroup by andorder by Range of materials; Customer countryThree variations:1. One month with YOYcomparison2. Three months with YOYcomparison3. Six months with YOYcomparisonA2YOY Trending Report for Top 100customers:a) Top 100 customers calculated for agiven time period of current yearb) Top 100 customers calculated foridentical time period of previous year(or multiple years)c) A final result set calculated thatcombines business measures for top100 customers over several yearsSeveral star-schema sub-queries each with multiple joins,group by and order by, whichare combined by a union alloperator Material group; Customer countryThree variations:1. Three months over twoyears2. Six monthsover two years3. Three months over fiveyearsTable 1 - Query Descriptions
  7. 7. 7Test ResultsThe test measured both query response time and query throughput per hour. Thequeries were run first in a single stream to measure baseline query times, then inmultiple streams of 10, 20, and 25 to measure query throughput (calculated as queriesper hour) and query response time in the context of different workloads5. The multi-stream tests randomized the sequence of query submissions across streams6andinserted 10 milliseconds of “think” time between each query to simulate a multi-user adhoc BI querying environment. Individual runtimes are listed in the Appendix.Baseline TestAs you can see from the baseline results in Figure 3, the majority of queries deliveredsub-second response times and even the most complex query, which involved theentire 5-year span of data, ran in under 4 seconds.Figure 3 - Baseline Performance, in SecondsThe Reporting and Drill-down queries (267 milliseconds to 1.041 seconds) demonstrateHANA’s excellent ability to aggregate data. For instance, the longest-running of these,R1-4, ran in just over a second (1.041) but took only 2.8X longer to crunch through 12Xmore data than its monthly equivalent (R1-1).5A single query stream represents an application connection that supports potentially hundreds ofconcurrent users.6This is done to prevent a query from being submitted by different streams at exactly the same time andto ensure a practical workload mix (e.g., drill downs happen more frequently than quarterly reports).
  8. 8. 8The Drill-down queries (276 to 483 milliseconds) demonstrate HANA’s aggressivesupport for ad hoc joins and, therefore, to provide unrestricted ability for users to “sliceand dice” without having to first involve the technical staff to provide indexes to supportit (as would be the case with a conventional database).The Analytic queries (677 milliseconds to 3.8 seconds) efficiently performedsophisticated joins (fact-to-fact, sub queries, CSQ, union all) and analysis across asliding time window (year-over-year). The queries with one- to six-month date ranges(A1-1 through A2-2) ran in 2 seconds or less. Query A2-3, which analyzed the entire 5-year date range, ran in under four seconds. All analytic query times are well withintimeframes to support iterative speed-of-thought analysis.Across the board, the baseline tests show that HANA scaled efficiently (better-than-linear) as the data volumes increased for a given query.ThroughputThe throughput tests are summarized in Table 2 and show that in the face of increasingand mixed BI workloads, HANA scales very well.TestCaseThroughput(Queries per Hour)1 stream 6,28210 streams 36,60020 streams 48,77025 streams 52,212Table 2 - Throughput TestsAt 25 streams, the average query response time was less than three seconds and isonly 2.9X higher than at baseline (see, Appendix), an indicator of HANA’s excellentinternal efficiencies and ability to manage concurrency and mixed workloads.A rough estimate of BI user concurrency can be derived by dividing total queries perhour by an estimated average number of queries per user per hour. For instance, the25-streams’ 52,212 queries per hour divided by 20 (a zesty per-user rate of one queryevery three minutes) provides a reasonable estimate of 2610 concurrent BI usersacross a mixture of reporting, drill-down and analytic query types.
  9. 9. 9Ad Hoc Historical QueriesThe performance tests were intended primarily to demonstrate HANA’s performance incustomary real-time BI query environments that focus on analyzing businessoperations. However, ad hoc historical queries that analyze the entire volume of datathat is stored may occasionally need to be run (and they should not impose a severepenalty). In addition to Query A2-3 (included in the stream tests), Queries R1-1, R4-2and R5-1 were deemed potentially relevant for a 5-year time window and wereresubmitted, this time for the entire 5-year date range. Table 3 shows that HANAdemonstrated better-than-linear scalability running against 5X more data with a 3Xincrease in response time, or less.Time Period R1-1 R4-2 R5-11 year 1.1 secs 1.4 secs 1.1 secs2 years 1.8 secs 2.4 secs 1.8 secs3 years 2.6 secs 3.4 secs 2.4 secs4 years 3.3 secs 4.4 secs 3.0 secs5 years 4.1 secs 5.4 secs 3.6 secsTable 3- Ad Hoc Historical QueriesOverall, these queries all ran in a few seconds and further highlight HANA’s ability tosupport real-time BI on massive amounts of data without tuning and without massivehardware or proprietary hardware components.Results SummaryIn these performance tests, HANA demonstrated ability to deliver real-time BI-queryperformance as workloads increase in terms of capacity (up to 100TBs of raw data),complexity (queries with complex join constructs and significant intermediate results runin less than 2 seconds), and concurrency (25-stream throughput representing about2600 active users).HANA performance is based on comprehensive, well-thought design elements,including, for example, an in-memory design smart internal data structures (e.g., native columnar, advanced compression andpowerful partitioning) a clever, cost-based optimizer that can meld these smart data structures into anefficient query plan
  10. 10. 10 efficient query execution that smartly executes the query plan to take advantageof internal components (e.g., advanced algorithms, multi-level cachingoptimization in the CPU and hyper-threading).Much has been written on HANA’s design and it is not necessary to rewrite it here. Formore information on HANA technical features, please refer to “SAP HANA TechnicalOverview” and “SAP HANA for Next-Generation Business Applications and Real-TimeAnalytics”.Real-world ExperiencesThese tests were run in a simulated environment to isolate HANA performance7;however, you can expect your own BI Query performance to be significantly better onHANA than on your existing conventional DBMS, potentially thousands of times better.Most importantly, SAP customer testimonials confirm HANA’s performance. Here are afew examples.“We have seen massive system speed improvements and increased ability to analyzethe most detailed levels of customers and products.” –Colgate Palmolive“…replacing our enterprise-wide Oracle data mart and resulting in over 20,000X speedimprovement processing our most complex freight transportation cost calculation. ...ourstand-alone mobile applications that were previously running on Oracle are now runningon HANA. Our 2000 local sales representatives can now interact with real-time datainstead, and have the ability to make on-the-fly promotion decisions to improve sales.”–Nongfu Spring“…our internal technical comparison demonstrated that SAP HANA outperformstraditional disk-based systems by a factor of 408,000...”–Mitsui Knowledge IndustryCo., Ltd.“With SAP HANA, we see a tremendous opportunity to dramatically improve ourenterprise data warehouse solutions, drastically reducing data latency and improvingspeed when we can return query results in 45 seconds from BW on HANA vs. waitingup to 20 minutes for empty results from BW on a traditional disk-based database.” –Shanghai Volkswagen7These performance tests were designed to isolate HANA system level performance independent of thevariety of applications that may be used in SAP environments. Therefore, these test results should beused only as a general guideline, and results will vary from implementation to implementation due tovariability of platforms, applications and data.
  11. 11. 11ConclusionReal-time BI query environments should be able to support new queries as soon asusers formulate them. This speed-of-thought capability is only possible if consistentlyexcellent performance is freely available in the underlying database platform.These rigorous performance tests and their results presented here confirm that, withouttuning and without massive hardware or proprietary hardware components, HANAdelivers leading performance and scale-out ability and enables real-time BI forbusinesses that must support a range of analytic workloads, massive volumes of dataand thousands of concurrent users.
  12. 12. 12Appendix – Query Runtimes*10-, 20-, 25-stream runs reflect 10ms of think time between queries.Query Name TESTCASE* Average Runtime(seconds)R1-1 1 stream 0.37430162510 streams 0.710447637520 streams 1.1128789437525 streams 1.263543115R1-2 1 stream 0.519399510 streams 0.885261012520 streams 1.285226587525 streams 1.55048987R1-3 1 stream 0.7011037510 streams 1.09892102520 streams 1.609257837525 streams 1.85689055R1-4 1 stream 1.041199510 streams 1.537781220 streams 2.2427209525 streams 2.53941444R2 1 stream 0.267100510 streams 0.474084737520 streams 0.76580112525 streams 0.94418651R3 1 stream 0.377202510 streams 0.68776247520 streams 1.0363240187525 streams 1.28864399R4-1 1 stream 0.6733857510 streams 1.1579781520 streams 1.792818862525 streams 2.081943225R4-2 1 stream 0.6114902510 streams 1.019286812520 streams 1.4834202687525 streams 1.751548475R5-1 1 stream 0.78467062510 streams 1.391405820 streams 2.01714507525 streams 2.37001971R5-2 1 stream 0.934395510 streams 1.6150219
  13. 13. 1320 streams 2.32610447525 streams 2.70342093D1 1 stream 0.425426687510 streams 0.6772760062520 streams 0.988322037525 streams 1.1591933925D2 1 stream 0.276234062510 streams 0.4998406187520 streams 0.792561487525 streams 0.9211584D3 1 stream 0.343944812510 streams 0.613760837520 streams 0.96921350937525 streams 1.1464113225D4 1 stream 0.4832167510 streams 0.8851617187520 streams 1.34551339062525 streams 1.56276527A1-1 1 stream 0.6776262510 streams 1.092555820 streams 1.7767237525 streams 2.03975081A1-2 1 stream 1.336261510 streams 2.067660320 streams 3.031998612525 streams 3.62353857A1-3 1 stream 2.060592510 streams 3.1540545520 streams 4.46925497525 streams 5.01461024A2-1 1 stream 1.546076510 streams 2.739776062520 streams 3.9931644687525 streams 4.90314213157894A2-2 1 stream 1.83877710 streams 3.25666737520 streams 4.551917812525 streams 5.32557834210526A2-3 1 stream 3.81678510 streams 6.37139420 streams 9.8878547525 streams 11.9014413333333
  14. 14. 14