Lecture about SAP HANA and Enterprise Computing at University of Halle
 

Slides of my talk at the computer science department of the University of Halle.

    Presentation Transcript

    • In-Memory Data Management and Challenges for Enterprise Computing & Research
      Tobias Trapp, AOK Systems GmbH
    • Content
      - HANA Architecture and Use Cases
      - Enabling Quantitative Approaches
      - Summary
      - Chances for Development & IT Management
      © AOK Systems GmbH 2013
    • SAP HANA, formerly "High Performance Analytic Appliance"
      HANA
      - is a hardware device from certified vendors with integrated firmware
      - has standard DBMS features: ACID properties, high availability, SQL and MDX. It is fully MVCC with regular capabilities like statement-level and snapshot isolation
      - has specialized engines (calculation and planning engine) and proprietary languages: SQLScript, RDL, …
      - supports pushing calculations down to the database level through IMSL, R and specialized libraries for data mining, machine learning, statistics, optimization and financial mathematics
      - multi-tenancy: SAP is working on it; so far only certain scenarios are supported for customers
      - supports text analysis, indexing and search; support for geospatial data has been announced
      - supports temporal tables
    • HANA Hardware
      - Different hardware vendors offer appliances: Cisco, Dell, Fujitsu, Hitachi, HP, IBM, NEC. The solutions differ in details.
      - SAP HANA runs on Intel's Westmere-EX / E7 processors; Intel and SAP collaborated to optimize HANA for those CPUs.
      - A single HANA node has 128 GB of RAM per CPU; a CPU has 20 cores.
      - HANA uses Fusion-io flash drives as log space; they have the same size as RAM (logs are written after each transaction).
      - In addition, HANA has disk storage for persistence, which is about 40 times the RAM size.
      - HANA scales out and can be installed on multiple nodes. So far scale-out scenarios have been used internally at SAP, but they were recently certified for public use. Experts expect nearly linear scaling.
      © Hitachi
    • HANA as a Multi-Core Platform
      - HANA is an in-memory database optimized for multi-core technology: as much as possible is kept in the CPU and its caches; the storage hierarchy is used for persistence.
      - Ailamaki et al. showed that RDBMSs don't work optimally on multi-core processors and have up to 50% idle time (see DBMSs on a Modern Processor: Where Does Time Go?, Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), 1999).
      - Since CPUs have become faster in recent years by adding cores rather than by increasing the clock rate, SAP decided to create a platform that is optimized for parallel execution.
      - According to SAP, HANA scales linearly with the size of HANA RAM.
    • 6© AOK Systems GmbH 2013Advantages of Column Stores In a column store data is stored using special encodings that save the value andthe number of occurings in a row (see Plattner, A Common Database Approachfor OLTP and OLAP Using an In-Memory Column Database, SIGMOD 2009) This leads to new possibilities and drastic performance gains:- data can be loaded very fast into CPU- column operations esp. aggregates can by performed very efficiently- additional indexes (especially materialized) can be eliminated- operations on multiple columns can be parallized on multiple cores Further optimization possible using an insert-only approach to avoid expensiveupdate operations (see Copeland and Khoshafian, A Decomposition StorageModel, Proceesings of the 1985 ACM SIGMOD International Conference onManagement of Data, Austin, Texas, p. 268-279, ACM Press)
    • Advantages of Column Stores in an ERP Environment
      - Krueger et al. (see Krueger et al., Enterprise data management in mixed workload environments, 16th International Conference on Industrial Engineering and Engineering Management, 2009) showed that in typical ERP systems most of the columns contain only a few distinct values. The figure shows the first 10 out of 98 columns of an accounting header table in descending order.
      - Most SQL queries work on only 10% of the rows (see Plattner, A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, SIGMOD 2009), which makes data access in column stores fast.
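Why few distinct values matter can be shown with a small calculation: a dictionary-encoded column needs only enough bits per row to address its dictionary. The sketch below assumes fixed-width bit packing; the row count, the 10-byte field width and the 13 distinct values are hypothetical numbers, not taken from the cited study.

```python
def bits_per_value(distinct_values):
    """Smallest fixed bit width that can address every dictionary entry."""
    return max(1, (distinct_values - 1).bit_length())

def encoded_column_bytes(rows, distinct_values):
    """Size of the ID vector only; the dictionary itself is not counted."""
    return (rows * bits_per_value(distinct_values) + 7) // 8

rows = 1_000_000          # hypothetical table size
plain_bytes = rows * 10   # hypothetical 10-byte character field
packed_bytes = encoded_column_bytes(rows, distinct_values=13)
```

Under these assumptions the packed column is 20 times smaller than the plain one, so far more of it fits into the CPU caches.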
    • HANA Architecture in a Nutshell
      - Technical foundations were prototyped in SanssouciDB at HPI. SAP integrated the TREX search engine, P*Time, and MaxDB for persistence.
      - Planning Engine for execution of basic financial planning operations
      - Calculation Engine as common infrastructure that can be accessed using SQLScript
      - Extended Application Services (server-side JavaScript for lightweight applications) make it possible to expose data and queries using REST interfaces
      - Programming at database level using L (a restricted subset of C++) and C++ (so far not released for customers)
      © SAP
    • Calculation Engine as Common Execution Runtime
      - An overview of the HANA architecture is given in Franz Faerber et al., The SAP HANA Database – An Architecture Overview, IEEE Data Engineering Bulletin, Volume 35.
      - The Calculation Engine is a common execution runtime that can optimize and execute calculation models (see Bernhard Jaecksch, Franz Faerber, Wolfgang Lehner: Cherry picking in database languages. IDEAS 2010: 117-122. 2007) from various domain-specific languages.
      - This approach is very flexible and extensible because the calculation model is a data flow graph whose nodes can contain operations from various operators. These operators can integrate different frameworks accessible from the execution environment: the column store as well as specialized DSLs.
      - So the Calculation Engine introduces the first level of parallelization. The called operators (especially the column store) can introduce further parallelization by accessing a single row with different processes as well as by splitting them into multiple partitions in distributed scenarios (scale-out).
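The idea of a calculation model as a data flow graph, where independent nodes can run in parallel, can be sketched as follows. This is a toy scheduler under stated assumptions, not the Calculation Engine's API: `run_dataflow`, the node names and the sample data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dataflow(nodes, inputs_of):
    """Execute a DAG of operators; nodes whose inputs are ready run in parallel.

    nodes:     name -> function taking the results of its input nodes
    inputs_of: name -> list of input node names
    """
    results = {}
    pending = set(nodes)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [n for n in pending
                     if all(i in results for i in inputs_of[n])]
            if not ready:
                raise ValueError("cycle or missing input in calculation model")
            # All ready nodes are independent of each other: submit them together.
            futures = {
                n: pool.submit(nodes[n], *[results[i] for i in inputs_of[n]])
                for n in ready
            }
            for n, future in futures.items():
                results[n] = future.result()
            pending -= set(ready)
    return results

# Hypothetical model: one scan feeds two aggregations that are then combined.
sales = [("north", 10), ("south", 5), ("north", 7)]
nodes = {
    "scan": lambda: sales,
    "north": lambda rows: sum(v for r, v in rows if r == "north"),
    "south": lambda rows: sum(v for r, v in rows if r == "south"),
    "total": lambda north, south: north + south,
}
inputs_of = {"scan": [], "north": ["scan"], "south": ["scan"],
             "total": ["north", "south"]}
results = run_dataflow(nodes, inputs_of)
```

Here the `north` and `south` operators run concurrently once `scan` has finished, which corresponds to the first level of parallelization mentioned on the slide; an operator could parallelize further internally.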
    • Excursus: HANA and Exalytics Have Different Architecture & Technology
      - HANA keeps the data in dynamic random access memory (DRAM); solid-state disks/flash devices are used for persistence. A 0.5 TB appliance usually has 2 TB of SSD storage for persistence. In contrast, Oracle's Exadata appliance keeps most data in SSD/flash.
      - HANA is an integrated solution for OLTP, BI and predictive analysis. Exalytics consists of different components. When using Oracle Exalytics, data is replicated for read-only scenarios: into the TimesTen database for reporting and into the Essbase OLAP engine for forecasting.
      - HANA uses columnar storage, special encodings, and RAM- and processor-cache-aware algorithms, which is quite similar to Sybase IQ or the HBase database. Exalytics (more precisely, its TimesTen engine) provides a so-called hybrid columnar compression scheme.
      - There are scale-out scenarios for HANA (at the moment SAP-internal only and not yet released for SAP Business Suite), but not for Exalytics.
    • Current HANA Research: Graph Data Structures and Processing in the Data Management Platform
      - Current research and possible applications are described in Rudolf et al., The Graph Story of the SAP HANA Database, 15. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web, March 11-15, 2013.
      - A software layer called Active Information Store was created on top of the column store. Its data model is a generalization of directed multigraphs with attributes on vertices and edges as well as hierarchies of attributes (taxonomies).
      - For graph manipulation, queries, graph traversal and BI-like aggregation, the WIPE language ("Weakly structured Information Processing and Exploration") was invented; see Bornhövd et al., Flexible Information Management, Exploration and Analysis in SAP HANA, Proceedings of the International Conference on Data Technologies and Applications, pages 15-28, SciTePress, 2012.
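To make the term concrete, a directed multigraph with attributes on vertices and edges can be sketched as below. The class and its methods are hypothetical illustrations; they are neither the Active Information Store interface nor WIPE syntax, and taxonomies are left out.

```python
from collections import defaultdict, deque

class AttributedMultigraph:
    """Directed multigraph: parallel edges allowed, attributes on both sides."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> attribute dict
        self.out_edges = defaultdict(list)  # source -> [(target, attributes)]

    def add_vertex(self, vertex_id, **attrs):
        self.vertices[vertex_id] = attrs

    def add_edge(self, source, target, **attrs):
        # Appending instead of overwriting is what permits parallel edges.
        self.out_edges[source].append((target, attrs))

    def reachable(self, start, edge_filter=lambda attrs: True):
        """Breadth-first traversal over edges accepted by edge_filter."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for target, attrs in self.out_edges.get(current, []):
                if target not in seen and edge_filter(attrs):
                    seen.add(target)
                    queue.append(target)
        return seen

# Hypothetical data: two parallel edges a->b with different attributes.
g = AttributedMultigraph()
for name in "abcd":
    g.add_vertex(name, label=name.upper())
g.add_edge("a", "b", kind="refers")
g.add_edge("a", "b", kind="bills")
g.add_edge("b", "c", kind="refers")
g.add_edge("a", "d", kind="bills")

via_referrals = g.reachable("a", lambda attrs: attrs["kind"] == "refers")
everything = g.reachable("a")
```

Filtering on edge attributes during traversal is the kind of query that is awkward in plain SQL and motivates a dedicated graph layer.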
    • Current SAP HANA Use Cases
      - SAP Business Warehouse on HANA
      - SAP Business Suite on HANA
      - Accelerators and Rapid Deployment Solutions:
        - Customer Segmentation
        - Financial and Controlling
        - Operational Intelligence
        - Sales Pipeline Analysis
        - Smart Meter Analysis (Utilities)
      Also for non-SAP data:
      - personalized cancer therapy
      - real-time offer management for online games by Bigpoint
      - real-time analysis & simulation for Formula One by McLaren
    • AOK: Business at Large Scale
      AOK has a market share of 34%, so we have to work on mass data:
      - 24 million insured persons
      - 54,500 employees
      - 370 million medical treatments by resident physicians per year
      - 6 million hospital treatments per year
      - 400 million prescriptions of medication per year
      Our mission:
      - optimal service for insured people
      - continuous improvement of the quality of treatment and prevention
      - optimal allocation of costs
    • HANA @ AOK: Operations at Large Scale
      We have to automate all business processes and create complex workflows only for the relevant items. But what is relevant? We need insight for making decisions:
      - prediction based on operational data: how much will a treatment cost?
      - selection of insured people for disease management programs and campaigns
      - simulation: "What happens if we change fraud detection rulesets?"
      - fast navigation in huge sets of structured and unstructured data
      - measuring marketing campaign response
      - anomaly detection
      - cross-selling, up-selling and down-selling of insurance products
      - finding hidden patterns in data
    • HANA @ AOK: Analytical Applications
      Results from HANA queries:
      - "Diabetic foot syndrome" is a prediction of possible amputation within the next 3 months that is used to identify candidates for disease management programs. We simplified the query to 250 lines of SQL.
      - BW processing time could be reduced by 60%, and by 80% after a redesign that also improved runtime on a traditional DB. We have the same code line for HANA and non-HANA BW.
      - Most BW queries became 20 times faster.
    • Content
      - HANA Architecture and Use Cases
      - Enabling Quantitative Approaches
      - Summary
      - Chances for Development & IT Management
    • In-Memory Computing Simplifies Queries and Data Models
      - SQL statements become easier using set-theoretic SQL
      - start to do operational reporting directly on OLTP systems
      - with traditional databases we often have to persist results of calculations such as aggregations; with HANA this is only necessary if calculations are complex and contain values from external systems, or if results have to be persisted for compliance reasons
      - performing more and more aggregations on the fly simplifies the code and keeps aggregated values up to date
      - Business Warehouse processing gets faster and simpler if we remove complex staging & materializations
      - faster response provides more insight into data and reduces development cycles
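The difference between a persisted aggregate and an aggregation computed on the fly can be demonstrated with any SQL database. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in (it is not HANA), and the table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (insurant_id INTEGER, amount REAL);
    INSERT INTO claims VALUES (1, 100.0), (1, 50.0), (2, 30.0);

    -- Materialized aggregate: a copy that must be maintained separately.
    CREATE TABLE claims_total AS
        SELECT insurant_id, SUM(amount) AS total
        FROM claims GROUP BY insurant_id;

    -- On-the-fly aggregate: computed from the base data at query time.
    CREATE VIEW claims_total_live AS
        SELECT insurant_id, SUM(amount) AS total
        FROM claims GROUP BY insurant_id;
""")

# A new claim arrives after the aggregate table was built.
conn.execute("INSERT INTO claims VALUES (2, 70.0)")

materialized = dict(conn.execute("SELECT insurant_id, total FROM claims_total"))
live = dict(conn.execute("SELECT insurant_id, total FROM claims_total_live"))
```

The materialized table still shows the old total for insurant 2 until some extra code refreshes it, while the view is always current. Removing that refresh logic is the simplification the slide refers to, provided the database is fast enough to aggregate at query time.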
    • Challenge #1: Evolution of Existing Business Applications
      New applications based on HANA can be developed in various programming languages. SAP Business Suite and SAP Business Warehouse are database-agnostic and can benefit directly from HANA. Furthermore:
      - SAP optimizes programs and frameworks to make them use HANA's proprietary features
      - the ABAP language and infrastructure are being evolved to support HANA-specific features
      That implies challenges for software engineering:
      - new development patterns for code pushdown beyond stored procedures
      - new programming models for efficient transactional applications
      - evolution of existing applications to run more efficiently using HANA
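Code pushdown in its simplest form means moving a calculation from application code into the query. The following sketch contrasts the two styles using Python's built-in `sqlite3` module as a stand-in database; the table, the data and the function names are hypothetical, and real pushdown on HANA would use its own languages and views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treatments (region TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO treatments VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 40.0), ("south", 20.0)],
)

def average_cost_in_application(conn):
    """Fetch every row and aggregate in application code."""
    sums, counts = {}, {}
    for region, cost in conn.execute("SELECT region, cost FROM treatments"):
        sums[region] = sums.get(region, 0.0) + cost
        counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}

def average_cost_pushed_down(conn):
    """Let the database aggregate; only one row per region is transferred."""
    return dict(conn.execute(
        "SELECT region, AVG(cost) FROM treatments GROUP BY region"
    ))

in_app = average_cost_in_application(conn)
pushed = average_cost_pushed_down(conn)
```

Both functions return the same result, but the second transfers two rows instead of the whole table. The engineering challenge named on the slide starts where the logic is too complex for a single `GROUP BY`.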
    • Challenge #2: Topics for Research in In-Memory Analytics
      - Real-time data warehousing is complex: HANA knows the concept of temporal tables, but BW processing consists of complex processing steps, which makes temporal queries non-trivial
      - OLTP reporting is more difficult compared to OLAP reporting:
        - an Enterprise Data Warehouse has governance of the data model
        - there are no deletions; data is preprocessed to ensure consistency
        - there is a process for data cleansing, enrichment and completion
      - Advances in OLTP reporting will lead to convergence of OLAP and OLTP. More and more analytics will be performed directly on operational data.
    • Why Is This an Inflection Point for IT Architecture Management?
      - Today's IT system landscapes are "best of breed", heterogeneous and diverse
      - They consist of
        - standard software for operations
        - custom-developed software
        - highly specialized software, e.g. for statistics and optimization
        - platforms for edge innovation
        - OLAP systems with complex ETL processes
      - HANA can be used as data storage but also as a development platform for all of the above-mentioned systems (SAP and non-SAP)
      - Architects of enterprise IT can use it to identify complexity and latency in IT landscapes and use it for simplification
    • Complexity in IT Landscapes
      - Enterprise architecture separated OLAP and OLTP. This produces latency and complexity because of ETL scenarios.
      - The same pattern is applied in other cases:
        - often data in mainframe systems is replicated from VSAM data files/IMS into an RDBMS to give client-server applications or other systems (e.g. OMS) access to the data
        - even data from operational SAP systems is often replicated to avoid direct access from external applications
      Remarks:
      1. From my point of view, service orientation couldn't solve this problem. Studies (see D. Krafzig et al., Enterprise SOA, Prentice Hall PTR, Englewood Cliffs, 2006) say that the overall reuse factor of a service is 1.6.
      2. I don't know of much scientific work on metrics of IT landscapes and reasons for latency; this could be a topic of thorough research.
    • Complexity and Latency in Enterprise Resource Planning
      Figure (process steps): definition of business rules; implementation and test of business rules; working with business rules; data extraction; data processing and analysis
      - Today's ERP systems aren't agile enough: every step of the process shown in the figure can take weeks
      - How to speed the whole process up?
        - operational reporting: analyzing huge amounts of operational data, even real-time data
        - getting faster insight into data by performing queries in real time instead of hours
        - simulation of changes of business rules in transactional systems
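Simulating a rule change means evaluating the current and the proposed ruleset against the same operational data and comparing the outcomes. A minimal sketch, assuming a deliberately simple two-threshold rule; the function, the field names and all numbers are hypothetical and not taken from any real ruleset.

```python
def flag_claims(claims, max_amount, max_claims_per_month):
    """Return the IDs of claims that this simple, hypothetical ruleset flags."""
    return {
        claim["id"]
        for claim in claims
        if claim["amount"] > max_amount
        or claim["claims_this_month"] > max_claims_per_month
    }

# Hypothetical operational data.
claims = [
    {"id": 1, "amount": 900.0, "claims_this_month": 1},
    {"id": 2, "amount": 1500.0, "claims_this_month": 2},
    {"id": 3, "amount": 200.0, "claims_this_month": 6},
    {"id": 4, "amount": 1100.0, "claims_this_month": 1},
]

current = flag_claims(claims, max_amount=1000.0, max_claims_per_month=5)
proposed = flag_claims(claims, max_amount=1200.0, max_claims_per_month=5)

no_longer_flagged = current - proposed
newly_flagged = proposed - current
```

On four rows this is trivial; the point of the slide is that the same comparison over the full operational data set becomes feasible interactively only when the data can be scanned fast enough.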
    • Workarounds and Edge Innovation Increase Complexity
      - latency produces workarounds that increase the complexity of the IT landscape
      - platforms for edge innovation increase complexity, too, if they require new data flows and data integration
      - virtualization & enterprise service buses help; nevertheless, IT governance and release planning are complex tasks: data flow is complex, changes take time
      - if a solution or a change is delivered too late, business users will create workarounds that increase complexity, especially if data is written back from workaround systems into operational systems
      Figure: IT systems and data flows (CRM, ERP, central CRM, central ERP, HCM, BW, SRM, non-SAP, Portal, specialized system, workarounds)
    • Content
      - HANA Architecture and Use Cases
      - Enabling Quantitative Approaches
      - Summary
      - Chances for Development & IT Management
    • In-Memory Computing and Decision Making
      With in-memory technology you can help users of IT systems:
      - users benefit from Google-like search functions
      - navigation in huge datasets
      - access to all data for a customer
      - faster segmentation for campaigns in customer relationship management
      - Business Intelligence and data mining on operational data
      - simulation of changes of business rules based on operational data
      - performing predictions
      - solving optimization problems
      HANA is an enabler for quantitative methods in the area of operations: decision making and optimization
    • Challenge #3: Quantitative Methods for Business Insight Are Used Only in a Few Lines of Business
      The biggest strength of HANA is not speed. It is a calculation engine providing business insight and an enabler for decision making. This requires more skills in statistics, data mining and machine learning. But:
      - only a few lines of business frequently use mathematical methods: finance, insurance, logistics (supply chain management)
      - developers need skills in Business Intelligence and Business Warehouse foundations: key figures, measures, star schemas, hierarchies and other concepts directly supported by HANA using attribute and calculation views that operate on top of the Calculation Engine
      - isolated skills aren't enough: we need the skills of a "data scientist" as found in companies that work with "Big Data" (Facebook, Google, Amazon)
      - methods from Operations Research are used even more rarely than other quantitative approaches
    • Does Data Speak for Itself?
      Taken from "What Data Doesn't Do" by Coco Krumme in "Beautiful Data"
    • Can Simple Statistics Help?
      Taken from "What Data Doesn't Do" by Coco Krumme in "Beautiful Data"
    • Challenge #4: Skill Management in the Enterprise
      - To use the full potential of HANA we need mathematical skills (visualization of huge data sets, predictive analytics and simulation); unfortunately those skills are rare
      - Developers need skills with standard mathematical software (R, IMSL)
      - BI experts don't know OLTP data models; programmers usually have limited BI skills
      - Many BW experts are afraid of using virtual data sources and prefer materialized aggregations instead
      - BI experts and experts from operations usually don't work in the same organizational units
    • Challenge #5: Innovation Management in Enterprises
      - We have accepted the limitations of traditional database systems for years and have "scissors in mind" (we censor our own ideas)
      - Because IT people tend to think like engineers, in terms of solutions, SAP established the method of "Design Thinking". Here is a definition from Wikipedia: "As a style of thinking, design thinking is generally considered the ability to combine empathy for the context of a problem, creativity in the generation of insights and solutions, and rationality to analyze and fit solutions to the context."
    • Content
      - HANA Architecture and Use Cases
      - Transformation of Enterprise IT
      - Enabling Quantitative Approaches
      - Summary
      - New Development Patterns
    • My Personal Conclusion
      - With HANA we can build new types of business applications
      - HANA makes existing SAP and non-SAP solutions faster and more flexible, which leads to more agility
      - HANA is the first step towards convergence of OLAP and OLTP
      - Enterprise architects can use HANA to simplify corporate IT landscapes
      - Software developers have the chance to use more quantitative approaches in business and bring them closer to operations
      - Therefore we need new skills in the enterprise: classical BI, statistics, data mining, traditional data warehousing, machine learning, optimization and business domain knowledge
    • Some HANA-Relevant Research Topics
      - OLTP reporting: where to perform data cleansing, enrichment and completion? How to achieve consistent time-awareness?
      - software engineering: programming models that allow code pushdown of business logic to the database
      - software evolution: how to evolve systems and IT landscapes to profit from in-memory technology? How can we push down code to the database and still keep maintainability and one code line?
      - solving large-scale optimization problems on HANA: strengths and weaknesses of the current architecture & libraries
      - advanced business rules on the database: monotonic and non-monotonic reasoning
    • Research Projects Where HANA Is Promising
      - Graph-based search and graph-based data mining: so far semantic technologies provided solutions but didn't scale
      - Combination of graph-based data mining and traditional data mining
      - Complex event processing and SOA integration: with HANA we can store event streams (RFID events from manufacturing, clicks in web shops etc.). How can we define alerts and notifications from this data and publish them in a SOA?
      - Multi-criteria decision making (see Kou, Miettinen and Shin, "Multiple Criteria Decision Making: Challenges and advancements", Journal of Multi-Criteria Decision Analysis, vol. 18, 2001)
    • Some Challenges at AOK
      - code pushdown of very complex rulesets, e.g.
        - checks according to the provisions regulating benefits in the German Social Code
        - automated agent determination for workflows
      - expert systems for advanced process automation:
        - accident questionnaires contain narrative text that has to be evaluated using business rules that also need data from the backend
        - automated fine-tuning of those rulesets
    • Challenge #6: Invention vs. Adoption
      I presented examples of research topics that could be tackled using HANA as a platform. Last but not least, a personal piece of advice: academia has created innovative technologies, but why aren't they ubiquitous in industry? This is an acid test for prototypes:
      - Do they work with real data?
      - Are they able to work with huge data sets?
      - Can business people use them? Are they as easy to use as a mobile app?
      - If parts of the domain change (business rules, compliance…), can you adapt the application within a short time?
      © DB AG
    • Thank you for your attention!
    • Information about SAP's In-Memory Data Management
      General information:
      - www.experiencehana.com
      - www.scn.sap.com
      - help.sap.com/hana
      Training material:
      - open.sap.com
      - www.saphana.com/community/implement/hana-academy
      - openhpi.de/course/inmemorydatabases
      Starting point for a search for scientific HANA research:
      www.informatik.uni-trier.de/~ley/pers/hd/f/F=auml=rber:Franz.html
    • SAP University Alliance
      - Information at scn.sap.com/community/uac
      - SAP HANA @ Universities: scn.sap.com/community/uac/hana
      - SAP gives access to:
        - 30-day free trial license for HANA in the Cloud
        - training material
        - special prices for HANA access
        - SAP HANA DemoCloud environment for universities