Model-based transaction-aware cloud resources management: case study and methodology

The presentation introduces a method of allocating cloud resources to enterprise applications (EAs) based on business transaction metrics. The approach uses queuing models; it was devised during a real-life EA capacity planning project requested by an Oracle customer. Implementing the proposed solution reduced the number of database servers from 40 to 21 without compromising transaction times.
The presentation describes the components of the proposed methodology: building the application’s queuing model, obtaining input data for modeling (workload characterization and transaction profiles), solving the model, and analyzing what-if scenarios. It compares ways of collecting input data, identifies instrumentation of software at its development stage as the ultimate solution, and encourages research into technologies that deliver instrumented EAs.
Takeaway: model-based transaction-aware cloud resources management significantly improves cloud profitability by minimizing the number of hardware servers hosting applications while delivering the required service level.

Published in: Technology
  • The presentation introduces a method of allocating hardware servers to enterprise applications based on business transaction metrics. The method was devised during a real-life enterprise application capacity planning project for an Oracle customer. The described approach significantly reduced the number of hardware servers assigned to the application. The presentation demonstrates that transaction-aware cloud management might deliver a significant improvement in cloud profitability without any additional investment in the hardware platform. Further research into cloud management based on business transaction metrics is worth considering, as it might bring significant economic benefits to cloud providers and to their customers.
  • One of the conditions of enterprise application (EA) scalability in the cloud is its deployment on a network-like architecture in which each functional service of the EA is hosted on dedicated computers forming a functional cluster (typically called a server farm). The figure illustrates an architecture featuring server farms: Analytical, Consolidation, Data integration, Web processing, Business logic, Data storage, Data import/export, and Printing. Such an EA deployment can support practically unlimited growth in the number of business users and in the complexity or volume of data by increasing the number of servers in the functional clusters. The servers within each farm are functionally identical in the sense that they all provide the same “menu” of services.
  • Business transactions initiated by users have to be distributed among a farm’s servers; today the predominant approach is based on a round-robin algorithm. On rare occasions a distribution rule takes server utilizations into account and directs transactions to the server with the largest capacity headroom. These algorithms, as well as the more exotic ones currently employed, are all based on parameters derived from hardware infrastructure monitoring, such as server utilizations, intensity of I/O operations, size of a process working set in memory, and the like. This presentation introduces a new method of server allocation that is based on business transaction metrics. The approach has a marked impact on cloud profitability because it minimizes the number of servers assigned to an application without compromising transaction times.
  • The transaction distribution approach analyzed in this presentation was devised during a real-life EA sizing project for an Oracle customer. The EA had a typical three-tier structure with Web, Application, and Database layers. The Database tier included on-line analytical processing (OLAP) and relational (RDBMS) databases. The customer requested an estimate of the number of hardware servers on each layer, for the architecture presented on the slide, under the anticipated workload from 400 business users. Each server had 2x4-core CPUs and plenty of memory.
  • Workload characterization provided by customer
  • Transaction profiles obtained by monitoring the system in its development environment
  • The table shows that the shorter a transaction’s time for a single user, the higher its stretch factor for 400 users.
  • Based on hourly service demand, we divided all transactions into two groups: one with low (regular font) and one with high (italic red font) hourly service demand.
  • This study demonstrates that transaction-aware cloud management might deliver a significant improvement in cloud profitability without any additional investment in the hardware platform.
  • - Currently employed algorithms of workload distribution among hardware servers are either round-robin or based on parameters derived from hardware monitoring, such as server utilizations, intensity of I/O operations, size of a process working set in memory, and the like.
    - The presentation describes a method of server allocation based on business transaction metrics. The method minimizes the number of servers assigned to an application without compromising transaction times.
    - The described approach classifies transactions into groups by their hourly service demand and processes each group on dedicated servers.
    - Implementing the proposed method while deploying a real-life EA in a cloud reduced the number of OLAP servers from 40 to 21 without compromising transaction times.
    - Transaction-aware cloud management might deliver a significant improvement in cloud profitability without any additional investment in the hardware platform. Further research into cloud management based on business transaction metrics is worth considering, as it might bring significant economic benefits to cloud providers and to their customers.
  • Queuing models represent applications using two constructs: transactions and nodes. One type of node consists of two entities: a queue and processing units (pictured on the slide). Processing units serve incoming transactions; if they are all busy, a transaction waits in the node’s queue. The second type of node has no waiting queue, only processing units; such nodes are abstractions of application users. We can envision a transaction initiated by a user as a physical object visiting different hardware servers. As a metaphor, we represent a transaction as a car traveling on highways with toll booths; a toll booth, in turn, stands for a hardware server.
  • This slide depicts the relationship between an application and its queuing model. This is just one of many possible models of the same system; a model can represent a system at different levels of abstraction. We mapped the system into a queuing model with three nodes: the nodes “Web server” and “A&D server” have processing units and waiting queues; the node “Users” has only processing units. The transactions initiated by the users travel to “Web server”, are then processed by “A&D server”, and return to the node “Users” (transactions are represented by the cars on the slide). Total transaction response time includes processing and waiting times in the nodes “Web server” and “A&D server”.
  • Below are the relationships between the components of a real system and the components of its model:
      Component of application | Matching component in queuing model
      Users and their computers | Node “Users”
      Web server | Node “Web server”
      Application and Database server | Node “A&D server”
      Transactions initiated by users | Cars
  • Models help to understand the components of transaction time and the factors they depend on. Consider a business transaction that retrieves a financial report. The transaction is initiated when a user clicks on an icon labeled “Request Report”. At that moment we start an imaginary stopwatch to measure transaction time. The initiated transaction (again depicted as a car, see the slide) starts its journey by moving from one node to another, waiting in queues and spending time in processing units. Finally the car-transaction gets back to the user, and we stop the stopwatch at that moment. The time measured by the stopwatch is the transaction time: the sum of all time intervals the car-transaction has spent in the waiting queues and processing units of all nodes representing system hardware. The cloud on the slide encompasses the nodes contributing to transaction time. Transaction response time is the total of waiting and processing times in the nodes “network”, “Web server”, “Application server”, and “Database server”.
  • One of the most important steps in model building is the specification of the model’s input data. The success or failure of a modeling project is to a large extent determined by input data quality. Input data consists of workload characterization and transaction profiles. Workload characterization has three components: a list of business transactions; for each transaction, the number of its executions during a particular time interval, usually one hour (this number is called the transaction rate); and, for each transaction, the number of users requesting it.
  • A transaction profile is the set of time intervals a transaction has spent in all processing units it has visited while being served by the application. Obtaining a transaction profile might require substantial effort, including deployment of commercial transaction monitors. In simple cases we can find a transaction profile by setting up the monitoring utilities built into the operating system to record CPU utilization for each system server while manually executing the transaction multiple times. The experiment can be enhanced if we have access to a software load generator that can produce a steady sequence of transaction requests from a single virtual user. The system has to be exposed to the load for a period of time sufficient to collect statistically representative sets of transaction times and CPU utilizations.
  • Workload characterization and transaction profiles provide the data for calculating transaction hourly service demand and for segmenting transactions based on that parameter.
  • Active resources: CPU time (data processing); I/O time (data transfer). Passive resources: software connections to the servers and services (for example, Web server connections and database connections); software threads; storage space; memory space. Active resources implement transaction processing and data transfer. Passive resources provide access to active resources. In order to be processed by any active resource, a transaction has to request and be allocated passive resources. If any of the assets needed for transaction processing is not available because all supply is taken by other transactions, then the transaction waits until an asset is released (the wait time increases transaction response time).
  • Model-based transaction-aware cloud resources management: case study and methodology

    1. Model-based transaction-aware cloud resources management: case study and methodology
       Leonid Grinshpan, Ph.D., Consulting Technical Director
    2. Disclaimer
       The views expressed in this presentation are the author’s own and do not reflect the views of the companies he has worked for, nor those of Oracle Corporation.
       All brands and trademarks mentioned are the property of their owners.
    3. Presentation’s goal
       The presentation includes:
       - A case study in which the number of database servers was brought down from 40 to 21 while deploying an enterprise application for an Oracle customer
       - An outline of the model-based transaction-aware cloud management methodology that made such minimization of hardware possible
    4. Presentation’s structure
       Section 1: Subject
       Section 2: Transaction-aware servers allocation
       Section 3: How to implement model-based transaction-aware management?
    5. Section 1: Subject
    6. Deployment of enterprise application on a network of functional clusters
    7. Load balancing algorithms
       1. Round robin
       2. Algorithms based on assessment of hardware metrics (CPU utilization, etc.)
       The load balancing approach discussed in the presentation is based on business transaction metrics.
    8. Definitions 1
       • Transaction - a request from an EA user to be processed by the system.
       • Transaction (response) time - time to process a transaction by the application.
       • Transaction rate - the number of transaction requests submitted by one user during one hour.
       • Transaction service demand - the time interval during which a transaction was processed by a particular component of infrastructure (network, hardware appliance, hardware server).
       • Transaction profile - the set of transaction service demands for the system resources needed to process a transaction.
    9. Definitions 2
       • Workload - a flow of transactions generated by EA users.
       • Workload characterization - a specification of workload that includes three components: a list of business transactions; the transaction rate; and, for each transaction, the number of users requesting it.
       • Transaction stretch factor - a parameter defined by a formula:
       A scalable system has stretch factors equal to 1 for all transactions.
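The stretch factor formula on slide 9 did not survive as text. Judging from the values on slide 16 (for example, 10.68 seconds with 400 users against 10.0 seconds for a single user gives 1.07), it appears to be the ratio below. The symbols $T_N$ (transaction time with $N$ concurrent users) and $T_1$ (transaction time for a single user) are labels chosen here and are not the author's notation.

```latex
\[
\text{stretch factor} = \frac{T_N}{T_1},
\qquad \text{e.g.}\quad \frac{10.68}{10.0} \approx 1.07 .
\]
```

Under this reading, a stretch factor of 1 means a transaction takes no longer under load than it does for a single user.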
    10. Sizing project requirements. System architecture
    11. Sizing project requirements. Number of users and SLA
       • Provide an estimate of the number of hardware servers on each layer and the number of CPUs on each server, for the architecture presented on the previous slide, under the anticipated workload from 400 business users.
       • Service level: transaction time degradations while increasing the number of users up to 400 are acceptable if they do not exceed 7%.
    12. Sizing project requirements. Workload characterization
       Transaction name | Average transaction time for single user (seconds) | Transaction rate | Number of users executing each transaction
       OLAP maintenance small | 10 | 1 | 14
       OLAP maintenance medium | 10 | 1 | 2
       OLAP maintenance large | 10 | 4 | 1
       OLAP restructure small | 60 | 1 | 2
       OLAP restructure medium | 600 | 1 | 1
       OLAP restructure large | 3600 | 2 | 1
       OLAP update small | 10 | 1 | 12
       OLAP update medium | 10 | 1 | 11
       OLAP update large | 10 | 3 | 6
       OLAP calculation small | 500 | 2 | 187
       OLAP calculation medium | 2250 | 3 | 34
       OLAP calculation large | 10000 | 5 | 6
       Maintenance report small | 15 | 1 | 60
       Maintenance report medium | 25 | 8 | 9
       Maintenance report large | 200 | 9 | 1
       Update report small | 10 | 1 | 8
       Update report medium | 10 | 3 | 6
       Update report large | 10 | 3 | 1
       Sales report small | 15 | 3 | 30
       Sales report medium | 25 | 4 | 7
       Sales report large | 200 | 12 | 1
    13. Sizing project requirements. Transaction profile (time in seconds)
       Transaction name | Average transaction time for single user | Time on Web/App server | Time on OLAP server | Time on RDBMS server
       OLAP maintenance small | 10 | 1 | 7 | 2
       OLAP maintenance medium | 10 | 1 | 7 | 2
       OLAP maintenance large | 10 | 1 | 7 | 2
       OLAP restructure small | 60 | 1 | 56 | 3
       OLAP restructure medium | 600 | 1 | 595 | 4
       OLAP restructure large | 3600 | 1 | 3594 | 5
       OLAP update small | 10 | 1 | 7 | 2
       OLAP update medium | 10 | 1 | 7 | 2
       OLAP update large | 10 | 1 | 7 | 2
       OLAP calculation small | 500 | 1 | 496 | 3
       OLAP calculation medium | 2250 | 1 | 2245 | 4
       OLAP calculation large | 10000 | 1 | 9995 | 5
       Maintenance report small | 15 | 3 | 11 | 1
       Maintenance report medium | 25 | 6 | 17 | 2
       Maintenance report large | 200 | 49 | 147 | 4
       Update report small | 10 | 1 | 8 | 1
       Update report medium | 10 | 1 | 8 | 1
       Update report large | 10 | 1 | 8 | 1
       Sales report small | 15 | 3 | 11 | 1
       Sales report medium | 25 | 6 | 17 | 2
       Sales report large | 200 | 49 | 147 | 4
    14. What-if scenarios
       • A few what-if scenarios with different numbers of servers on each layer were modeled.
       • All analyzed deployments indicated that one Web/Application server and one RDBMS server are sufficient.
       • Configurations with different numbers of OLAP servers were analyzed.
    15. Architecture with 40 OLAP servers and original workload
       This architecture keeps transaction time deterioration for 400 users under 7%, but it features low utilization of each OLAP server (only 36%).
       The obvious next step was to check server utilizations for deployments with fewer OLAP servers. We did just that for 34 servers, and the result violated the SLA: the increase in some transaction times reached an unacceptable 18%.
    16. Transaction time degradation under original workload
       Transaction | 1 user | 400 users | Stretch factor, 40 OLAP servers (CPU utilization of each one 36%) | Stretch factor, 34 OLAP servers (CPU utilization of each one 42%)
       OLAP maintenance small | 10.0 | 10.68 | 1.07 | 1.18
       OLAP maintenance medium | 10.0 | 10.68 | 1.07 | 1.18
       OLAP maintenance large | 10.0 | 10.68 | 1.07 | 1.18
       OLAP restructure small | 60.0 | 60.68 | 1.01 | 1.03
       OLAP restructure medium | 600.0 | 600.68 | 1.00 | 1.00
       OLAP restructure large | 3600.0 | 3600.66 | 1.00 | 1.00
       OLAP update small | 10.0 | 10.68 | 1.07 | 1.18
       OLAP update medium | 10.0 | 10.68 | 1.07 | 1.18
       OLAP update large | 10.0 | 10.68 | 1.07 | 1.18
       OLAP calculation small | 500.0 | 500.66 | 1.00 | 1.00
       OLAP calculation medium | 2250.0 | 2250.66 | 1.00 | 1.00
       OLAP calculation large | 10001.2 | 10001.66 | 1.00 | 1.00
       Maintenance report small | 15.0 | 15.68 | 1.05 | 1.12
       Maintenance report medium | 25.0 | 25.68 | 1.03 | 1.07
       Maintenance report large | 200.0 | 200.67 | 1.00 | 1.01
       Update report small | 10.0 | 10.68 | 1.07 | 1.18
       Update report medium | 10.0 | 10.68 | 1.07 | 1.18
       Update report large | 10.0 | 10.68 | 1.07 | 1.18
       Sales report small | 15.0 | 15.68 | 1.05 | 1.12
       Sales report medium | 25.0 | 25.68 | 1.03 | 1.07
       Sales report large | 200.0 | 200.67 | 1.00 | 1.01
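As a rough check of this table against the 7% service level from slide 11, the sketch below computes a stretch factor as loaded time divided by single-user time and compares the degradation with the limit. That definition is inferred from the table values rather than quoted from the slides, and the function names are illustrative only.

```python
def stretch_factor(single_user_time: float, loaded_time: float) -> float:
    """Ratio of transaction time under load to single-user transaction time."""
    return loaded_time / single_user_time


def meets_sla(stretch: float, max_degradation: float = 0.07) -> bool:
    """True if the degradation (stretch - 1) stays within the allowed fraction.

    A tiny tolerance absorbs floating-point noise at the boundary.
    """
    return stretch - 1.0 <= max_degradation + 1e-9


# "OLAP maintenance small": 10.0 s for one user, 10.68 s for 400 users.
with_40_servers = stretch_factor(10.0, 10.68)
print(round(with_40_servers, 2), meets_sla(with_40_servers))

# Stretch factor reported in the table for the 34-server configuration.
print(meets_sla(1.18))
```

With the table's numbers, the 40-server configuration stays inside the limit for this transaction, while the 1.18 reported for 34 servers does not.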
    17. What causes degradation of transaction time?
       • Short transactions degrade because they wait in a server’s queues until long transactions (like OLAP calculation) release a CPU.
       • This observation leads to a hypothesis: segmenting transactions into groups by hourly service demand and processing each group on dedicated OLAP servers might minimize the total number of OLAP servers.
       Transaction hourly service demand = time a single transaction spends in a server * number of transactions per hour per user * number of users executing the transaction
    18. Low and high demand workloads
       On the slide, the hourly service demand for the OLAP server appears in one of two columns, one for low demand transactions and one for high demand transactions; it is given here as a single column.
       Transaction name | Transaction rate | Number of users executing each transaction | Time on OLAP server | Hourly service demand for OLAP server
       OLAP maintenance small | 1 | 14 | 7 | 98
       OLAP maintenance med. | 1 | 2 | 7 | 14
       OLAP maintenance large | 4 | 1 | 7 | 28
       OLAP restructure small | 1 | 2 | 56 | 112
       OLAP restructure medium | 1 | 1 | 595 | 595
       OLAP restructure large | 2 | 1 | 3594 | 7188
       OLAP update small | 1 | 12 | 7 | 84
       OLAP update medium | 1 | 11 | 7 | 77
       OLAP update large | 3 | 6 | 7 | 126
       OLAP calculation small | 2 | 187 | 496 | 185504
       OLAP calculation medium | 3 | 34 | 2245 | 228990
       OLAP calculation large | 5 | 6 | 9995 | 299850
       Maintenance report small | 1 | 60 | 11 | 660
       Maintenance report med. | 8 | 9 | 17 | 1224
       Maintenance report large | 9 | 1 | 147 | 1323
       Update report small | 1 | 8 | 8 | 64
       Update report medium | 3 | 6 | 8 | 144
       Update report large | 3 | 1 | 8 | 24
       Sales report small | 3 | 30 | 11 | 990
       Sales report medium | 4 | 7 | 17 | 476
       Sales report large | 12 | 1 | 147 | 1764
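The hourly service demand formula from slide 17 and the split into two groups can be sketched as follows. The rows are copied from the table above. The cut-off value is a hypothetical choice made for this example, because the transcript does not say where the author drew the line between low and high demand.

```python
# (transaction name, rate per user per hour, number of users, seconds on OLAP server)
# A few rows copied from the slide 18 table.
ROWS = [
    ("OLAP maintenance small", 1, 14, 7),
    ("OLAP restructure large", 2, 1, 3594),
    ("OLAP calculation small", 2, 187, 496),
    ("OLAP calculation medium", 3, 34, 2245),
    ("OLAP calculation large", 5, 6, 9995),
    ("Sales report large", 12, 1, 147),
]

# Hypothetical cut-off (seconds of OLAP server time per hour); not from the slides.
HIGH_DEMAND_THRESHOLD = 100_000


def hourly_service_demand(rate, users, seconds_on_server):
    """Time on server * transactions per hour per user * number of users."""
    return seconds_on_server * rate * users


def segment(rows, threshold=HIGH_DEMAND_THRESHOLD):
    """Split transactions into 'low' and 'high' groups by hourly service demand."""
    groups = {"low": [], "high": []}
    for name, rate, users, seconds in rows:
        demand = hourly_service_demand(rate, users, seconds)
        groups["high" if demand >= threshold else "low"].append((name, demand))
    return groups


groups = segment(ROWS)
for label, members in groups.items():
    print(label, members)
```

With this particular threshold only the three OLAP calculation transactions land in the high demand group; a different threshold would give a different split.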
    19. Architecture with segmented workload
       1 OLAP server processing the low demand workload
       20 OLAP servers processing the high demand workload
       This configuration delivered the same transaction times and stretch factors as a system with 40 OLAP servers handling the non-segmented original workload.
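One way to picture the segmented architecture is a dispatcher that first looks up a transaction's demand group and then picks a server from the pool dedicated to that group. The sketch below is only an illustration: the class name, the server names, the round-robin choice inside each pool, and the assignment of the two sample transactions to groups are all assumptions, not details given in the presentation.

```python
from itertools import cycle


class SegmentedRouter:
    """Routes each transaction to the server pool dedicated to its demand group."""

    def __init__(self, pools, group_of):
        # One independent round-robin iterator per pool (assumed policy).
        self._cycles = {group: cycle(servers) for group, servers in pools.items()}
        self._group_of = group_of

    def route(self, transaction_name):
        group = self._group_of[transaction_name]
        return next(self._cycles[group])


# Pool sizes follow slide 19: 1 low demand server, 20 high demand servers.
# Server names are made up for the example.
POOLS = {
    "low": ["olap-low-1"],
    "high": [f"olap-high-{i}" for i in range(1, 21)],
}

# Assumed group membership for two sample transactions.
GROUP_OF = {
    "OLAP update small": "low",
    "OLAP calculation large": "high",
}

router = SegmentedRouter(POOLS, GROUP_OF)
print(router.route("OLAP update small"))
print(router.route("OLAP calculation large"))
print(router.route("OLAP calculation large"))
```

Keeping a separate iterator per pool means short transactions never queue behind long ones on the same server, which is the effect the segmentation is meant to achieve.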
    20. Takeaway from case study
       • The presentation describes a method of server allocation based on business transaction metrics. The method minimizes the number of servers assigned to an application without compromising transaction times.
       • The approach classifies transactions into groups by their hourly service demand and processes each group on dedicated servers.
       • Transaction-aware cloud management might deliver a significant improvement in cloud profitability without any additional investment in the hardware platform.
       • Further research into cloud management based on business transaction metrics is worth considering, as it might bring significant economic benefits to cloud providers and to their customers.
    21. Section 3: How to implement model-based transaction-aware management?
    22. What is needed?
       • Application model
       • Workload specification
       • Transaction profiles
       • Model solver
    23. Mapping application into queuing model
       Hardware server representation
       Total time in node = time in waiting queue + time in processing unit
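The relation on slide 23 can be made concrete for the simplest textbook case. The sketch below assumes a node with a single processing unit, random (Poisson) arrivals, and exponentially distributed service times, that is, an M/M/1 queue. This is an assumption made here for illustration and is not the model the author solved; the function name is hypothetical.

```python
def mm1_node_times(service_time: float, arrival_rate: float):
    """Return (waiting time, processing time, total time in node) for an M/M/1 node.

    service_time: mean time in the processing unit, in seconds
    arrival_rate: mean number of transactions arriving per second
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("node is saturated: utilization must be below 1")
    # Mean time in the waiting queue for M/M/1.
    waiting = utilization * service_time / (1.0 - utilization)
    # Total time in node = time in waiting queue + time in processing unit.
    return waiting, service_time, waiting + service_time


waiting, processing, total = mm1_node_times(service_time=0.5, arrival_rate=1.0)
print(waiting, processing, total)
```

The waiting term grows without bound as utilization approaches 1, which is why a node shared with long-running transactions can stretch short ones so noticeably.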
    24. Mapping application into queuing model (cont 2)
    25. Mapping application into queuing model (cont 3)
       The relationships between the components of a real system and the components of its model
       Component of application | Matching object in queuing model
       Users | Node “Users”
       Web server | Node “Web server”
       Application and Database server | Node “A&D server”
       Requests from users | Cars
    26. Transaction response time and transaction profile
       Transaction time is the time spent in the “cloud”
    27. Transaction response time and transaction profile (cont 2)
       Active resources:
       • CPU time (data processing)
       • I/O time (data transfer)
       Passive resources:
       • Web server connections
       • Database connections
       • Software threads
       • Storage space
       • Memory space
    28. Model’s input data
       1. Workload characterization
       • List of business transactions
       • Number of users per each business transaction
       • Per each transaction, the number of transactions per user per hour (transaction rate)
       Transaction name | Number of users | Transaction rate
       Report ABC | 20 | 12
       Business Rule X | 10 | 8
       Consolidation Y | 5 | 3
    29. Model’s input data (cont 2)
       2. Transaction profiles
       The transaction profile in this example is comprised of the time intervals a transaction has spent in the system servers it has visited when the application was serving only that single transaction
       Transaction name | Service demand on Web server (seconds) | Service demand on A&D server (seconds)
       Report ABC | 0.5 | 0.5
       Business Rule X | 0.5 | 2.5
       Consolidation Y | 0.5 | 9.5
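To show how the two inputs from slides 28 and 29 combine, the sketch below multiplies users, rate, and service demand to get the seconds of work each server receives per hour, then divides by the seconds available. It assumes every user actually achieves the stated transaction rate and that each server has a single processing unit unless told otherwise; both are simplifications made for this example, and the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# Workload characterization (slide 28): name -> (number of users, rate per user per hour)
WORKLOAD = {
    "Report ABC": (20, 12),
    "Business Rule X": (10, 8),
    "Consolidation Y": (5, 3),
}

# Transaction profiles (slide 29): name -> service demand in seconds on each server
PROFILES = {
    "Report ABC": {"Web server": 0.5, "A&D server": 0.5},
    "Business Rule X": {"Web server": 0.5, "A&D server": 2.5},
    "Consolidation Y": {"Web server": 0.5, "A&D server": 9.5},
}


def hourly_demand_per_server(workload, profiles):
    """Sum users * rate * service demand over all transactions, per server."""
    totals = {}
    for name, (users, rate) in workload.items():
        for server, seconds in profiles[name].items():
            totals[server] = totals.get(server, 0.0) + users * rate * seconds
    return totals


def utilization(demand_seconds, processing_units=1):
    """Fraction of available processing time consumed in one hour."""
    return demand_seconds / (SECONDS_PER_HOUR * processing_units)


totals = hourly_demand_per_server(WORKLOAD, PROFILES)
for server, demand in totals.items():
    print(server, demand, round(utilization(demand), 3))
```

For the sample data this gives 167.5 seconds per hour on the Web server and 462.5 seconds per hour on the A&D server. A queuing model solver goes further by also estimating the waiting times that result from such loads.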
    30. Workload segmentation
       Transaction hourly service demand = time a single transaction spends in a server * number of transactions per hour per user * number of users executing the transaction
       Transaction-aware management classifies transactions into groups by their hourly service demand and processes each group on dedicated servers.
    31. How to obtain workload characterization and transaction profiles?
       • Analysis of business process - creating process flowcharts based on interviews of key process participants
         http://www.wikihow.com/Analyze-a-Business-Process
       • Business transaction management software - tracking a transaction across the application
         http://en.wikipedia.org/wiki/Business_transaction_management
       • Application instrumentation at the software development stage - making the application manageable
         http://en.wikipedia.org/wiki/Instrumentation_(computer_programming)
       • Big data analysis - forensic analysis of transactional data collected over time
         http://en.wikipedia.org/wiki/Big_data
    32. Application instrumentation is the most potent technology
       • Transactions are defined at the application development stage
       • It is possible to assign a unique ID to each transaction
       • A unique transaction ID enables tracking the transaction’s path among servers
       • A unique transaction ID enables measurement of each server’s passive and active resources allocated to the transaction
       • A unique transaction ID enables logging to a file information on all executed transactions with their parameters
    33. Parameters of a transaction saved in log file
       • Unique transaction ID
       • ID of the user who initiated the transaction
       • Transaction start and stop dates and times
       • Transaction total execution time
       • Per each server: time the transaction entered the server, time the transaction exited the server, time the transaction spent in the server
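The log parameters listed on slide 33 map naturally onto a small record type. The sketch below is one possible layout, not a format prescribed by the presentation; the class and field names are invented for illustration, and the sample timestamps are made up (they simply reproduce the 1, 7 and 2 second profile of "OLAP maintenance small").

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class ServerVisit:
    """Per-server entry of a transaction log record."""
    server: str
    entered: datetime
    exited: datetime

    @property
    def time_in_server(self) -> timedelta:
        return self.exited - self.entered


@dataclass
class TransactionLogRecord:
    """One logged transaction with the parameters named on slide 33."""
    transaction_id: str
    user_id: str
    started: datetime
    stopped: datetime
    visits: List[ServerVisit] = field(default_factory=list)

    @property
    def total_execution_time(self) -> timedelta:
        return self.stopped - self.started


# Illustrative record; the date and IDs are arbitrary.
t0 = datetime(2013, 1, 1, 9, 0, 0)
record = TransactionLogRecord(
    transaction_id="tx-0001",
    user_id="user-42",
    started=t0,
    stopped=t0 + timedelta(seconds=10),
    visits=[
        ServerVisit("Web/App server", t0, t0 + timedelta(seconds=1)),
        ServerVisit("OLAP server", t0 + timedelta(seconds=1), t0 + timedelta(seconds=8)),
        ServerVisit("RDBMS server", t0 + timedelta(seconds=8), t0 + timedelta(seconds=10)),
    ],
)
print(record.total_execution_time)
print([(v.server, v.time_in_server.total_seconds()) for v in record.visits])
```

Deriving the durations from the timestamps, rather than storing them separately, keeps the record from contradicting itself.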
    34. Transaction profile data generated by instrumented application
       [Figure: for a given transaction ID, each of servers 1 to N reports CPU time, I/O time, and the quantities of passive resources 1 to M]
    35. Application instrumentation technologies
       • Application Response Measurement - ARM
         https://collaboration.opengroup.org/tech/management/arm
       • Apache Commons Monitoring
         http://commons.apache.org/sandbox/monitoring/instrumentation.html
       • Tracing and Instrumenting Applications in Visual Basic and Visual C#
         http://msdn.microsoft.com/en-us/library/aa984115(v=vs.71).aspx
       • Systemtap for Linux
         http://sourceware.org/systemtap/tutorial.pdf
       • Java Management Extension (JMX)
         http://docs.oracle.com/javase/tutorial/jmx/index.html
    36. Application instrumentation is a must-have component of efficient cloud management
       • An instrumented application provides transaction profiles and transactional log files
       • Big data analysis extracts workload characterization from the transactional log files
       • Application queuing models generate estimates of system performance for different what-if scenarios
       • Cloud management implements the best scenario
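The step "extracting workload characterization from transactional log files" can be sketched as a simple aggregation. The example assumes each log entry carries a transaction name alongside the user ID; the name is not among the parameters listed on slide 33 and is added here as an assumption, as are the function name and the sample log.

```python
from collections import defaultdict


def characterize_workload(log_entries, observed_hours):
    """Derive users and transaction rate per transaction type from a log.

    log_entries: iterable of (transaction_name, user_id), one per executed transaction
    observed_hours: length of the logging period in hours
    Returns name -> {"users": distinct users, "rate": executions per user per hour}.
    """
    executions = defaultdict(int)
    users = defaultdict(set)
    for name, user_id in log_entries:
        executions[name] += 1
        users[name].add(user_id)
    return {
        name: {
            "users": len(users[name]),
            "rate": executions[name] / (len(users[name]) * observed_hours),
        }
        for name in executions
    }


# Made-up log covering two hours.
LOG = [
    ("Report ABC", "u1"),
    ("Report ABC", "u1"),
    ("Report ABC", "u2"),
    ("Report ABC", "u2"),
    ("Consolidation Y", "u3"),
]

workload = characterize_workload(LOG, observed_hours=2)
print(workload)
```

The output has the same three components as the workload characterization on slide 28: the list of transactions, the number of users per transaction, and the transaction rate.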
    37. Research areas
       • Enterprise application instrumentation as a provider of transactional data
       • Big data analysis delivering transactional workload characterizations and workload variability patterns for proactive cloud management
       • Queuing models of enterprise applications enabling analysis of different what-if scenarios
    38. To learn more about queuing models of enterprise applications, check the author’s book “Solving Enterprise Applications Performance Puzzles: Queuing Models to the Rescue” (available in bookstores and from Web booksellers)
       http://www.amazon.com/Solving-Enterprise-Applications-Performance-Puzzles/dp/1118061578/ref=sr_1_1?ie=UTF8&qid=1326134402&sr=8-1
       https://www.amazon.com/author/leonid.grinshpan
       Contact Leonid Grinshpan at: leonid.grinshpan@oracle.com