EOUG95 - Client Server Very Large Databases - Paper
BUILDING LARGE SCALEABLE CLIENT/SERVER SOLUTIONS

David Walker
European Oracle User Group, Firenze, Italia, 1995

Summary

The replacement of Legacy systems by Right-Sized solutions has led to a growth in the number of large open client/server installations. In order to achieve the required return on investment these solutions must be at the same time flexible, resilient and scaleable. This paper sets out to describe the building of an open client/server solution by breaking it up into components. Each component is examined for scaleability and resilience. The complete solution, however, is more than the sum of the parts and therefore a view of the infrastructure required around each element will be discussed.

Introduction

This paper will explore the proposition that large scale client/server solutions can be designed and implemented by de-constructing the architecture into quantifiable components.

This can be expressed as a model based on the following four server types:

    Database Server
    Application Server
    Batch Server
    Print Server

Each of these components must have two characteristics. The first characteristic is resilience. Having a single point of failure will mean that the whole solution may fail. This must be designed out wherever possible. The second characteristic is scaleability. All computers have limitations in processing power, memory and disk capacity. The solution must take into account hardware limitations and be designed to make optimum use of the underlying platform.

It is assumed that the system described will be mission critical, with over two hundred concurrent users and a database not less than ten gigabytes. The growth of these sites to thousands of concurrent users and hundreds of gigabytes of data is made easier by designing the system to be scaleable.
Database Server

The Database Server is the component traditionally thought of as The Server. It is here that the Oracle RDBMS deals with requests, normally via SQL*Net. A resilient Database Server will be spread over more than one node of a clustered system using Oracle Parallel Server [OPS] in case of hardware failure within a single node.

A clustered system is one where homogeneous hardware nodes are linked together. They share a common pool of disk, but have discrete processor and memory. There is a Lock Manager [LM] which manages the locking of data and control objects on the shared disks. The Lock Manager operates over a Low Latency Inter-connect [LLI] (typically Ethernet or FDDI) to pass messages between nodes. The disks are connected to all nodes through a High Bandwidth Inter-connect [HBI] (typically SCSI).

Clustering a system provides additional resilience, processing power and memory capacity. However, since the complete database must be available on shared disk, the maximum database size can be no greater than the maximum database size configurable on a single node.

The disks within such mission critical systems are also required to meet the resilience and scaleability criteria. The Redundant Array of Inexpensive Disks [RAID] technologies provide a solution. RAID level 0 provides striping across many disks for high performance, but it offers no fault tolerance. RAID level 1 implements disk mirroring, where each disk has an identical copy. RAID level 5 offers striping with parity. Whilst RAID level 5 is cheaper than full mirroring there are performance issues when a failure occurs. In the type of system that is being examined in this paper the solution is to use RAID 0 and RAID 1 together: striping and mirroring.

The basic level of resilience is to start an OPS instance on each of two nodes. All users are then connected to one node. The other node participates in the OPS database but no user connections are made to it.
When the active node fails, all users migrate to the alternate node and continue processing. The precise method of migration will depend on the network protocol in use. Any existing network connections will have to be re-established to the alternate node and the current transaction will be lost. Reconnection requires users to be directed to the alternate node. This would typically be dealt with transparently from the Application Server (see below).

As the application grows the load may exceed the processing power or memory capacity of a single node. At this point the users can be distributed over multiple nodes. In order to do this the application must have been designed with OPS in mind. This requires that, where possible, the applications are partitioned.
A large airline system, for example, may have applications for Reservations and for Flight Operations. Select statements will involve blocks being read from common lookup tables. Inserts and updates will take place mainly within the set of tables belonging to one or other application. Partitioning these tables reduces the lock traffic being passed between nodes and consequently improves the overall performance of the OPS system. Where it is impossible to partition an application more sophisticated techniques are required. A detailed discussion of these techniques is beyond the scope of this paper.

It is also important to make some decisions about the use of stored procedures. These are a powerful tool that can reduce coding in front-end applications. They can also ensure code integrity, as changes to standard packages need only be made in one place. They do, however, force the load back into the Database Server. This means that scaleability is limited by the processing capacity of the Database Server. This, as will be shown later, may not be as scaleable as putting the work back into the Application Server (see below).

There are a number of concerns about buying redundant equipment. For two node systems the second node could, for example, be utilised for a Decision Support System [DSS] database. This normally fits well with the business requirements. A DSS is normally built from the transactional data of the On-line Transaction Processing [OLTP] system. This will transfer the data via the system bus rather than over a network connection. If the node running the OLTP system fails, DSS users can be disabled and the OLTP load transferred to the DSS node.

Where there are two or more OLTP nodes a strategic decision can be made to limit access to the most important users up to the capacity of the remaining node(s). This is inevitably difficult to decide and even harder to manage. When one node has failed the pressure is on to return the system to service.
Having to make decisions about the relative importance of users may be impossible.

The OPS option is available with Oracle 7.0. With Oracle 7.1 the ability to use the Parallel Query Option [PQO] became available. This allows a query to be broken up into multiple streams and run in parallel on one node. This is not normally useful in the daytime operations of an OLTP system, as it is designed to make use of full table scans that return a large volume of data. Batch jobs and the DSS database may however take advantage of PQO for fast queries. In Oracle 7.2 the ability to perform PQO over multiple nodes becomes available. This is most useful to perform the rapid extract of data in parallel from the OLTP system running across N minus one nodes whilst inserting into the DSS node.
Backup of the system is also critical. A system of this size will most probably use the Oracle on-line or hot backup ability of the database. Backup and recovery times are directly related to the number of tape devices, the size of each device, the read/write speed of the devices and the I/O scaleability of the system and the backup software. If the system provides backup tape library management then it is desirable to use a larger number of short tape devices with high read/write speeds. Tapes such as 3480 or 3490 compatible cartridges are therefore desirable over higher capacity but less durable and slower devices such as DAT or Exabyte.

Recovery will require one or more datafiles to be read from tape. These can be quickly referenced by a backup tape library manager that prompts for the correct tape to be inserted. Large tapes may contain more than one of the required datafiles and may have the required datafile at the end of a long tape. This does not affect backup time but can have a considerable effect on the time taken to recover. Since all nodes are unavailable during recovery, a quick recovery time is important.

Further enhancements to the speed of backup will be made possible by using Oracle Parallel Backup and Restore Utility [OPBRU]. This will integrate with vendor specific products to provide high performance, scaleable, automated backups and restores via robust mainframe class media management tools. OPBRU became available with Oracle 7.0.

The optimal solution for the Database Server is based on an N node cluster. There would be N minus one nodes providing the OLTP service and one node providing the DSS service. The DSS functionality is replaced by OLTP when a node fails. The number and type of backup devices should relate to the maximum time allowed for recovery.

Application Server

The Application Server configuration can be broken down into two types. The first type is called host based, the second is client based.
These two techniques can either be used together in a single solution, or one can be used exclusively.

A host based Application Server is where the user connects to a host via telnet or rlogin and starts up a session in either SQL*Forms or SQL*Menu. This typically uses SQL*Forms v3, character based SQL*Forms v4, or the MOTIF based GUI front-end to SQL*Forms v4. This type of application is highly scaleable and makes efficient use of machine resources. The terminal is a low resource device of the Thin-client type. The amount of memory required per user can be calculated as the sum of the size of concurrently open forms. The CPU load can be measured by benchmarking on a per SQL*Forms session basis and scaled accordingly. Disk requirement equals the size of the code set for the application, plus the size of the code set for Oracle.
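The sizing rules in the preceding paragraph can be turned into a small calculation. The following is a minimal sketch only: the class and function names are hypothetical, the figures in the usage note are invented for illustration, and the CPU estimate assumes that load scales linearly with the number of sessions, which a real benchmark would need to confirm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostSizing:
    """Hypothetical sizing inputs for one host based Application Server."""
    open_form_sizes_mb: List[float]   # forms a single user holds open concurrently
    cpu_percent_per_session: float    # from benchmarking one SQL*Forms session
    app_code_mb: float                # size of the application code set
    oracle_code_mb: float             # size of the Oracle code set


def memory_per_user_mb(sizing: HostSizing) -> float:
    # Memory per user is the sum of the sizes of the concurrently open forms.
    return sum(sizing.open_form_sizes_mb)


def estimate(sizing: HostSizing, users: int) -> Dict[str, float]:
    return {
        "memory_mb": users * memory_per_user_mb(sizing),
        # Assumption: CPU load scales linearly with the session count.
        "cpu_percent": users * sizing.cpu_percent_per_session,
        # Disk does not grow with users: it holds only the two code sets.
        "disk_mb": sizing.app_code_mb + sizing.oracle_code_mb,
    }
```

For instance, `estimate(HostSizing([0.5, 0.25, 0.25], 0.25, 300.0, 200.0), users=200)` describes two hundred users who each hold three forms open. The point of the calculation is that memory and CPU grow with the user count while disk does not, which is why the growth pattern of such a server is easy to predict.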
Each host based Application Server can potentially support many hundreds of users, dependent on platform capability, and the growth pattern is easily predicted. When a host based Application Server is full, a second identical machine can be purchased. The code set is then replicated to that machine and users connected. Resilience is achieved by having N+1 host based Application Servers, where N is the number of machines required to support the load. The load is then distributed over all the machines. Each machine is then 100*N/(N+1)% busy. In the case of a host based Application Server failure 100/(N+1)% of the users are redistributed, causing minimal impact to the users as a whole. For example, with N = 4 there are five machines, each 80% busy, and the failure of one redistributes 20% of the users. It should be noted that these machines are not clustered; they only hold application code that is static.

Since the machines are not clustered it is possible to continue to add machines indefinitely (subject to network capacity). This relates back to the issue of stored procedures mentioned above. The Database Server is limited by the maximum processing power of the cluster. The host based Application Server tier can always add an additional node to add power. This means that the work should take place on the host based Application Server rather than the Database Server. The more scaleable solution is therefore to put validation into the SQL*Forms, rather than into stored procedures. In practice, it is likely that a combination of stored procedures and client based procedures will provide the best solution.

Host based Application Servers also make it easy to manage transition between nodes in the Database Server when a node in the Database Server fails. A number of methods could be employed to connect the client session to the appropriate server node. A simple example is that of a script monitoring the primary and secondary nodes. If the primary Database Server node is available then its host connection string is written to a configuration file.
If it is not available then the connection string for a secondary Database Server node is written to the configuration file. When users reconnect they read the configuration file and connect to the correct server.

Client based Application Servers are used in conjunction with the more traditional client side machines. Here a Personal Computer [PC] runs the SQL*Forms, which today is likely to be Windows based SQL*Forms v4. This is the so called Fat Client. In a large environment the management of the distribution of application code to maybe several thousand PCs is a system administrator's nightmare. Each PC would more than likely need to be a powerfully configured machine requiring a 486 or better processor, with 16 MB of memory and perhaps 0.5 GB of disk to hold the application. Best use of the power and capacity available is not made as it is an exclusive, rather than shared, resource.

Microsoft's Systems Management Server [SMS] has recently become available. This may help with the management of many networked PCs. SMS provides four main functions: an inventory of hardware and software across the network; the management of networked applications; the automated installation and execution of software on workstations; the remote diagnostics and control of workstations. Whilst in production, this software is yet to be tested on large scale sites.
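The monitoring script described above for host based Application Servers, which writes the connection string of an available Database Server node to a configuration file, might be sketched as follows. This is an illustration under stated assumptions, not a description of any particular site's script: the function names and the node list layout are hypothetical, and a successful TCP connection is used as a stand-in for "the node is available", which in practice would need a real check against the database instance.

```python
import os
import socket
import tempfile


def node_is_available(host, port, timeout=2.0):
    """Assumed health check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_connect_string(nodes, is_available=node_is_available):
    """Return the connect string of the first available node.

    `nodes` is a list of (connect_string, host, port) tuples in priority
    order, primary first. Returns None if no node is available.
    """
    for connect_string, host, port in nodes:
        if is_available(host, port):
            return connect_string
    return None


def write_config(path, connect_string):
    """Replace the configuration file in one step, so that a user who is
    reconnecting never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp_file:
        tmp_file.write(connect_string + "\n")
    os.replace(tmp_path, path)
```

A monitoring loop would call `choose_connect_string` at a regular interval and call `write_config` whenever the result changes. Writing to a temporary file and renaming it is a design choice made here so that the switch is seen by reconnecting users either entirely or not at all.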
The client based Application Server can help by serving the source code to each PC, typically by acting as a network file server. The SQL*Forms code is then downloaded to the PC at run-time. This overcomes the problem of disk capacity on the client PC and helps in the distribution of application code sets. It does create the problem of potentially heavy local network traffic between the PC and the client based Application Server as the SQL*Forms are downloaded. It also does not help the requirement for memory and processor power on the PC. The client based Application Server can also be used to hold the configuration file in the same manner as the host based Application Server. This again deals with the problem of Database Server node failure.

The growing requirements for Graphical User Interfaces [GUI] may be driven by a misguided focus of those in control of application specification. Windows based front-end systems are seen as very attractive by those managers who are considering them. The basic operator using the system however is usually keyboard, rather than mouse, orientated. Users will typically enter a code and only go to a lookup screen when an uncommon situation arises.

An example where a GUI is inappropriate is that of a Bureau de Change whose old character based system accepted UKS for UK Sterling and USD for US Dollars. In all, entering the codes required eight keystrokes and took about five seconds. The new system requires the use of a drop down menu. A new mousing skill was needed by the operator to open the window. More time was spent scrolling to the bottom of the window. A double click was then used to select the item. This was repeated twice. After a month of use, the best time achieved by the operator was about thirty seconds.
This is six times longer than with the original character based system and any delay is immediately obvious as the operation is carried out in front of the customer.

A large scaleable client/server system can take advantage of both methodologies. The general operator in a field office can make a connection to a host based Application Server and use the traditional character based system. This is a low cost solution as the high power PCs are not required and the resources are shared. The smaller number of managers and knowledge workers who are based at the head office or in regional offices can then take advantage of local client based Application Servers and GUIs. Although these users require high specification PCs the ratio of knowledge workers to operators makes the investment affordable.

In order to make the task of building application code simpler, it is advantageous to code for a standard generic batch and print service. Since the platform each component is running on may be different (for example: a UNIX Database Server, a Windows/NT client based Application Server, etc.), SQL*Forms user exits should be coded to one standard for batch and printing. A platform specific program should then be implemented on each server to work with a local or remote batch queue manager.
Batch Server

Batch work varies from site to site, and is often constrained by business requirements. A Batch Server will manage all jobs that can be queued. The Batch Server may also act as the Print Server (see below). The batch queues have to be resilient and when one node fails another node must be capable of picking up the job queues and continuing the processing. Some ways of achieving this resilience are suggested below.

The first is to implement the queue in the database. The database is accessible from all nodes and so the resilience has already been achieved. The alternate node therefore need only check that the primary node is still running the batch manager, and if not start up a local copy. The second method is for the Batch Server to be clustered with another machine (perhaps the Print Server), and a cluster-aware queue manager product employed. The queues set up under such a product can be specified as load-balancing across clustered nodes, or queuing to one node, with automatic job fail-over and re-start to an alternate node on failure. An alternative clustered implementation would be to maintain the queue on shared disk. When the Batch Server node fails the relevant file systems are released and acquired by the alternate node, which examines the queue and restarts from the last incomplete job.

Two other requirements exist. The first is the submission of jobs; the second is the processing of jobs. The submission of jobs requires that the submitting process sends a message to the batch manager or, in the case of a queue held in the database, inserts a row into the queue table. This can be done via the platform specific code discussed above. If the queue is held in the database the workload is increased on the Database Server as the batch queue is loaded. However, it is also easy to manage from within the environment.

Job processing depends on the location of the resource to perform the task.
The task can be performed on the Batch Server, where it can have exclusive access to the machine's resources. This requires that the code to perform the task is held in one place and the ability to process batch requests is therefore limited by the performance of the Batch Server node.

Alternatively the Batch Server can distribute jobs to utilise available Application Server systems. These machines must each contain a copy of the batch processing code and a batch manager sub-process to determine the level of resource available. If an Application Server finds that the system has spare capacity then it may take on a job. The batch manager sub-process then monitors the load, accepting more jobs until a predefined load limit is reached.
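The behaviour of the batch manager sub-process, which accepts jobs until a predefined load limit is reached, can be illustrated with a short sketch. All names here are hypothetical. For simplicity the load on a machine is modelled as the sum of an estimated cost per running job; a real sub-process would instead measure the actual load on its Application Server.

```python
from collections import deque


class BatchSubProcess:
    """Hypothetical batch manager sub-process on one Application Server."""

    def __init__(self, name, load_limit):
        self.name = name
        self.load_limit = load_limit   # predefined limit for this machine
        self.running = []              # (job_name, cost) pairs now running

    def current_load(self):
        # Assumption: load is the sum of the estimated costs of running jobs.
        return sum(cost for _, cost in self.running)

    def try_accept(self, job_name, cost):
        """Take the job only if it fits under the load limit."""
        if self.current_load() + cost > self.load_limit:
            return False
        self.running.append((job_name, cost))
        return True


def distribute(queue, servers):
    """Offer each queued job to the servers in turn.

    Returns the jobs that no server could accept; these stay queued.
    """
    remaining = deque()
    while queue:
        job_name, cost = queue.popleft()
        if not any(s.try_accept(job_name, cost) for s in servers):
            remaining.append((job_name, cost))
    return remaining
```

Jobs that cannot be placed remain on the queue and would be offered again once running jobs complete and load falls back below the limit.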
In practice it is common to find that many batch jobs are run in the evening when the Application Servers are quiet, with a small number of jobs running during the day. To manage this effectively the system manager may enable the batch manager sub-processes on the Application Servers only outside the main OLTP hours. During the OLTP operational window batch jobs are then limited to the Batch Server. This capitalises on available resource whilst ensuring that batch and OLTP processing do not interfere.

It should be noted that all jobs submit work to the Database Server. Large batch jobs that take out locks will inhibit the performance of the database as a whole. It is important to categorise jobs into those that can be run without affecting OLTP users and those which cannot. Only those tasks that do not impact the Database Server should be run during the day.

Batch Servers of this nature are made from existing products combined with bespoke work tailored to meet site requirements.

Print Server

The Print Server provides the output for the complete system. There are many possible printing methods that could be devised. This paper will look at three. The first is the distributed Print Server. Each place where output is produced manages a set of queues for all possible printers it may use. Printers need to be intelligent and able to manage new requests whilst printing existing ones. As the load increases the chance of a print-out being lost or of a queue becoming unavailable increases. The management of many queues also becomes very complex.

An alternative is to use a central queue manager. This can be integrated with the batch manager mentioned above and use either of the queuing strategies suggested. The pitfall with this method is that the output file has to be moved to the Print Server before being sent to the printer. This involves a larger amount of network traffic that may cause overall performance problems.
The movement of the file from one system to another may be achieved either by file transfer or by writing the output to a spool area that is Network File System [NFS] mounted between the systems. This will also stress the print sub-system on the Print Server.

A third method requires that the print task be submitted to a central queue for each printer on the Print Server (or in the database). This requires that the task submission includes the originating node. When the task is executed a message is sent back to the submitting node. The Print Server sub-process on that node then sends the output to the local queue. On completion the sub-process notifies the Print Server. The Print Server can then queue the next job for a given printer. In this way the network is not saturated by sending output files over the network twice, but there is a cost associated with managing the queues on machines that can submit print requests.

As has already been suggested the Print Server can either be combined with the Batch Server, or clustered with it for resilience. When a failure occurs each can take on the role of the other.
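The message flow of the third printing method described above, in which the Print Server holds only task descriptions and the output file stays on the submitting node, can be sketched as below. The class and method names are hypothetical, and the messages between machines are reduced to return values so that the ordering logic is visible on its own.

```python
from collections import deque


class PrintServer:
    """Central queue per printer. Holds task descriptions, not output files."""

    def __init__(self):
        self.queues = {}    # printer -> deque of (originating node, output file)
        self.busy = set()   # printers that currently have a job in progress

    def submit(self, printer, node, output_file):
        """Queue a task. The submission records the originating node."""
        self.queues.setdefault(printer, deque()).append((node, output_file))
        return self.dispatch(printer)

    def dispatch(self, printer):
        """Release the next task if the printer is idle.

        Returns (node, output_file) naming the node to be told to print
        from its local queue, or None if nothing can be released.
        """
        queue = self.queues.get(printer)
        if printer in self.busy or not queue:
            return None
        self.busy.add(printer)
        return queue.popleft()

    def completed(self, printer):
        """Called when the node's sub-process reports that its job is done."""
        self.busy.discard(printer)
        return self.dispatch(printer)
```

Only the small task records travel to and from the Print Server. The output itself travels once, from the originating node to the printer.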
The above illustrates that printer configuration requires considerable thought at the development stage. Each site will have very specific requirements and generic solutions will have to be tailored to meet those requirements.

As previously mentioned, this type of solution requires a combination of existing product and bespoke work.

Backup and Recovery

Backup and recovery tasks have already been identified for the Database Server. This is by far the most complex server to back up as it holds the volatile data. All the other servers identified remain relatively static. This is because they contain application code and transient files that can be regenerated if lost. Most systems are replicas of another (Application Servers being clones and the Batch/Print Servers being clustered), but full system backups should be regularly taken and recovery strategies planned.

In the case of a complete black-out of the data centre it is important to understand what level of service can be provided by a minimal configuration. The size of the systems means that a complete hot site is unlikely to be available. The cost of the minimal configuration should be balanced against the cost to the business of running with a reduced throughput at peak times. The full recovery of a system is non-trivial and requires periodic full rehearsals to ensure that the measures in place meet the requirements and that the timespans involved are understood.

Systems Management

Effective management of a large number of complex systems requires a number of specialist tools. Many such tools have become available over the last two years. Most have a component that performs the monitoring task and a component that displays the information. If the monitoring component runs on a server then it can increase the load on that server. Some sites now incorporate SNMP Management Information Bases [MIBs] into their applications and write low-impact SNMP agents to look at the system and database.
Information from these MIBs can be requested by the remote console component. It is desirable that any monitoring tool has the ability to maintain a history, trigger alerts and perform recovery actions. Long term monitoring of trends can assist in predicting failures and capacity problems. This information will help avoid system outages and maintain availability.
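The three monitoring abilities mentioned above (maintaining a history, triggering alerts, and using trends to predict capacity problems) can be illustrated with a very small sketch. The class is hypothetical and is not tied to any SNMP product. The trend estimate is a deliberately crude straight line through the oldest and newest samples held, so it is only indicative.

```python
from collections import deque


class Monitor:
    """Hypothetical monitor for one measured value, such as percent disk used."""

    def __init__(self, threshold, history_size=100):
        self.threshold = threshold
        self.history = deque(maxlen=history_size)   # oldest samples drop off

    def record(self, value):
        """Store a sample. Returns True when an alert should be triggered."""
        self.history.append(value)
        return value >= self.threshold

    def samples_until_threshold(self):
        """Estimate how many more samples until the threshold is reached.

        Returns None when there is too little history or no upward trend.
        """
        if len(self.history) < 2:
            return None
        first, last = self.history[0], self.history[-1]
        rate = (last - first) / (len(self.history) - 1)
        if rate <= 0:
            return None
        return max(0.0, (self.threshold - last) / rate)
```

If samples are taken daily, the estimate reads directly as the number of days of headroom remaining, which is the kind of early warning that allows capacity to be added before an outage.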
Networking

The network load on these systems can be very complex. It is recommended that a network such as FDDI with its very high bandwidth is provided between the servers for information flow. Each machine should also have a private Ethernet or Token Ring for systems management. This network supports the agent traffic for the system management and the system administrators' connections. The users are only attached to the Application Servers via a third network. Where clusters are in use each LLI will also require network connections. The distribution over many networks aids resilience and performance.

Integrating the Solution

The methodology discussed above is complex. Much of the technology used will be leading edge. Many design decisions will be made before the final hardware platform is purchased. A successful implementation requires hardware (including the network), the database and an application. Each element must have a symbiotic relationship with the others. This has worked best where hardware vendor, database supplier, application provider (where appropriate) and customer put a team together and work in partnership. Staffing of the new system will be on a commensurate scale requiring operators and administrators for the system, database and network. A 7 by 24 routine and cover for leave and sickness means that one person in each role will be insufficient.

Conclusion

The building of a large scaleable client/server solution is possible with existing technologies. An important key to success is to design the system to grow. This should be done regardless of whether the current project requires it or not. The easiest way to manage the design process is to break up the system into elements and then examine each element for scaleability and resilience. Then bring all the elements together and ensure that they work together. Do not attempt to do this on your own but partner with your suppliers to get the best results from your investment.