So far, we have looked at many aspects of designing, creating, populating and querying a database. We have (briefly) explored ‘optimisation’, which is used to ensure that query execution time is minimised.
In this lecture we are going to look at some techniques which are used to improve performance and availability.
This matters because, in many installations and applications, databases are required to be available 24 hours a day, 7 days a week, 52 weeks a year - think of the ‘user’ demands in e-business.
In this arrangement, ‘shared-all’ clustering doesn’t scale as effectively as shared-nothing clustering for small machines. All the nodes have access to the same data, so a controlling facility must be used to direct processing and ensure that all nodes have a consistent view of the data as it changes.
Attempts by more than one node to update the same data must be prevented, and this can cause performance and scalability problems.
Shared-all architectures are well suited to the large-scale processing found in mainframe environments.
Mainframes are large processors capable of high workloads. The number of clustered PCs and midrange processors - even with the newer, faster processors - needed to equal the computing power of a few clustered mainframes would be high: around 250 nodes.
There is another technique - the InfiniBand architecture - which can reduce bottlenecks at the Input/Output level, and which has the further appeal of reducing the cabling, connector and administrative overheads of the database infrastructure.
It is an ‘intelligent’ agent - that is, software. Its main attraction is that it can change the way information is exchanged in applications, removing unnecessary overheads from the system.
The ‘queries’ actually access data from a number of databases at a number of locations
One of the interesting aspects of a federated database is that the individual databases may use any DBMS (IBM, Oracle, SQL Server, possibly MS Access), run on any operating system (Unix, VMS, Windows XP) and on different hardware (Hewlett-Packard servers, Unisys, IBM, Sun Microsystems …).
Acceptable performance requires a smart cost-based optimiser which has intelligence about both the distribution of the data (perhaps through a global data dictionary) and the different hardware and DBMS at each accessed site.
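In Oracle, for instance, this kind of distributed access can be set up with database links, and a single query can then span local and remote databases - a minimal sketch, in which the link name, connection details and table names are all hypothetical:

```sql
-- Hypothetical example: a database link to a remote site.
-- 'london_db' would be a TNS alias; user and password are invented.
CREATE DATABASE LINK sales_london
  CONNECT TO reports IDENTIFIED BY secret
  USING 'london_db';

-- One query spanning the local CUSTOMERS table and the remote
-- ORDERS table; the cost-based optimiser decides how much of the
-- work to push to the remote site.
SELECT c.customer_name, SUM(o.amount)
FROM   customers           c,
       orders@sales_london o
WHERE  c.customer_id = o.customer_id
GROUP  BY c.customer_name;
```

The point is that the application sees one query; the optimiser is responsible for deciding where each part of it runs.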
3. The database server can be a mid-range machine, or several such servers.
4. Another aspect is that it is most unlikely that a query will regularly need access to all of the individual databases - but with the centralised approach all of the data needs to be held centrally.
5. Local databases support local queries - that’s probably why the local databases were introduced.
(A materialised view stores replicated data based on an underlying query; the data is replicated from within the current database.
A snapshot, by contrast, stores data from a remote database.
The system optimiser may choose to use a materialised view instead of running a query against a larger table if the materialised view will return the same data, thereby improving response time. A materialised view does, however, incur the overhead of additional space usage and maintenance.)
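A minimal Oracle sketch of this idea - the ORDERS table and its columns are assumptions for illustration:

```sql
-- Hypothetical example: a materialised view summarising a large
-- ORDERS table within the current database.
CREATE MATERIALIZED VIEW order_totals_mv
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT    -- the maintenance overhead mentioned above
                            -- (fast refresh also needs a materialised
                            --  view log on ORDERS)
  ENABLE QUERY REWRITE      -- lets the optimiser substitute the view
                            -- for queries against ORDERS
AS
  SELECT customer_id,
         SUM(amount) AS total_amount,
         COUNT(*)    AS order_count
  FROM   orders
  GROUP  BY customer_id;
```

With QUERY REWRITE enabled, a query such as `SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id` can be answered from the (much smaller) materialised view rather than the base table.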
As you have probably guessed, there are other tablespaces which need to be considered - many of them used by the many and various ‘processes’ of Oracle.
One of these considerations is the online redo log files (you remember these and their purpose?)
They store the records of each transaction. Each database must have at least two online redo log files available to it - the database will write to one log sequentially until that redo log file is filled, then it will start writing to the second.
The online redo log files hold data about current transactions, and they cannot be recovered from a backup unless the database was shut down prior to the backup - this is a requirement of the ‘offline backup’ procedure (if we have time we will look at this).
Online redo log files need to be ‘mirrored’.
A method of doing this is to employ redo log groups, which dynamically maintain multiple copies (members) of each online redo log.
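A minimal sketch of multiplexing via redo log groups in Oracle - the group numbers, file paths and size are assumptions for illustration:

```sql
-- Hypothetical example: each group has two members, placed on
-- separate disks, so LGWR writes every redo record to both copies.
ALTER DATABASE ADD LOGFILE GROUP 4
  ('/disk1/oradata/redo04a.log',
   '/disk2/oradata/redo04b.log') SIZE 100M;

ALTER DATABASE ADD LOGFILE GROUP 5
  ('/disk1/oradata/redo05a.log',
   '/disk2/oradata/redo05b.log') SIZE 100M;
```

Losing one disk then costs only one member of each group; the database can carry on with the surviving copies.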
The operating system is also a good ally for mirroring files
Redo log files should be placed away from datafiles because of the performance implications, and doing this well means knowing how the two types of file are used.
Every transaction (unless it is tagged with the NOLOGGING parameter) is recorded in the redo log files.
The entries are written by the Log Writer (LGWR) process.
The data in the transaction is concurrently written to a number of tablespaces (the RBS rollback segment tablespace and the data tablespace come to mind) via the Database Writer (DBWR) process, and this raises possible contention issues if a datafile is located on the same disk as a redo log file.
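To illustrate the NOLOGGING case mentioned above, here is a hedged Oracle sketch of a bulk load that generates minimal redo - the table names are invented for illustration, and note that an unlogged operation cannot be recovered from the redo logs, so a fresh backup is needed afterwards:

```sql
-- Hypothetical example: suppress redo generation for a bulk load.
ALTER TABLE sales_history NOLOGGING;

-- A direct-path insert (APPEND hint) into a NOLOGGING table
-- generates minimal redo, easing pressure on LGWR and the logs.
INSERT /*+ APPEND */ INTO sales_history
  SELECT * FROM sales_staging;

COMMIT;
```

This trades recoverability for load speed - exactly the kind of decision the redo mechanism discussion above is preparing you to make.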