The Architecture of CUBRID
This document explains the architecture of the CUBRID Database Management System.

Published in: Technology

  • 1. The Architecture of CUBRID
  • 2. CONTENTS
    1. Introduction (p. 3)
      1.1 Overall Architecture of the CUBRID System (p. 5)
      1.2 Process Architecture (p. 6)
        1.2.1 Connection Configuration (p. 7)
    2. Broker (p. 8)
      2.1 The cub_broker Process (p. 8)
      2.2 The cub_cas Process (p. 8)
    3. Client and Server Modules (p. 10)
      3.1 Module Configuration (p. 11)
        3.1.1 Transaction Management Component (p. 11)
        3.1.2 Server Storage Management Component (p. 13)
        3.1.3 Client Storage Management Component (p. 14)
        3.1.4 Object Management Component (p. 15)
        3.1.5 Client-Server Communications (p. 17)
        3.1.6 Thread Management Component (p. 18)
        3.1.7 Query Processing (p. 18)
      3.2 Detailed Description for the Modules (p. 19)
        3.2.1 Transaction Management Component (p. 19)
        3.2.2 Object Management Component (p. 21)
        3.2.3 Query Processing (p. 22)
  • 3. 1. Introduction
    CUBRID is an object-relational database management system (DBMS) consisting of the Database Server, the Broker, and the CUBRID Manager.
    - Database Server: As the core component of the CUBRID Database Management System, the Database Server stores and manages data in a multi-threaded client/server architecture. It processes the queries entered by users and manages objects in the database. By using locking and logging, it keeps transactions consistent even when multiple users use the database at the same time. It also supports database backup and restore for operation.
    - Broker: CUBRID-specific middleware that relays communication between the Database Server and external applications. It provides functions including connection pooling, monitoring, and log tracing and analysis.
    - CUBRID Manager: A GUI tool for managing databases and brokers. It also provides the Query Editor, a tool that allows users to execute SQL queries on the Database Server.
    The basic configuration of CUBRID is shown in Figure 1 below.
  • 4. Figure 1. Basic Configuration of CUBRID
  • 5. 1.1 Overall Architecture of the CUBRID System
    Figure 2. Overall Architecture of the CUBRID System
    Figure 2 shows the overall architecture of the CUBRID system. The CUBRID system follows the client/server model, which allows multiple applications to access the same database simultaneously. The client module (the Broker in Figure 2) and the server module (the Server in Figure 2) run on separate systems (computers) and are connected through a network. When a broker and a server run on the same system, the architecture is the same because they are connected via socket IPC. A server handles requests from multiple clients in a single-process, multi-threaded environment, and each server process manages one database.
    The client module analyzes SQL queries issued against the database by users or applications and processes them up to the optimization stage. It then generates a query plan tree and sends it to the server. It receives the execution results from the server through cursor navigation and delivers them to the users or applications. The client caches object instances from the database in its memory so that data can be accessed quickly, either through query execution results or directly by users/applications. In addition, it caches locks as well as objects from the server for concurrency control. Triggers and methods specified by users or applications are also executed in the client module.
    The server module receives and processes requests from the client module (e.g., object requests or query execution requests from a query execution tree) and then returns the query execution results. To support multiple client modules with an appropriate number of threads, server threads are allocated to each broker request, not to each broker.
    The server performs input and output operations for the database and log volumes and provides file access methods to the database volume at the file or page level. In addition, it manages a page buffer in memory and uses a B+-tree index to increase retrieval speed. The server also provides concurrency control and deadlock detection between multiple transactions, as well as failure recovery.
  • 6. 1.2 Process Architecture
    Figure 3. Process Architecture of the CUBRID System
    Figure 3 shows the process architecture of the CUBRID system. On the server host, there can be one master process (cub_master) and more than one database server process (cub_server). Each client process (cub_cas) on the broker hosts connects to a single database server process. For each connection request from an application, the cub_broker process allocates a cub_cas, passes the connection to it, and manages the cub_cas processes. The cub_cas process executes database queries from the application.
  • 7. 1.2.1 Connection Configuration
    The cub_cas process connects to the defined connection port number of the master process. The master process checks whether the requested database server is running; the connection request is rejected if the server is not running. If the requested database server is running, the master process passes the connected socket to the requested server process. The server process then communicates with the client process (cub_cas) directly through that socket.
    The database server process connects to the master process's port, registers its server name (database name), and establishes a UNIX Domain Socket (or Named Pipe) connection to the master process. Over this connection, the master process passes the socket descriptors of clients (cub_cas) to the server; the connection is also maintained for server shutdown and other future operations. After the connection between the server and a client process (cub_cas) is established, the server process allocates a thread for each client request and performs the task.
    Master Process (cub_master)
    1. Checks whether another master process is running by connecting to cubrid_port_id.
    2. Switches to a daemon process, opens a socket on the port defined as cubrid_port_id, and waits for connections from clients and servers.
    3. If the connection is from a database server process, registers the server name and establishes a UNIX domain socket connection to the server process.
    4. If the connection is from a client process (cub_cas), passes the connected socket number (socket descriptor) to the database server requested by the client, so that a socket connection is established between the client and the server.
    Database Server Process (cub_server)
    1. Connects to the designated port of the master process. If the connection fails, the attempt is aborted on the assumption that the master process is not running.
    2. Registers its server name (database name) with the master process once the connection is established. If a server with the same name already exists, the registration is rejected and the server is terminated.
    3. Creates a UNIX Domain Socket (or Named Pipe), sends its connection path (socket file path) to the master process and, once the master process has connected to it, closes the socket connection to the designated port.
    4. Waits for task requests from connected clients. While waiting, it also handles any new client connection relayed by the master process.
    5. Accepts requests from connected clients and performs the tasks by allocating threads.
    Client Process (cub_cas)
    1. Connects to the master process on a remote or local server through the port defined as cubrid_port_id.
    2. Once connected to the master process, sends the name of the database to connect to and checks whether the database server process is registered and running. The connection is rejected if there is no corresponding server.
    3. Receives response messages directly from the server, because the master process passes the socket connection between the client and the master process to the corresponding server process.
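The hand-off described above (the master relays an already connected client socket to the registered server) can be sketched with descriptor passing over a UNIX domain socket. This is only a minimal illustration under stated assumptions, not CUBRID's actual code: the `Master` class, `server_accept`, `demo`, and the database name `demodb` are hypothetical, `socket.socketpair()` stands in for both the TCP connection from cub_cas and the master/server channel, and it relies on `socket.send_fds` / `socket.recv_fds` (Python 3.9+, Unix only).

```python
import socket


class Master:
    """Toy stand-in for cub_master: a name registry plus socket hand-off."""

    def __init__(self):
        self.servers = {}  # database name -> Unix-domain channel to that server

    def register(self, db_name, channel):
        # A second server with the same name is rejected, as in the text.
        if db_name in self.servers:
            return False
        self.servers[db_name] = channel
        return True

    def relay(self, db_name, client_conn):
        # Reject the request when the requested server is not registered.
        channel = self.servers.get(db_name)
        if channel is None:
            client_conn.close()
            return False
        # Pass the connected descriptor to the server, then drop our copy.
        socket.send_fds(channel, [db_name.encode()], [client_conn.fileno()])
        client_conn.close()
        return True


def server_accept(channel):
    """Server side: receive the relayed descriptor and wrap it as a socket."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])


def demo():
    master = Master()
    master_end, server_end = socket.socketpair()  # master <-> server channel
    cas_end, accepted = socket.socketpair()       # stands in for cub_cas <-> master
    assert master.register("demodb", master_end)
    assert not master.register("demodb", master_end)  # duplicate name rejected
    assert master.relay("demodb", accepted)
    direct = server_accept(server_end)
    direct.sendall(b"ready")                      # server answers the client directly
    reply = cas_end.recv(16)
    for s in (direct, cas_end, master_end, server_end):
        s.close()
    return reply
```

Calling `demo()` walks through registration, a rejected duplicate registration, the relay, and a reply that reaches the client without passing through the master again.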
  • 8. 2. Broker
    The Broker is middleware that relays communication between the database server and applications. It consists of cub_broker and cub_cas.
    2.1 The cub_broker Process
    For each connection request from an application, the cub_broker process allocates a cub_cas, passes the connection to it, and manages the cub_cas processes. cub_broker has a multi-threaded architecture and consists of the following threads:
    - main: Creates the other threads and manages the number of cub_cas processes. It increases or decreases the number of cub_cas processes depending on the number of requests in the job queue.
    - receiver_thread: Waits on the accept() system call and puts each connection request from an application into the job queue.
    - dispatch_thread: Finds an available cub_cas for each connection request in the job queue and passes the connection to that cub_cas.
    - cas_monitor_thread: Restarts a cub_cas that has terminated abnormally.
    2.2 The cub_cas Process
    The cub_cas process executes database queries from an application and has a single-threaded architecture. It connects to the database server when it receives a "connection" request from an application and calls the function corresponding to each request from the application. After the connection with the application is terminated, the process can receive a connection from another application. Disconnecting an application does not terminate the connection to the database server. If the next application uses the same database as the current one, the existing database connection is reused.
    Depending on the application's connection status, cub_cas has four statuses: IDLE, BUSY, CLIENT WAIT, or CLOSE WAIT.
    - IDLE: No connection is made to an application.
    - BUSY: A connection is made to an application, and a request from the application is being processed.
  • 9. - CLIENT WAIT: cub_cas is waiting for a request from an application while a transaction is in progress.
    - CLOSE WAIT: cub_cas is waiting for a request from an application, but the transaction has been terminated. If the connection between cub_cas and an application is disconnected in this status, the application attempts reconnection.
    After a connection to the application is established, the cub_cas process waits on the select() call and processes each function passed by the application. The main functions that respond to requests from an application are as follows:
    - fn_end_tran: Performs commit/rollback. If KEEP_CONNECTION is set to off in the cubrid_broker.conf file, it terminates the connection to the application when a transaction is terminated and establishes a new connection when a new transaction starts. If KEEP_CONNECTION is set to auto, the status of cub_cas changes to CLOSE_WAIT when a transaction is terminated. In this case, if the application connected to cub_cas has not sent a new request and a new application has sent a "connection" request, the cub_broker process can select a cub_cas whose status is CLOSE_WAIT, terminate its connection to the previous application, and ask it to connect to the new application.
    - fn_prepare: Processes a prepare request from an application. It compiles the query, creates a handle for the compiled query, and sends the handle to the application. The application then sends an execution request by using the handle. If the compiled query is a SELECT query, meta information on its columns is extracted and sent to the application.
    - fn_execute: Executes a prepared query statement. For a SELECT statement, it sends as much of the query result as fits in the specified buffer size; for other statements, it sends the query execution result. If JDBC RESULT CACHE is in use and the executed query already exists in JDBC RESULT CACHE, this function determines whether the stored query results can be reused. If they can be reused, the query results are not sent. Instead, only a flag indicating reusability is sent to JDBC.
    - fn_fetch: Copies as much of the SELECT query result as fits in the specified buffer size and sends it to an application.
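The four cub_cas statuses and the effect of KEEP_CONNECTION on fn_end_tran can be read as a small state machine. The sketch below is a toy model under stated assumptions rather than the broker's real logic: the `Cas` class, its method names, and `dispatch` are hypothetical, any KEEP_CONNECTION value other than "off" is treated like "auto", and preferring an IDLE cub_cas over a CLOSE WAIT one is an assumption (the text only says that a CLOSE WAIT cub_cas can be selected).

```python
from enum import Enum


class Status(Enum):
    IDLE = "IDLE"
    BUSY = "BUSY"
    CLIENT_WAIT = "CLIENT WAIT"
    CLOSE_WAIT = "CLOSE WAIT"


class Cas:
    """Toy model of one cub_cas process; not CUBRID's actual implementation."""

    def __init__(self, keep_connection="auto"):
        self.keep_connection = keep_connection  # "off" or "auto", as in the text
        self.status = Status.IDLE
        self.app = None

    def connect(self, app):
        # A (new) application is attached and its request is being processed.
        self.app = app
        self.status = Status.BUSY

    def request_finished(self):
        # Waiting for the next request while the transaction is still open.
        self.status = Status.CLIENT_WAIT

    def end_tran(self):
        if self.keep_connection == "off":
            # The connection to the application is dropped with the transaction.
            self.app = None
            self.status = Status.IDLE
        else:
            # Connection kept, but this cub_cas may be handed to another app.
            self.status = Status.CLOSE_WAIT


def dispatch(pool, new_app):
    """Pick a cub_cas for a new connection request (IDLE first: an assumption)."""
    for wanted in (Status.IDLE, Status.CLOSE_WAIT):
        for cas in pool:
            if cas.status is wanted:
                cas.connect(new_app)  # replaces any previous application
                return cas
    return None  # no cub_cas available; the request would stay queued
```

In this model a cub_cas in CLIENT WAIT is never reassigned, which matches the description that only a terminated transaction makes the process eligible for another application.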
  • 10. 3. Client and Server Modules
    This chapter describes the components of the server as a whole (hereinafter, the server) and of the native C API and other modules in the Client Library of the Broker (hereinafter, the client), as shown in Figure 4.
    Figure 4. Detailed Architecture of the CUBRID System
  • 11. 3.1 Module Configuration
    The CUBRID client and server modules consist of the following components:
    - Transaction Management Component: Handles system transactions across the client and server (including recovery from system failure).
    - Server Storage Management Component: Accesses and manages the database and log volumes on the server (including page buffering).
    - Client Storage Management Component: Allocates and manages a workspace for object caching and access on the client.
    - Object Management Component: Defines class objects, creates and modifies objects, and converts the object representation structure between the disk and the memory.
    - Client-Server Communications: Manages the network communication between the client and the server.
    - Thread Management: Manages the threads of a server process.
    - Query Processing: Executes query plans on the server; the plans are created by translating, analyzing, and optimizing SQL statements on the client.
    The module configuration of each component is described in the following sections.
    3.1.1 Transaction Management Component
    The Transaction Management Component consists of the modules in dark blue in Figure 5.
  • 12. Figure 5. Module Configuration of Transaction Management Component
    - Object Locator: Passes object data between the workspace on the clients and the page buffer pool on the server. It caches objects in the workspace and acquires locks for them.
    - Transaction Manager: Performs transaction start, commit, and rollback, and initializes the other modules (lock/log/recovery manager) of the Transaction Management Component. It also supports commit, rollback, and savepoint, including 2PC (2-phase commit).
    - Lock Manager: Performs lock management based on the 2PL (2-phase locking) protocol and supports a granularity locking protocol.
    - Recovery Manager: Protects database consistency against system failure. It employs a recovery method that uses UNDO/REDO logging and the WAL (Write Ahead Logging) protocol. This module supports total rollback, partial rollback (to a savepoint), and nested top operations, and uses LSA (Log Sequence Address), CLR (Compensation Log Record), etc.
  • 13. 3.1.2 Server Storage Management Component
    The Server Storage Management Component consists of the modules shown in Figure 6.
    Figure 6. Module Configuration of Server Storage Management Component
    - I/O Manager: Performs I/O tasks for the disk volume (or volume file). It mounts and unmounts volumes, locks volumes, and performs write synchronization for the log volume.
    - Page Buffer Management: Manages the page buffer in virtual memory that is used for disk page buffering. It employs the LRU page replacement algorithm and the FIX/UNFIX protocol for using the page buffer. In addition, it uses a hash table to quickly find a requested page in the buffer pool.
    - Disk Manager: Manages the internal structure of the disk volume (or volume file). A volume consists of sectors, and a sector is a group of contiguous pages. Each volume consists of a system area and a user area. A bit allocation map is used for page allocation in the volume.
  • 14. - File Manager: Lets other modules access the database purely in terms of files and pages, regardless of the internal structure of the volume (volume, sector, and page). It is used by file structures such as the B+-tree, heap, or hash. The File Manager keeps and manages, in the file header, information on the sectors allocated to a file.
    - Slotted Page Manager: Inserts, deletes, and updates records in a file page. It provides a slot structure that indicates the position (offset) of each record in a page, so records can be moved within a page through their slots.
    - Overflow Page Manager: Inserts, deletes, and updates records larger than one page in an overflow page area. With this module, large data can be handled atomically.
    - Object Heap Manager: Inserts, deletes, and updates objects in a file through the heap structure. The instances (records) of a class (table) are stored in an object heap file, and a unique OID (object identifier) is allocated to each record. The OID consists of "Volume ID + Page ID + Slot ID" and is not reused except in special cases. This OID representation is the same as the disk addressing used by the Disk Manager. That is, the OID indicates the physical location on disk where a record is stored.
    - Extendible Hash Manager: Provides extendible hashing for fast data access. It is used to retrieve class OIDs by class name.
    - B+-tree Manager: Provides an index file structure based on the prefix B+-tree. It inserts, deletes, and retrieves keys in the B+-tree.
    - Long Data Manager: Processes ad-hoc large objects such as multimedia data. It can modify part of the data.
    3.1.3 Client Storage Management Component
    The Client Storage Management Component consists of the modules shown in Figure 7.
  • 15. Figure 7. Module Configuration of Client Storage Management Component
    - Workspace Manager: Manages the database objects cached in the workspace of the client process. Through an object table implemented as a hash table, it converts a disk object identifier (OID) to a memory object pointer (MOP). The MOP holds a memory pointer that gives access to the objects cached in the client memory.
    - Garbage Collector: Collects garbage in the client workspace. It releases the memory allocated to MOPs and cached objects.
    - Quick Fit Storage Allocator: Allocates workspace memory for objects.
    3.1.4 Object Management Component
    The Object Management Component consists of the modules in Figure 8.
  • 16. Figure 8. Module Configuration of Object Management Component
    - Representation Manager: Converts an object between its disk representation and its memory representation. On disk, object data is laid out in a form suited to query execution; in memory, it has a structure that makes it easy for an application to access. The Representation Manager converts between these two formats. It also handles byte ordering during conversion.
    - Schema Manager: Defines and changes classes. It creates and modifies columns, methods, and classes, and manages their inheritance.
    - Object Access Manager: Creates, deletes, modifies, and checks objects and calls methods. It is closely related to the Schema Manager.
    - Dynamic Loader: Provides dynamic linking for an application that executes methods written in C.
    - Trigger Manager: Implements the trigger feature with system objects. It is closely related to the Schema Manager and the Object Access Manager.
  • 17. - Authorization Manager: Checks the authority of a database user. It is implemented on top of the API provided by the Object Access Manager.
    - Data Type and Domain: Manipulates the internal data structures (representation formats) for data type and domain information. It caches information about the domains in use in a connection list and has a domain conversion matrix.
    3.1.5 Client-Server Communications
    Client-Server Communications consists of the modules in Figure 9.
    Figure 9. Module Configuration of Client-Server Communications
    - Socket Manager: Manages communications in the client, the server, and the master process (cub_master). It manages the procedures for connecting to the client or server through the master process.
    - Packet Manager: Processes the packets used to exchange information between the client and the server. The packet types include request, data, close, out-of-band, and error packets. Request packets and data packets can be exchanged asynchronously by using queues in the client and the server.
    - Client-Server Interface: Provides the interface through which the rest of the system uses Client-Server Communications. It handles exceptions that occur during communication as well as out-of-band events such as user interrupts.
  • 18. 3.1.6 Thread Management Component
    The Thread Management Component manages multiple threads in the server process; it is implemented by using pthread. It detects requests from clients by using the select() system call and allocates a task to a thread for each request. A worker thread, which processes client requests, waits for a task in the Job Queue and wakes up when a task enters the queue. After it processes the task, it waits for another task in the Job Queue. In addition to the worker threads, there are system threads that process only special system tasks:
    - Deadlock detection thread: Checks for deadlocks at a fixed interval or when there is a lock request, and resolves any deadlock it finds.
    - Checkpoint thread: Performs checkpoints. At a fixed interval, it flushes data pages that have already been committed but are still cached in the page buffer and not yet reflected on disk. Periodic checkpoints reduce the restore time during failure recovery.
    - OOB (out-of-band) thread: Receives the OOB signal and passes it to the appropriate thread.
    - Page-flush thread: Periodically flushes the dirty pages in the page buffer to the disk. This improves system performance by reducing the flushing of dirty pages during page replacement.
    - Log-flush thread: Flushes log pages to the log volume. Group commit and asynchronous commit are provided through this thread.
    3.1.7 Query Processing
    Query Processing consists of the following modules.
    - Scanner/Parser: Translates queries (SQL) from users or applications and creates a parse tree.
    - Semantic Checker: Performs node typing, name resolution, semantic checking, view translation, etc.
    - XASL Generator/Optimizer: Creates an XASL (eXtended Access Specification Language) tree, which is a query execution plan, and performs query optimization by using schema information and database statistics. The XASL tree includes scan information (heap scan, index scan, list file scan, set scan, and method scan), a value list (the values required for query results), and predicates. Query optimization employs cost-based optimization and rewrite optimization.
    - Query Manager: A server module that executes an XASL tree received from the client. It consists of the Query File Manager, which stores the query's XASL plan and its results, and the Query Evaluator, which evaluates queries and creates a result list file. This module interfaces with the Transaction Manager or Recovery Manager to commit or cancel a transaction.
  • 19. - Cursor Manager: Fetches data from the list file that is created as the retrieval result.
    3.2 Detailed Description for the Modules
    3.2.1 Transaction Management Component
    A. Object Locator
    The Object Locator is a module that delivers object data between the workspace on the clients and the page buffer pool on the server. It provides concurrent access, use, and failure recovery for database objects by using the locking and restore algorithms of the Transaction Management Component. The Object Locator is divided into the Object Locator on the client, the Object Locator on the server, and the Object Locator on the client/server.
    The Client Object Locator executes its tasks by using the Workspace Manager, the Representation Manager (Transformation Manager), and the Heap File Manager. The Authorization Manager, Schema Manager, Object Access Manager, and Query Parser (Scanner/Parser) use the functions of the Client Object Locator.
    The Server Object Locator executes its tasks by using the Object Heap Manager, Representation Manager (Transformation Manager), Lock Manager, Catalog Manager, and B+-tree Manager. The Client Object Locator uses the functions of the Server Object Locator for object fetch and flush.
    The objects that the Object Locator caches in the workspace of a client are kept coherent with the objects on the server by using a cache coherency number. If the cache coherency number of an object cached in the workspace of a client differs from the cache coherency number of the same object in the page buffer (or on the disk) of the server, the cached object becomes invalid (invalidation). The Server Object Locator increases the cache coherency number of an object whenever the object is flushed from a client and sent to the server.
    The validation check for a cached object is performed when the object is first used by a transaction. Because a lock is also cached (set up) when an object is cached, the validation of an object remains effective while the transaction is being executed.
    When a transaction requests an object, the Client Object Locator checks whether the object and its lock are cached. If both the object and the lock are cached, the transaction uses the cached object in the workspace memory, which is much faster. If either the object or the lock is not cached, the Client Object Locator sends a request to the Server Object Locator. The Server Object Locator sets up the lock requested for the object by using the Lock Manager. When the lock is acquired, the cache coherency number of the object in the workspace is compared with the cache coherency number of the object in the database (page buffer or disk) of the server. If the two values are different, the new object data is sent from the server to the client, where it replaces the old cached object.
    When a transaction is terminated, the cached objects are flushed to the server. When a transaction is rolled back, the objects are all de-cached. In addition, when a class object is invalidated (e.g., a schema is changed by a transaction of another client), all the instance objects of the class are flushed/de-cached together. All the objects are also flushed to the server together with query execution requests because queries are executed on the server.
    To reduce the amount of communication between a client and a server, the Object Locator sends flush data together with the object fetch request packet, and pre-fetches related class objects or other surrounding objects when caching objects.
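The cache coherency check described above can be sketched as follows. It is a simplified model under stated assumptions, not the Object Locator's implementation: the `Server` and `Client` classes and their methods are hypothetical, OIDs are plain strings, lock conflicts between clients and client-side writes are left out, and a `round_trips` counter is added only to show when the server is actually contacted.

```python
class Server:
    """Holds the authoritative copy of each object with its coherency number."""

    def __init__(self):
        self.objects = {}  # oid -> (chn, data)

    def flush(self, oid, data):
        # The coherency number grows every time an object is flushed to the server.
        chn = self.objects[oid][0] + 1 if oid in self.objects else 1
        self.objects[oid] = (chn, data)
        return chn

    def fetch(self, oid, cached_chn):
        # Lock acquisition would happen here (omitted in this sketch).
        chn, data = self.objects[oid]
        if chn == cached_chn:
            return chn, None, False  # cached copy is still valid, send nothing
        return chn, data, True       # stale or missing: ship the new object


class Client:
    """Workspace cache with cached locks that live for one transaction."""

    def __init__(self, server):
        self.server = server
        self.workspace = {}  # oid -> (chn, data)
        self.locks = set()   # oids whose lock is cached in this transaction
        self.round_trips = 0

    def get(self, oid):
        # Fast path: both the object and its lock are cached.
        if oid in self.workspace and oid in self.locks:
            return self.workspace[oid][1]
        self.round_trips += 1
        cached_chn = self.workspace[oid][0] if oid in self.workspace else None
        chn, data, changed = self.server.fetch(oid, cached_chn)
        self.locks.add(oid)
        if changed:
            self.workspace[oid] = (chn, data)  # replace the old cached object
        return self.workspace[oid][1]

    def end_transaction(self):
        # Cached locks are released; objects stay cached but must be revalidated.
        self.locks.clear()
```

Within one transaction a second `get` for the same OID is served from the workspace; after `end_transaction` the next access goes back to the server, and the object data is re-sent only if the coherency numbers differ.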
  • 20. Upon request from the Client Object Locator, the Server Object Locator fetches an object from the database and updates it in the database by using the Heap File Manager. In addition, it manages lock setting by using the Lock Manager.
    B. Transaction Manager
    The Transaction Manager is a module that performs transaction start, commit, rollback, etc. It calls the Object Locator to flush the objects used in a transaction, the Lock Manager to release cached locks, and the Log Manager (Recovery Manager) for transaction commit/rollback. The Transaction Manager is divided into a client part and a server part.
    When an application requests transaction termination (commit or rollback), the Client Transaction Manager flushes the objects in the workspace that were changed during transaction execution to the page buffer of the server. (For a rollback request, the changed objects are not flushed to the server. Instead, they are immediately removed from the workspace.) Next, the Client Transaction Manager requests the commit/rollback from the Server Transaction Manager. In case of commit, the Server Transaction Manager calls the Log Manager (Recovery Manager), which executes the postpone actions on the database in the server and the loose_end postpone actions in the client. After that, it releases all the acquired locks and closes all the open cursors. In case of rollback, the Log Manager (Recovery Manager) undoes the work performed by the transaction by using the UNDO log and releases all the acquired locks. When a transaction is committed or rolled back, the locks cached by the Client Transaction Manager are all released. The Transaction Manager supports the 2PC (2-phase commit) protocol for global transactions.
    C. Lock Manager
    The Lock Manager is a module that manages locks according to the 2PL (2-phase locking) protocol and the granularity locking protocol.
    The Lock Manager searches for a transaction identifier, calls the Log Manager (Recovery Manager) to get the lock waiting time of a transaction, and calls the Server Transaction Manager to roll back a transaction in order to handle a deadlock. The Server Object Locator uses the Lock Manager to acquire and release the lock for an object, and the Log Manager uses the Lock Manager to release all locks together.
    When an instance object is accessed, locks must be set on the class object that defines the attributes of the instance and also on the upper (inherited) class objects. When the schema of a class object is changed, an eXclusive lock must be set on the class and its lower classes. During query execution, the instances of a class and the instances of its lower classes are all searched. In addition, because a class object is a domain that defines the corresponding instances, the domain class and its lower classes are all accessed. Therefore, during query execution, shared locks are set on the class to be searched and its lower classes, and also on the domain class that defines an instance and its lower classes.
    To detect deadlocks, the WFG (wait-for graph) method is used. If a deadlock is detected in the WFG, one of the involved transactions is forcibly terminated by the system.
    The Lock Manager manages the Lock Table. The Lock Table is implemented as a hash table keyed by OID, and access to the table is a critical section so that consistency is maintained.
    D. Recovery Manager
    When a transaction, system, or media fault occurs, the Recovery Manager ensures that the effects of all committed transactions are reflected in the database and that the effects of uncommitted transactions are not. For this, the Recovery Manager records a log and restores the database from the various faults based on the log. The CUBRID Recovery Manager uses the UNDO/REDO restore protocol, which is based on the following rules:
  • 21. 3. Client and Server Modules
- UNDO rule: Record a data value before it is changed. This guarantees that the last committed value is recorded in the log before it is overwritten by a value that is not yet committed.
- REDO rule: The values updated by a transaction must be recorded in the log before the transaction is committed. That is, the data values are written to the log before the commit.

A log is a file to which records of arbitrary length are appended. To implement a log file of unbounded length, recent log data is recorded in an active log and older log data is archived in an archive log.

UNDO/REDO logging is designed for maximum efficiency during normal operation rather than for fast recovery after a database system failure. Because of the logging protocol, flushing data pages at commit or rollback can be avoided as much as possible; a data page is written to disk only when it is replaced by another page.

3.2.2 Object Management Component

The Object Management Component defines tables, creates and modifies objects, and formats objects on disk and in memory.

A. Representation Manager

This module converts an object between its disk expression structure and its memory expression structure. On disk, object data has a structure suited to query execution; in memory, it has a structure that makes it easy for an application to access. The Representation Manager converts between these two expression formats.

Figure 10. Disk Expression Format of an Object
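The UNDO and REDO rules described for the Recovery Manager can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not CUBRID's log format: each update logs a before-image and an after-image, commit appends a record without flushing any page, and recovery replays committed updates and reverts uncommitted ones. The `Database` class, its method names, and the use of `None` to mean "key did not exist" are all hypothetical, and the sketch assumes locking prevents two active transactions from writing the same key.

```python
class Database:
    """Toy UNDO/REDO logging model (hypothetical, for illustration only)."""

    def __init__(self):
        self.disk = {}     # durable data pages: key -> value
        self.buffer = {}   # volatile page buffer, lost on a crash
        self.log = []      # durable, append-only log

    def write(self, tx, key, new_value):
        old_value = self.buffer.get(key, self.disk.get(key))
        # UNDO rule: the old value is logged before it is overwritten.
        # REDO rule: the new value is logged before the transaction commits.
        self.log.append(("UPDATE", tx, key, old_value, new_value))
        self.buffer[key] = new_value

    def commit(self, tx):
        # Only a log record is written; no data page is flushed here.
        self.log.append(("COMMIT", tx))

    def flush_page(self, key):
        # Models a page being written out when it is replaced in the buffer.
        if key in self.buffer:
            self.disk[key] = self.buffer[key]

    def crash(self):
        self.buffer = {}

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        # REDO: replay the updates of committed transactions in log order.
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                _, _, key, _, new_value = rec
                self.disk[key] = new_value
        # UNDO: revert the updates of uncommitted transactions, newest first.
        for rec in reversed(self.log):
            if rec[0] == "UPDATE" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                if old_value is None:
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old_value
```

For example, if transaction T1 writes `x = 1` and commits, and T2 then writes `x = 2` without committing, a page flush followed by a crash leaves `x = 2` on disk; `recover()` restores `x = 1` from the log.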
  • 22. 3. Client and Server Modules
The disk expression format of an object is shown in Figure 10. The class OID and Representation ID of the object come first; they identify which format the object has. The CHN (Cache Coherency Number) that follows is used to judge the validity of a cached object.

In the disk expression format, the columns (attributes) are divided into fixed-length columns, where every value has the same length (such as an integer), and variable-length columns, where values can have different lengths (such as a string). Fixed-length columns are stored at predefined locations, and the location of each column is obtained from information managed by the Catalog Manager. The location of a variable-length column is obtained from the variable-length column offset table, which holds the location of each variable-length column. The last entry of the offset table indicates the end of the object. No offset table is stored for objects of a table that has no variable-length columns.

When an object is cached in memory, the MOP points to a memory block that holds the columns of the object. The fixed-length column values are stored contiguously in the object block, and the value of each variable-length column is stored in a separately allocated memory block. The CHN is also included in the memory expression format. The object locator compares this CHN value with the CHN value stored on disk to judge the validity of the object. If the two CHN values differ, the object cached in memory is no longer valid; the object locator then de-caches it and caches the new content of the object.

Figure 11. Memory Expression Format of an Object

The Representation Manager uses the Workspace Manager to obtain storage space for the memory expression of an object and uses the Schema Manager to determine the size and structure of an object.
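The role of the variable-length column offset table can be made concrete with a short Python sketch. The byte layout here is an assumption chosen for illustration (an 8-byte class OID, 4-byte Representation ID and CHN, 4-byte integer columns, then the offset table, then the string data); the actual field sizes and ordering are those of Figure 10, which this sketch does not claim to reproduce. The `encode` and `decode` functions are hypothetical, and the column counts passed to `decode` stand in for the information that would come from the Catalog Manager.

```python
import struct

# Assumed header: class OID (8 bytes), Representation ID (4), CHN (4), little-endian.
HEADER = struct.Struct("<QII")


def encode(class_oid, repr_id, chn, fixed_ints, var_strings):
    """Pack a record: header, fixed-length ints, offset table, variable data."""
    fixed = struct.pack(f"<{len(fixed_ints)}i", *fixed_ints)
    blobs = [s.encode("utf-8") for s in var_strings]
    table = b""
    if blobs:  # no offset table when there are no variable-length columns
        n_entries = len(blobs) + 1  # the extra last entry marks the end of the object
        start = HEADER.size + len(fixed) + 4 * n_entries
        offsets = [start]
        for blob in blobs:
            offsets.append(offsets[-1] + len(blob))
        table = struct.pack(f"<{n_entries}I", *offsets)
    return HEADER.pack(class_oid, repr_id, chn) + fixed + table + b"".join(blobs)


def decode(record, n_fixed, n_var):
    """Unpack a record; n_fixed and n_var play the role of catalog information."""
    class_oid, repr_id, chn = HEADER.unpack_from(record, 0)
    fixed = list(struct.unpack_from(f"<{n_fixed}i", record, HEADER.size))
    strings = []
    if n_var:
        table_pos = HEADER.size + 4 * n_fixed
        offsets = struct.unpack_from(f"<{n_var + 1}I", record, table_pos)
        # Each value spans from its own offset to the next entry's offset.
        strings = [record[offsets[i]:offsets[i + 1]].decode("utf-8")
                   for i in range(n_var)]
    return class_oid, repr_id, chn, fixed, strings
```

Because each variable-length value ends where the next offset begins, no per-value length field is needed, and the final entry doubles as the total record length.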
When CUBRID changes a schema, it does not change the expression format of the existing records of that schema. Therefore, if an object saved in an old expression format is found during conversion between the two expression formats, it is converted to the most recent expression format, using the schema information for both the recent and the old expression formats. The conversion process also accounts for differences in hardware architecture between the client machine and the server machine, such as byte ordering.

3.2.3 Query Processing
  • 23. 3. Client and Server Modules
Figure 12. The Procedures of Query Compile in a Client

A. Scanner/Parser

The parser maintains the data structures used to create a parse tree during parsing, the data structures that hold the created parse tree, the data structures that manage multiple SQL statements, and information about the lexer.

B. Semantic Checker

If a parse tree is built without an error, the input query statement is syntactically correct. Semantic checking then verifies that the semantics of the input statement are valid. It performs the following tasks:

1. Name resolution and parse tree node type checking: checks whether the tables and columns used exist, and infers the type of each column.
2. Semantic checking: checks whether an operation that is not supported between the given types is used.
3. View translation: converts the definition statement of a view.
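The first two tasks of the Semantic Checker, name resolution with type inference and checking for unsupported operations between types, can be sketched in a few lines of Python. Everything here is hypothetical: the `CATALOG` contents, the tuple-based parse tree nodes, and the rule that both operands must have the same type are assumptions made for illustration, not CUBRID's actual rules.

```python
# Hypothetical catalog: table name -> {column name -> type name}.
CATALOG = {"athlete": {"code": "int", "name": "string"}}


class SemanticError(Exception):
    pass


def resolve_type(node, table):
    """Resolve names and infer the type of a tiny expression tree.

    Nodes are tuples: ("col", name), ("lit", value), or ("op", symbol, left, right).
    """
    kind = node[0]
    if kind == "col":
        columns = CATALOG.get(table)
        if columns is None:
            raise SemanticError(f"unknown table: {table}")
        if node[1] not in columns:
            raise SemanticError(f"unknown column: {node[1]}")
        return columns[node[1]]  # column type comes from the catalog
    if kind == "lit":
        return "int" if isinstance(node[1], int) else "string"
    if kind == "op":
        _, symbol, left, right = node
        left_type = resolve_type(left, table)
        right_type = resolve_type(right, table)
        # Assumed rule: operands must have identical types.
        if left_type != right_type:
            raise SemanticError(
                f"operator {symbol} not supported between {left_type} and {right_type}")
        if symbol in ("=", "<", ">"):
            return "bool"
        if symbol == "+" and left_type == "int":
            return "int"
        raise SemanticError(f"operator {symbol} not supported for {left_type}")
    raise SemanticError(f"unknown node kind: {kind}")
```

With this sketch, `code = 1` on the `athlete` table resolves to a boolean expression, while `name + 1` is rejected because the operand types differ.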
  • 24. 3. Client and Server Modules
C. XASL Generator/Optimizer

The query statement entered by a user goes through parsing and semantic checking and is then converted into an augmented parse tree annotated with catalog information. Query optimization based on this augmented parse tree produces the XASL tree, that is, the execution plan. The XASL tree specifies the optimal access order and access method for the tables accessed during query execution. It represents the plan with the lowest access path cost among the possible plans. From a parse tree and catalog statistics, an XASL tree is created as follows:

1. Classifying terms to configure search conditions per table. A term is a search condition on one or more tables. If the term applies to one table, it is a scan term (sarg). If it applies to two tables, it is a join term (edge). If it applies to three or more tables, it is an other term. The terms specified in the WHERE clause of the parse tree are divided into join terms and scan terms, and the scan terms are classified by the table to which each applies.
2. Determining the optimal access method for each table. For the scan terms that apply to a given table, calculate the selectivity of each scan term and choose the search method of the term with the lowest selectivity as the search method for the table. That is, determine whether to use a sequential scan or an index scan for the table and, if an index scan is used, which index to use.
3. Calculating the selectivity of each table. Calculate the selectivity of each table from the selectivity of each scan term calculated in step 2.
4. Determining the access order among tables. Enumerate candidate access orders and calculate the access path cost of each. Select the order with the lowest access cost as the final execution plan.
5. Creating the XASL tree for the final execution plan.

D. Query Manager

This server module executes the XASL tree received from a client. During query processing, the client sends the XASL tree created by the XASL Generator/Optimizer module to the server, and the query is executed when the server receives and executes this XASL tree. Because it is undesirable, in terms of performance, to go through the XASL Generator/Optimizer every time a query of the same pattern arrives, CUBRID saves the XASL tree in the Query Plan Cache and reuses it. In addition, when the same query is executed repeatedly, CUBRID saves the query result in the Query Cache and returns the cached result without executing the query the next time.
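Steps 1 to 3 of the XASL generation procedure can be sketched as follows in Python. This is a minimal illustration under assumptions, not the optimizer's real data structures: terms are plain dictionaries with hypothetical `tables`, `selectivity`, and `index` keys, terms that reference no table or more than two tables are treated as other terms, and the table selectivity is taken as the product of the scan term selectivities, which assumes the terms are independent.

```python
def classify_term(tables):
    """Classify a WHERE-clause term by the number of distinct tables it references."""
    count = len(set(tables))
    if count == 1:
        return "sarg"   # scan term
    if count == 2:
        return "edge"   # join term
    return "other"


def group_scan_terms(terms):
    """Step 1: collect the scan terms under the table each one applies to."""
    by_table = {}
    for term in terms:
        if classify_term(term["tables"]) == "sarg":
            by_table.setdefault(term["tables"][0], []).append(term)
    return by_table


def choose_access(scan_terms):
    """Step 2: use the access method of the scan term with the lowest selectivity."""
    best = min(scan_terms, key=lambda term: term["selectivity"])
    if best["index"] is None:
        return ("sequential scan", None)
    return ("index scan", best["index"])


def table_selectivity(scan_terms):
    """Step 3: combine scan term selectivities (assumed independent) by multiplying."""
    result = 1.0
    for term in scan_terms:
        result *= term["selectivity"]
    return result
```

A usage example with made-up numbers: for a table `a` with one unindexed term of selectivity 0.5 and one term of selectivity 0.01 backed by an index named `idx_a_id`, `choose_access` returns an index scan on `idx_a_id` and `table_selectivity` returns 0.005.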
  • 25. 3. Client and Server Modules
Figure 13. Query Execution on the Server

The procedure of query processing through these components is shown in Figure 14.
  • 26. 3. Client and Server Modules
Figure 14. Query Execution Steps