● Distributed Database Management Systems Advantages and Disadvantages.
● Characteristics of Distributed Database Management Systems.
● Levels of Data and Process Distribution.
● Distributed Database Transparency Features.
● Transaction Performance and Failure Transparency.
2. Learning Objectives
In this chapter, the student will learn:
About distributed database management systems
(DDBMSs) and their components
How database implementation is affected by different
levels of data and process distribution
How transactions are managed in a distributed database
environment
3. Learning Objectives
In this chapter, the student will learn:
How distributed database design draws on data
partitioning and replication to balance performance,
scalability, and availability
About the trade-offs of implementing a distributed data
system
4. Distributed database
A set of databases in a distributed system that can
appear to applications as a single data source.
Hierarchical Arrangement of
Networked Databases
Homogeneous
Distributed Database
5. Important considerations
There are two principal approaches to storing a relation
in a distributed database system:
Replication: Database replication is the frequent
electronic copying of data from a database in one computer
or server to a database in another so that all users share the
same level of information.
Fragmentation/Partitioning: Fragmentation is a database
server feature that allows you to control where data is stored
at the table level.
Fragmentation enables you to define groups of rows or index keys
within a table according to some algorithm or scheme. In Informix,
the sysfragments system catalog table holds information about
fragmented tables and indexes.
6. Distribution scheme for table
fragmentation (1/2)
The following example includes a FRAGMENT BY
EXPRESSION clause to create a fragmented table
with an expression-based distribution scheme:
7. Distribution scheme for table
fragmentation (2/2)
Here the first three fragments are stored in partitions of the dbs1 dbspace, and
the other fragments, including the remainder, are stored in named fragments of
the dbs2 dbspace. Explicit fragment names are required in this example,
because each dbspace has multiple partitions.
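The routing idea behind an expression-based distribution scheme can be sketched in Python. This is only an illustration, not Informix syntax: the fragment names, the c1 column, and the range expressions are assumptions, chosen so that three fragments land in dbs1 and the rest, including the remainder, land in dbs2.

```python
# Hypothetical sketch: route rows to fragments with an expression-based scheme.
# Fragment names, the c1 column, and the ranges are assumptions for illustration.

# Each rule is (fragment name, dbspace, expression). Rules are checked in order.
RULES = [
    ("part_1", "dbs1", lambda row: row["c1"] < 10),
    ("part_2", "dbs1", lambda row: 10 <= row["c1"] < 20),
    ("part_3", "dbs1", lambda row: 20 <= row["c1"] < 30),
    ("part_4", "dbs2", lambda row: 30 <= row["c1"] < 40),
    ("part_5", "dbs2", lambda row: 40 <= row["c1"] < 50),
]
REMAINDER = ("part_6", "dbs2")  # catches rows that match no expression


def route(row):
    """Return (fragment, dbspace) for the first rule whose expression matches."""
    for name, dbspace, expression in RULES:
        if expression(row):
            return name, dbspace
    return REMAINDER


def fragment_table(rows):
    """Group rows by the fragment that the distribution scheme assigns them to."""
    fragments = {}
    for row in rows:
        fragments.setdefault(route(row), []).append(row)
    return fragments
```

Because each expression isolates a range of values, a search for c1 = 35 only needs to look in part_4; the other fragments can be skipped.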
8. How to Check Index Fragmentation on
Indexes in a Database
The following is a simple query that will list every index on every table in
your database, ordered by the percentage of index fragmentation.
9. Global Name as a Loopback Database Link
You can use the global name of a database as a
loopback database link without explicitly creating a
database link. When the database link in a SQL
statement matches the global name of the current
database, the database link is effectively ignored.
For example, assume the global name of a database is
db1.example.com. You can run the following SQL
statement on this database:
SELECT * FROM hr.employees@db1.example.com;
10. SQL statements that create database links in a local database to
the remote sales.us.americas.example_auto.com database
CREATE DATABASE LINK sales.us.americas.example_auto.com
USING 'sales_us';
Connects To Database: sales using net service name sales_us
Connects As: Connected user
Link Type: Private connected user
11. SQL statements that create database links in a local database to
the remote database
CREATE DATABASE LINK foo
CONNECT TO CURRENT_USER USING 'am_sls';
Connects To Database: sales using service name am_sls
Connects As: Current global user
Link Type: Private current user
12. SQL statements that create database links in a local database to
the remote sales.us.americas.example_auto.com database
CREATE DATABASE LINK sales.us.americas.example_auto.com
CONNECT TO SAAD IDENTIFIED BY password
USING 'sales_us';
Connects To Database: sales using net service name sales_us
Connects As: SAAD using password password
Link Type: Private fixed user
13. SQL statements that create database links in a local database to
the remote sales.us.americas.example_auto.com database
CREATE PUBLIC DATABASE LINK sales
CONNECT TO SULTAN IDENTIFIED BY password
USING 'rev';
Connects To Database: sales using net service name rev
Connects As: SULTAN using password password
Link Type: Public fixed user
14. SQL statements that create database links in a local database to
the remote sales.us.americas.example_auto.com database
CREATE SHARED PUBLIC DATABASE LINK sales.us.americas.example_auto.com
CONNECT TO WALEED IDENTIFIED BY password
AUTHENTICATED BY USMAN IDENTIFIED BY password1
USING 'sales';
Connects To Database: sales using net service name sales
Connects As: WALEED using password password, authenticated as
USMAN using password password1
Link Type: Shared public fixed user
15. Distributed processing
The operations that occur when an application
distributes its tasks among different computers in a
network.
For example, a database application typically distributes
front-end presentation tasks to client computers and
allows a back-end database server to manage shared
access to a database. Consequently, a distributed database
application processing system is more commonly referred
to as a client/server database application system.
16. Evolution of Database Management
Systems
Distributed database management system
(DDBMS): Governs storage and processing of
logically related data over interconnected
computer systems
Data and processing functions are
distributed among several sites
Centralized database management system
Required that corporate data be stored in a
single central site
Data access provided through dumb
terminals
19. Naming of Schema Objects Using
Database Links
Oracle Database uses the global database name to
name schema objects globally.
Global object names take the following form:
schema.schema_object@global_database_name
For example, using a database link to database
sales.division3.example.com, a user or application can
reference remote data as follows:
-- emp table in scott's schema
SELECT * FROM scott.emp@sales.division3.example.com;
SELECT loc FROM scott.dept@sales.division3.example.com;
20. For example, assume that you connect to the local
database as user SYSTEM:
CONNECT SYSTEM@sales1
You then issue the following statements using
database link hq.example.com to access objects in
the scott and jane schemas on remote database hq:
SELECT * FROM scott.emp@hq.example.com;
INSERT INTO jane.accounts@hq.example.com (acc_no, acc_name, balance)
VALUES (5001, 'BOWER', 2000);
UPDATE jane.accounts@hq.example.com
SET balance = balance + 500;
DELETE FROM jane.accounts@hq.example.com
WHERE acc_name = 'BOWER';
21. Figure 12.1 - Centralized Database
Management System
22. Factors Affecting the Centralized
Database Systems
Globalization of business operation
Advancement of web-based services
Rapid growth of social and network technologies
Digitization resulting in multiple types of data
Structured, unstructured, semi-structured data
Time-stamped data, etc.
Innovative business intelligence through analysis of data
23. An Oracle Distributed Database
System
A client can connect directly or indirectly to a database
server.
A direct connection occurs when a client connects to a server
and accesses information from a database contained on
that server.
25. Rules for a DDBMS
To the user, a distributed system should look exactly like a nondistributed
system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
Last four rules are ideals.
Homogeneous Distributed Database
28. Remote SQL Statements
A remote update statement is an update that
modifies data in one or more tables, all of which are
located at the same remote node.
For example, the following statement updates the dept table
in the scott schema of the remote sales database:
29. Distributed SQL Statements
A distributed SQL statement either queries or
modifies data on two or more nodes.
A distributed query statement retrieves information
from two or more nodes.
For example, the following query accesses data from the
local database as well as the remote sales database:
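As a rough local analogy only (this is not Oracle syntax and involves no database link), a single query that reads from two databases can be tried with Python's sqlite3 module, where ATTACH DATABASE stands in for the remote sales database. The table shapes and sample rows are assumptions chosen to echo the scott schema.

```python
import sqlite3


def run_distributed_query():
    """Join a table in the local database with a table in an attached one."""
    # The "local" database; emp loosely mirrors scott.emp (assumed columns).
    conn = sqlite3.connect(":memory:")
    # A second, separate in-memory database plays the role of the remote sales site.
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")
    conn.execute("CREATE TABLE sales.dept (deptno INTEGER, dname TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)",
                     [("SMITH", 20), ("ALLEN", 30)])
    conn.executemany("INSERT INTO sales.dept VALUES (?, ?)",
                     [(20, "RESEARCH"), (30, "SALES")])
    # One statement that retrieves information from both databases.
    rows = conn.execute(
        "SELECT e.ename, d.dname FROM emp e "
        "JOIN sales.dept d ON e.deptno = d.deptno ORDER BY e.ename"
    ).fetchall()
    conn.close()
    return rows
```

In a real DDBMS the second table would live on another node and be reached over the network; here both databases sit in one process, so only the shape of the query carries over.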
30. Distributed UPDATE statement
A distributed update statement modifies data on two or
more nodes. A distributed update is possible using a PL/SQL
sub-program unit such as a procedure or trigger that includes
two or more remote updates that access data on different
nodes.
For example, the following PL/SQL program unit updates
tables on the local database and the remote sales database:
31. Factors That Aided DDBMS to Cope
With Technological Advancement
Acceptance of Internet as a platform for business
Mobile wireless revolution
Usage of application as a service
Focus on mobile business intelligence
32. Desirability of Distributed DBMS
Over Centralized DBMS
Performance degradation
High costs
Reliability problems
Scalability problems
Organizational rigidity
33. Advantages and Disadvantages of
DDBMS
Advantages
• Data are located near greatest demand site
• Faster data access and processing
• Growth facilitation
• Improved communications
• Reduced operating costs
• User-friendly interface
• Less danger of a single-point failure
• Processor independence
Disadvantages
• Complexity of management and control
• Technological difficulty
• Security
• Lack of standards
• Increased storage and infrastructure requirements
• Increased training cost
• Costs incurred due to the requirement of duplicated infrastructure
34. Characteristics of Distributed Database
Management Systems
Application interface
Validation
Transformation
Query optimization
Mapping
I/O interface
Formatting
Security
Backup and recovery
DB administration
Concurrency control
Transaction management
35. Functions of Distributed DBMS
Receives the request of an application
Validates, analyzes, and decomposes the request
Maps the request
Decomposes request into several I/O operations
Searches and validates data
Ensures consistency, security, and integrity
Validates data for specific conditions
Presents data in required format
36. Figure 12.4 - A Fully Distributed Database
Management System
37. DDBMS Components
Computer workstations or remote devices
Network hardware and software components
Communications media
Transaction processor (TP): Software component of a
system that requests data
Also known as transaction manager (TM) or application
processor (AP)
Data processor (DP): Software component on a system
that stores and retrieves data from its location
Also known as data manager (DM)
38. Single-Site Processing, Single-Site
Data (SPSD)
Processing is done on a single host computer
Data stored on host computer’s local disk
Processing restricted on end user’s side
DBMS is accessed by dumb terminals
39. Multiple-Site Processing, Single-Site
Data (MPSD)
Multiple processes run on different computers
sharing a single data repository
Requires a network file server running
conventional applications
Accessed through LAN
Client/server architecture
Reduces network traffic
Processing is distributed
Supports data at multiple sites
40. Figure 12.7 - Multiple-Site Processing,
Single-Site Data
41. Multiple-Site Processing, Multiple-Site
Data (MPMD)
Fully distributed database management system
Supports multiple data processors and transaction
processors at multiple sites
Classification of DDBMS depending on the level of
support for various types of databases
Homogeneous: Integrate multiple instances of same
DBMS over a network
Heterogeneous: Integrate different types of DBMSs (e.g.
Object-oriented Databases, Document Databases, Relational Databases, etc.)
Fully heterogeneous: Support different DBMSs, each
supporting a different data model (e.g. Entity-Relationship Model,
network model, etc.)
42. Restrictions of DDBMS
Remote access is provided on a read-only basis
Restrictions on the number of remote tables that may
be accessed in a single transaction
Restrictions on the number of distinct databases that
may be accessed
Restrictions on the database model that may be
accessed
43. Distributed Database Transparency
Features (cont.)
Distribution transparency
Transaction transparency
Failure transparency
Performance transparency
Heterogeneity transparency
44. Distribution Transparency
Allows management of physically dispersed database
as if centralized
Levels
Fragmentation transparency
Location transparency
Local mapping transparency
45. Distribution Transparency
Unique fragment: Each row is unique, regardless of
the fragment in which it is located
Supported by distributed data dictionary (DDD) or
distributed data catalog (DDC)
DDC contains the description of the entire database as
seen by the database administrator
Distributed global schema: Common database
schema to translate user requests into subqueries
46. Transaction Transparency
Ensures database transactions will maintain the
distributed database’s integrity and consistency
Ensures a transaction is completed only when all database
sites involved complete their part
Distributed database systems require complex
mechanisms to manage transactions
47. Distributed Requests and Distributed
Transactions
• Remote request: Single SQL statement accesses data processed by a single remote
database processor
• Remote transaction: Composed of several requests, accesses data at a single remote site
• Distributed transaction: Requests data from several different remote sites on network
• Distributed request: Single SQL statement references data at several DP sites
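The four categories can be told apart by two questions: how many statements are involved, and how many remote DP sites they touch. A minimal Python sketch under that simplified reading follows. The Statement class is hypothetical and models a statement only as the set of sites it references.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Statement:
    """One SQL statement, reduced to the remote DP sites it touches (assumed model)."""
    sites: Set[str]


def classify(statements: List[Statement]) -> str:
    """Name the category from the statement count and the number of sites touched."""
    if not statements or any(not s.sites for s in statements):
        raise ValueError("need at least one statement, each touching a remote site")
    all_sites: Set[str] = set()
    for s in statements:
        all_sites |= s.sites
    if len(statements) == 1:
        # A single statement: one site is a remote request, several a distributed request.
        return "remote request" if len(all_sites) == 1 else "distributed request"
    # Several statements: one site is a remote transaction, several a distributed one.
    return "remote transaction" if len(all_sites) == 1 else "distributed transaction"
```

For instance, `classify([Statement({"B"}), Statement({"C"})])` yields "distributed transaction", while `classify([Statement({"B", "C"})])` yields "distributed request".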
48. Distributed Concurrency Control
Concurrency control is especially important in a distributed
database environment
Multi-site, multiple-process operations are more likely to
create inconsistencies and deadlocked transactions
50. Two-Phase Commit Protocol (2PC)
Guarantees that if a portion of a transaction operation
cannot be committed, all changes made at the other
sites will be undone
To maintain a consistent database state
Requires that each DP’s transaction log entry be
written before database fragment is updated
DO-UNDO-REDO protocol: Roll transactions back
and forward with the help of the system’s transaction
log entries
51. Two-Phase Commit Protocol (2PC)
Write-ahead protocol: Forces the log entry to be
written to permanent storage before actual operation
takes place
Defines operations between coordinator and
subordinates
Phases of implementation
Preparation
The final COMMIT
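The two phases can be sketched as a small in-memory simulation. This is a teaching sketch under simplifying assumptions (no network, no crashes, no timeouts); the Subordinate class and its can_commit flag are hypothetical stand-ins for real data processors.

```python
from typing import Dict, List


class Subordinate:
    """A data processor (DP) taking part in a transaction (hypothetical class)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit      # simulates a site that cannot commit
        self.log: List[str] = []          # transaction log
        self.data: Dict[str, int] = {}    # the local database fragment
        self.pending: Dict[str, int] = {}

    def prepare(self, changes: Dict[str, int]) -> bool:
        # Write-ahead: the log entry is recorded before the fragment is touched.
        self.log.append(f"PREPARE {changes}")
        if not self.can_commit:
            self.log.append("NOT PREPARED")
            return False
        self.pending = dict(changes)
        self.log.append("PREPARED")
        return True

    def commit(self) -> None:
        self.log.append("COMMIT")
        self.data.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        self.log.append("ABORT")
        self.pending = {}


def two_phase_commit(subordinates: List[Subordinate],
                     changes: Dict[str, Dict[str, int]]) -> bool:
    """Coordinator logic: commit everywhere or nowhere."""
    # Phase 1 (preparation): every subordinate must vote yes.
    votes = [sub.prepare(changes[sub.name]) for sub in subordinates]
    # Phase 2 (the final COMMIT): commit at all sites, or undo at all sites.
    if all(votes):
        for sub in subordinates:
            sub.commit()
        return True
    for sub in subordinates:
        sub.abort()
    return False
```

If any one subordinate votes no, every site aborts and no fragment is changed, which is the guarantee the protocol exists to provide.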
52. Performance and Failure Transparency
Performance transparency: Allows a DDBMS to
perform as if it were a centralized database
Failure transparency: Ensures the system will continue to
operate in the event of a node or network failure
Considerations for resolving requests in a distributed
data environment
Data distribution
Data replication
Replica transparency: DDBMS’s ability to hide multiple
copies of data from the user
53. Performance and Failure Transparency
Network and node availability
Network latency: delay imposed by the amount of time
required for a data packet to make a round trip
Network partitioning: delay imposed when nodes
become suddenly unavailable due to a network failure
54. Distributed Database Design
• Data fragmentation: How to partition database into fragments
• Data replication: Which fragments to replicate
• Data allocation: Where to locate those fragments and replicas
55. Data Fragmentation
Breaks single object into many segments
Information is stored in distributed data catalog (DDC)
Strategies
Horizontal fragmentation: Division of a relation into
subsets (fragments) of tuples (rows)
Vertical fragmentation: Division of a relation into
attribute (column) subsets
Mixed fragmentation: Combination of horizontal and
vertical strategies
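The three strategies can be illustrated on a tiny in-memory relation. The rows and column names below are made up for the example; the helper functions are a sketch, not part of any DBMS.

```python
# Made-up sample rows; the column names are assumptions for illustration.
CUSTOMER = [
    {"cus_num": 10, "cus_name": "Ng", "cus_state": "TN", "cus_balance": 1400.0},
    {"cus_num": 11, "cus_name": "Ortiz", "cus_state": "FL", "cus_balance": 0.0},
    {"cus_num": 12, "cus_name": "Khan", "cus_state": "TN", "cus_balance": 320.5},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the subset of rows (tuples) that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical(rows, key, columns):
    """Vertical fragment: a subset of columns; the key is kept so rows can be rejoined."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def rejoin(key, *fragments):
    """Rebuild rows by merging fragment rows that share the same key value."""
    merged = {}
    for fragment in fragments:
        for r in fragment:
            merged.setdefault(r[key], {}).update(r)
    return [merged[k] for k in sorted(merged)]


# Horizontal fragmentation: one fragment per state.
tn = horizontal(CUSTOMER, lambda r: r["cus_state"] == "TN")
fl = horizontal(CUSTOMER, lambda r: r["cus_state"] == "FL")
# Vertical fragmentation: names and balances stored separately, both keyed by cus_num.
names = vertical(CUSTOMER, "cus_num", ["cus_name", "cus_state"])
balances = vertical(CUSTOMER, "cus_num", ["cus_balance"])
# Mixed fragmentation: split by rows first, then by columns.
tn_balances = vertical(tn, "cus_num", ["cus_balance"])
```

Keeping the key in every vertical fragment is what makes reconstruction possible: `rejoin("cus_num", names, balances)` rebuilds the original rows.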
56. Data Replication
Data copies stored at multiple sites served by a
computer network
Mutual consistency rule: Replicated data fragments
should be identical
Styles of replication
Push replication
Pull replication
Helps restore lost data
Supported Databases: IBM Db2, Microsoft SQL Server, MongoDB, Oracle,
PostgreSQL
57. Types of Data Replication [1/3]
Transactional Replication – In transactional
replication, users receive full initial copies of the
database and then receive updates as data changes.
Data is copied in real time from the publisher to the
receiving database (subscriber) in the same order in which
the changes occur at the publisher; in this type of
replication, transactional consistency is therefore
guaranteed.
Transactional replication is typically used in
server-to-server environments.
It does not simply copy the data changes, but rather
consistently and accurately replicates each change.
58. Types of Data Replication [2/3]
Snapshot Replication – Snapshot replication
distributes data exactly as it appears at a specific
moment in time and does not monitor for updates to the
data. The entire snapshot is generated and sent to
users. Snapshot replication is generally used when
data changes are infrequent.
It is a bit slower than transactional replication because each
attempt moves multiple records from one end to the
other.
Snapshot replication is a good way to perform initial
synchronization between the publisher and the subscriber.
59. Types of Data Replication [3/3]
Merge Replication – Data from two or more
databases is combined into a single database.
Merge replication is the most complex type of
replication because it allows both publisher and
subscriber to independently make changes to the
database.
Merge replication is typically used in server-to-client
environments. It allows changes to be sent from one
publisher to multiple subscribers.
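The difference between snapshot and transactional replication can be sketched with a toy publisher and subscriber. The class and method names here are hypothetical and do not correspond to any product's API; merge replication, which needs conflict resolution, is left out.

```python
import copy


class Publisher:
    """Source database that records every change in order (hypothetical class)."""

    def __init__(self):
        self.data = {}
        self.change_log = []   # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.change_log.append((key, value))


class Subscriber:
    """Receiving database (hypothetical class)."""

    def __init__(self):
        self.data = {}
        self.applied = 0       # number of publisher changes already applied

    def apply_snapshot(self, publisher):
        # Snapshot replication: copy the whole state as it is at this moment.
        self.data = copy.deepcopy(publisher.data)
        self.applied = len(publisher.change_log)

    def apply_changes(self, publisher):
        # Transactional replication: replay only the new changes, in publisher order.
        for key, value in publisher.change_log[self.applied:]:
            self.data[key] = value
        self.applied = len(publisher.change_log)
```

A typical sequence is one `apply_snapshot` call for initial synchronization, followed by repeated `apply_changes` calls to keep the copies identical.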
63. Data Replication Scenarios
• Fully replicated database: Stores multiple copies of each database fragment at
multiple sites
• Partially replicated database: Stores multiple copies of some database fragments at
multiple sites
• Unreplicated database: Stores each database fragment at a single site
64. Data Allocation Strategies
• Centralized data allocation: Entire database stored at one site
• Partitioned data allocation: Database is divided into two or more disjoint
fragments and stored at two or more sites
• Replicated data allocation: Copies of one or more database fragments are stored
at several sites
65. The CAP Theorem
CAP stands for:
Consistency: Every read receives the most recent write or an
error
Availability: Every request receives a (non-error) response,
without the guarantee that it contains the most recent write
Partition tolerance: The system continues to operate despite
an arbitrary number of messages being dropped (or delayed)
by the network between nodes
Basically available, soft state, eventually consistent
(BASE)
Data changes are not immediate but propagate slowly
through the system until all replicas are consistent
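Eventual consistency can be pictured with a chain of replicas in which a write moves one hop per round. This is a toy model under strong assumptions (a single writer, a fixed chain, version numbers as the only conflict rule); the Replica class and helper names are hypothetical.

```python
class Replica:
    """One copy of a single data item (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


def write(replica, value):
    """Accept a write at one replica; the others are not updated immediately."""
    replica.version += 1
    replica.value = value


def propagate_one_hop(replicas):
    """Copy newer data from each replica to its right-hand neighbour."""
    # Walk right to left so a change moves exactly one replica per round.
    for i in range(len(replicas) - 1, 0, -1):
        src, dst = replicas[i - 1], replicas[i]
        if src.version > dst.version:
            dst.value, dst.version = src.value, src.version


def consistent(replicas):
    """True when every replica holds the same value and version."""
    return len({(r.value, r.version) for r in replicas}) == 1
```

Between the write and the last propagation round, a read at the far replica still gets an answer (availability) but it may be stale (soft state); after enough rounds all replicas agree.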
67. Key Assumptions of Hadoop
Distributed File System
High volume
Write-once, read-many
Streaming access
Move computations to the data
Fault tolerance
73. C. J. Date’s Twelve Commandments
for Distributed Databases
Local site independence
Central site independence
Failure independence
Location transparency
Fragmentation transparency
Replication transparency
74. C. J. Date’s Twelve Commandments
for Distributed Databases
Distributed query processing
Distributed transaction processing
Hardware independence
Operating system independence
Network independence
Database independence
Editor's Notes
A homogeneous distributed database has identical software and hardware running all databases instances, and may appear through a single interface as if it were a single database.
A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.
Consider fragmenting your tables if improving at least one of the following is your goal:
Single-user response time
Concurrency
Availability
Backup-and-restore characteristics
Loading of data
In an expression-based distribution scheme, each fragment expression in a rule specifies a storage space. Each fragment expression in the rule isolates data and aids the database server in searching for rows.
SELECT * FROM hr.employees@db1.example.com;
CREATE PUBLIC DATABASE LINK sales.division3.example.com USING 'sales1';
‘foo’ is the name of the database link being created.
Distributed database management system (DDBMS): for example, the data input/output (I/O), data selection, and data validation might be performed on one computer, and a report based on that data might be created on another computer.
A homogeneous distributed database has identical software and hardware running all databases instances, and may appear through a single interface as if it were a single database.
Autonomous − Each database is independent that functions on its own. They are integrated by a controlling application and use message passing to share data updates.
Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS co-ordinates data updates across the sites.
-----------
A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.
Federated − The heterogeneous database systems are independent in nature and integrated together so that they function as a single database system.
Un-federated − The database systems employ a central coordinating module through which the databases are accessed.
schema is a collection of logical structures of data, or schema objects. A schema is owned by a database user and has the same name as that user. Each user owns a single schema.
schema_object is a logical data structure like a table, index, view, synonym, procedure, package, or a database link.
global_database_name is the name that uniquely identifies a remote database. This name must be the same as the concatenation of the remote database initialization parameters DB_NAME and DB_DOMAIN, unless the parameter GLOBAL_NAMES is set to FALSE, in which case any name is acceptable.
A procedure (often called a stored procedure) is a subroutine, like a subprogram in a regular programming language, that is stored in the database.
SQL Server triggers are special stored procedures that are executed automatically in response to the database object, database, and server events.
Data Replication is the process of storing data in more than one site or node. It is useful in improving the availability of data. It is simply copying data from a database from one server to another server so that all the users can share the same data without any inconsistency. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in a database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out and executed as if it were the only transaction in the system. No transaction will affect the existence of any other transaction.
As Uber's engineers note, HDFS was designed as a scalable distributed file system to support thousands of nodes within a single cluster. With enough hardware, scaling to over 100 petabytes of raw storage capacity in one cluster can be achieved easily and quickly.