● Distributed Database Management Systems Advantages and Disadvantages.
● Characteristics of Distributed Database Management Systems.
● Levels of Data and Process Distribution.
● Distributed Database Transparency Features.
● Transaction Performance and Failure Transparency.
DDBMS, characteristics, Centralized vs. Distributed Database, Homogeneous DDBMS, Heterogeneous DDBMS, Advantages, Disadvantages, What is parallel database, Data fragmentation, Replication, Distribution Transaction
2. Learning Objectives
In this chapter, the student will learn:
About distributed database management systems (DDBMSs) and their components
How database implementation is affected by different levels of data and process distribution
How transactions are managed in a distributed database environment
3. Learning Objectives
In this chapter, the student will learn:
How distributed database design draws on data partitioning and replication to balance performance, scalability, and availability
About the trade-offs of implementing a distributed data system
4. Distributed database
A set of databases in a distributed system that can appear to applications as a single data source.
[Figure: hierarchical arrangement of networked databases forming a homogeneous distributed database]
5. Important considerations
There are two principal approaches to storing a relation in a distributed database system:
Replication: Database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users share the same level of information.
Fragmentation/Partitioning: Fragmentation is a database server feature that allows you to control, at the table level, where data is stored. Fragmentation enables you to define groups of rows or index keys within a table according to some algorithm or scheme; the server's system catalog records information about fragmented tables and indexes.
6. Distribution scheme for table fragmentation (1/2)
The following example includes a FRAGMENT BY EXPRESSION clause to create a fragmented table with an expression-based distribution scheme:
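The statement itself appears only as an image on the slide. A sketch in Informix-style syntax, with an illustrative table and hypothetical column and partition names, consistent with the layout described on the next slide (three fragments in dbs1; the remaining fragments, including the remainder, named explicitly in dbs2):

```sql
-- Illustrative Informix-style example; table, column, and partition
-- names are assumptions, not taken from the original slide.
CREATE TABLE account (
    acc_num  INTEGER,
    branch   CHAR(20)
)
FRAGMENT BY EXPRESSION
    PARTITION p1 (acc_num <  100)                   IN dbs1,
    PARTITION p2 (acc_num >= 100 AND acc_num < 200) IN dbs1,
    PARTITION p3 (acc_num >= 200 AND acc_num < 300) IN dbs1,
    PARTITION p4 (acc_num >= 300 AND acc_num < 400) IN dbs2,
    PARTITION p5 REMAINDER                          IN dbs2;
```

Each row is routed to the first partition whose expression it satisfies; rows matching no expression fall into the REMAINDER fragment.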
7. Distribution scheme for table fragmentation (2/2)
Here the first three fragments are stored in partitions of the dbs1 dbspace, and the other fragments, including the remainder, are stored in named fragments of the dbs2 dbspace. Explicit fragment names are required in this example, because each dbspace has multiple partitions.
8. How to Check Index Fragmentation on Indexes in a Database
The following is a simple query that will list every index on every table in your database, ordered by the percentage of index fragmentation.
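The query on the slide is an image. On Microsoft SQL Server (an assumption, since the slide does not name the DBMS), a common formulation uses the sys.dm_db_index_physical_stats dynamic management view:

```sql
-- List every index in the current database, most fragmented first.
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent AS fragmentation_pct
FROM   sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS ips
JOIN   sys.indexes AS i
       ON  i.object_id = ips.object_id
       AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

A commonly cited rule of thumb is to reorganize indexes whose fragmentation exceeds roughly 5-30 percent and rebuild those above 30 percent.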
9. Global Name as a Loopback Database Link
You can use the global name of a database as a loopback database link without explicitly creating a database link. When the database link in a SQL statement matches the global name of the current database, the database link is effectively ignored.
For example, assume the global name of a database is db1.example.com. You can run the following SQL statement on this database:
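The statement is not reproduced on the slide; a minimal sketch (hr.employees is an illustrative table name, not from the original) would be:

```sql
-- db1.example.com matches the current database's global name, so the
-- "link" is treated as a loopback and the query resolves locally.
SELECT * FROM hr.employees@db1.example.com;
```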
10. SQL statements that create database links in a local database to the remote sales.us.americas.example_auto.com database
CREATE DATABASE LINK sales.us.americas.example_auto.com USING 'sales_us';
Connects to database: sales, using net service name sales_us
Connects as: connected user
Link type: private connected user
11. SQL statements that create database links in a local database to the remote database
CREATE DATABASE LINK foo CONNECT TO CURRENT_USER USING 'am_sls';
Connects to database: sales, using net service name am_sls
Connects as: current global user
Link type: private current user
12. SQL statements that create database links in a local database to the remote sales.us.americas.example_auto.com database
CREATE DATABASE LINK sales.us.americas.example_auto.com CONNECT TO SAAD IDENTIFIED BY password USING 'sales_us';
Connects to database: sales, using net service name sales_us
Connects as: SAAD, using password password
Link type: private fixed user
13. SQL statements that create database links in a local database to the remote sales.us.americas.example_auto.com database
CREATE PUBLIC DATABASE LINK sales CONNECT TO SULTAN IDENTIFIED BY password USING 'rev';
Connects to database: sales, using net service name rev
Connects as: SULTAN, using password password
Link type: public fixed user
14. SQL statements that create database links in a local database to the remote sales.us.americas.example_auto.com database
CREATE SHARED PUBLIC DATABASE LINK sales.us.americas.example_auto.com CONNECT TO WALEED IDENTIFIED BY password AUTHENTICATED BY USMAN IDENTIFIED BY password1 USING 'sales';
Connects to database: sales, using net service name sales
Connects as: WALEED, using password password, authenticated as USMAN using password password1
Link type: shared public fixed user
15. Distributed processing
The operations that occur when an application distributes its tasks among different computers in a network.
For example, a database application typically distributes front-end presentation tasks to client computers and allows a back-end database server to manage shared access to a database. Consequently, a distributed database application processing system is more commonly referred to as a client/server database application system.
16. Evolution of Database Management Systems
Distributed database management system (DDBMS): governs the storage and processing of logically related data over interconnected computer systems, in which data and processing functions are distributed among several sites.
Centralized database management system: required that corporate data be stored in a single central site, with data access provided through dumb terminals.
19. Naming of Schema Objects Using Database Links
Oracle Database uses the global database name to name schema objects globally. Global object names take the following form:
schema.schema_object@global_database_name
For example, using a database link to database sales.division3.example.com, a user or application can reference remote data as follows:
SELECT * FROM scott.emp@sales.division3.example.com; -- emp table in scott's schema
SELECT loc FROM scott.dept@sales.division3.example.com;
20. For example, assume that you connect to the local database as user SYSTEM:
CONNECT SYSTEM@sales1
You then issue the following statements using database link hq.example.com to access objects in the scott and jane schemas on remote database hq:
SELECT * FROM scott.emp@hq.example.com;
INSERT INTO jane.accounts@hq.example.com (acc_no, acc_name, balance) VALUES (5001, 'BOWER', 2000);
UPDATE jane.accounts@hq.example.com SET balance = balance + 500;
DELETE FROM jane.accounts@hq.example.com WHERE acc_name = 'BOWER';
21. Figure 12.1 - Centralized Database Management System
22. Factors Affecting Centralized Database Systems
Globalization of business operations
Advancement of web-based services
Rapid growth of social and network technologies
Digitization resulting in multiple types of data: structured, unstructured, and semi-structured data; time-stamped data, etc.
Innovative business intelligence through analysis of data
23. An Oracle Distributed Database System
A client can connect directly or indirectly to a database server. A direct connection occurs when a client connects to a server and accesses information from a database contained on that server.
25. Rules for a DDBMS
To the user, a distributed system should look exactly like a nondistributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
The last four rules are ideals.
28. Remote SQL Statements
A remote update statement is an update that
modifies data in one or more tables, all of which are
located at the same remote node.
For example, the following statement updates the dept
table in the scott schema of the remote sales database:
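A minimal sketch of such a remote update, assuming the dept table and the sales database link used in the earlier examples (the specific column values are illustrative):

```sql
-- A remote update: all modified tables reside at the same remote node
UPDATE scott.dept@sales.division3.example.com
SET loc = 'BOSTON'
WHERE deptno = 10;
```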
29. Distributed SQL Statements
A distributed SQL statement either queries or
modifies data on two or more nodes.
A distributed query statement retrieves information
from two or more nodes.
For example, the following query accesses data from the
local database as well as the remote sales database:
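A sketch of such a distributed query, assuming a local dept table and the sales database link from the earlier examples:

```sql
-- A distributed query: one statement touches the local node and a remote node
SELECT e.ename, d.dname
FROM scott.emp@sales.division3.example.com e  -- remote sales database
JOIN dept d ON e.deptno = d.deptno;           -- local database
```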
30. Distributed UPDATE Statement
A distributed update statement modifies data on two or
more nodes. A distributed update is possible using a PL/SQL
subprogram unit, such as a procedure or trigger, that includes
two or more remote updates accessing data on different
nodes.
For example, the following PL/SQL program unit updates
tables on the local database and the remote sales database:
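A sketch of such a PL/SQL unit, assuming local and remote dept tables as in the earlier examples:

```sql
BEGIN
  UPDATE scott.dept
  SET loc = 'CHICAGO' WHERE deptno = 10;         -- local node
  UPDATE scott.dept@sales.division3.example.com
  SET loc = 'CHICAGO' WHERE deptno = 10;         -- remote node
END;
/
```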
31. Factors That Helped DDBMSs Cope
With Technological Advancement
Acceptance of the Internet as a platform for business
Mobile wireless revolution
Use of applications as a service
Focus on mobile business intelligence
32. Desirability of Distributed DBMS
Over Centralized DBMS
Shortcomings of centralized DBMSs that make data
distribution desirable:
Performance degradation
High costs
Reliability problems
Scalability problems
Organizational rigidity
33. Advantages and Disadvantages of
DDBMS
Advantages
• Data are located near the site of greatest demand
• Faster data access and processing
• Growth facilitation
• Improved communications
• Reduced operating costs
• User-friendly interface
• Less danger of a single-point failure
• Processor independence
Disadvantages
• Complexity of management and control
• Technological difficulty
• Security
• Lack of standards
• Increased storage and infrastructure requirements
• Increased training costs
• Costs incurred due to the requirement of duplicated infrastructure
34. Characteristics of Distributed Database
Management Systems
Application interface
Validation
Transformation
Query optimization
Mapping
I/O interface
Formatting
Security
Backup and recovery
DB administration
Concurrency control
Transaction management
35. Functions of a Distributed DBMS
Receives a request from an application
Validates, analyzes, and decomposes the request
Maps the request
Decomposes the request into several I/O operations
Searches for and validates data
Ensures consistency, security, and integrity
Validates data for specific conditions
Presents data in the required format
36. Figure 12.4 - A Fully Distributed Database
Management System
37. DDBMS Components
Computer workstations or remote devices
Network hardware and software components
Communications media
Transaction processor (TP): software component of a
system that requests data
Also known as transaction manager (TM) or application
processor (AP)
Data processor (DP): software component on a system
that stores and retrieves data from its location
Also known as data manager (DM)
38. Single-Site Processing, Single-Site
Data (SPSD)
Processing is done on a single host computer
Data are stored on the host computer's local disk
Processing is restricted on the end user's side
The DBMS is accessed by dumb terminals
39. Multiple-Site Processing, Single-Site
Data (MPSD)
Multiple processes run on different computers
sharing a single data repository
Requires a network file server running
conventional applications
Accessed through a LAN
Client/server architecture
Reduces network traffic
Processing is distributed
Supports data at multiple sites
40. Figure 12.7 - Multiple-Site Processing,
Single-Site Data
41. Multiple-Site Processing, Multiple-Site
Data (MPMD)
Fully distributed database management system
Supports multiple data processors and transaction
processors at multiple sites
DDBMSs are classified by the level of support for
various types of databases:
Homogeneous: integrate multiple instances of the same
DBMS over a network
Heterogeneous: integrate different types of DBMSs (e.g.,
object-oriented, document, and relational databases)
Fully heterogeneous: support different DBMSs, each
supporting a different data model (e.g., entity-relationship
model, network model)
42. Restrictions of a DDBMS
Remote access is provided on a read-only basis
Restrictions on the number of remote tables that may
be accessed in a single transaction
Restrictions on the number of distinct databases that
may be accessed
Restrictions on the database model that may be
accessed
43. Distributed Database Transparency
Features (cont.)
Distribution transparency
Transaction transparency
Failure transparency
Performance transparency
Heterogeneity transparency
44. Distribution Transparency
Allows management of a physically dispersed database
as if it were centralized
Levels
Fragmentation transparency
Location transparency
Local mapping transparency
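The three levels can be sketched with the same logical query, assuming a hypothetical EMPLOYEE table horizontally fragmented into E1 (stored at site 1) and E2 (stored at site 2); all names are illustrative:

```sql
-- Fragmentation transparency: user references only the logical table
SELECT * FROM employee WHERE emp_dob < DATE '1980-01-01';

-- Location transparency: user must name the fragments, but not their sites
SELECT * FROM e1 WHERE emp_dob < DATE '1980-01-01'
UNION ALL
SELECT * FROM e2 WHERE emp_dob < DATE '1980-01-01';

-- Local mapping transparency: user must name both fragments and sites
SELECT * FROM e1@site1 WHERE emp_dob < DATE '1980-01-01'
UNION ALL
SELECT * FROM e2@site2 WHERE emp_dob < DATE '1980-01-01';
```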
45. Distribution Transparency
Unique fragment: each row is unique, regardless of
the fragment in which it is located
Supported by a distributed data dictionary (DDD) or
distributed data catalog (DDC)
The DDC contains the description of the entire database
as seen by the database administrator
Distributed global schema: common database
schema used to translate user requests into subqueries
46. Transaction Transparency
Ensures database transactions will maintain the
distributed database's integrity and consistency
Ensures a transaction is completed only when all
database sites involved complete their part
Distributed database systems require complex
mechanisms to manage transactions
47. Distributed Requests and Distributed
Transactions
Remote request: a single SQL statement accesses data
processed by a single remote database processor
Remote transaction: accesses data at a single remote
site; composed of several requests
Distributed transaction: requests data from several
different remote sites on the network
Distributed request: a single SQL statement references
data at several DP sites
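A sketch of a distributed request, assuming a hypothetical customer table stored at site B and an invoice table at site C (table, column, and link names are illustrative):

```sql
-- A distributed request: one SQL statement references data at two DP sites
SELECT c.cus_num, c.cus_name, i.inv_total
FROM customer@siteb c
JOIN invoice@sitec i ON i.cus_num = c.cus_num
WHERE i.inv_total > 1000;
```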
48. Distributed Concurrency Control
Concurrency control is especially important in a
distributed database environment
Multi-site, multiple-process operations can create
data inconsistencies and deadlocked transactions
50. Two-Phase Commit Protocol (2PC)
Guarantees that if a portion of a transaction operation
cannot be committed, all changes made at the other
sites will be undone
Maintains a consistent database state
Requires that each DP's transaction log entry be
written before the database fragment is updated
DO-UNDO-REDO protocol: rolls transactions back
and forward with the help of the system's transaction
log entries
51. Two-Phase Commit Protocol (2PC)
Write-ahead protocol: forces the log entry to be
written to permanent storage before the actual
operation takes place
Defines the operations between the coordinator and
its subordinates
Phases of implementation
Preparation
The final COMMIT
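In Oracle, 2PC is transparent to the user: any transaction that modifies data on two or more nodes is ended with an ordinary COMMIT, and the database coordinates the prepare and commit phases automatically. A minimal sketch, assuming the dept table and the sales database link from the earlier examples:

```sql
UPDATE dept SET loc = 'DALLAS' WHERE deptno = 10;   -- local node
UPDATE scott.dept@sales.division3.example.com
SET loc = 'DALLAS' WHERE deptno = 10;               -- remote node
COMMIT;  -- triggers two-phase commit across both nodes
```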
52. Performance and Failure Transparency
Performance transparency: allows a DDBMS to
perform as if it were a centralized database
Failure transparency: ensures the system will
continue to operate in the event of a network failure
Considerations for resolving requests in a distributed
data environment:
Data distribution
Data replication
Replica transparency: the DDBMS's ability to hide the
existence of multiple copies of data from the user
53. Performance and Failure Transparency
Network and node availability
Network latency: delay imposed by the amount of time
required for a data packet to make a round trip
Network partitioning: delay imposed when nodes
become suddenly unavailable due to a network failure
54. Distributed Database Design
Data fragmentation: how to partition the database
into fragments
Data replication: which fragments to replicate
Data allocation: where to locate those fragments
and replicas
55. Data Fragmentation
Breaks a single object into many segments
Fragmentation information is stored in the distributed
data catalog (DDC)
Strategies
Horizontal fragmentation: division of a relation into
subsets (fragments) of tuples (rows)
Vertical fragmentation: division of a relation into
attribute (column) subsets
Mixed fragmentation: combination of horizontal and
vertical strategies
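The two basic strategies can be sketched on a hypothetical customer table (names and predicates are illustrative):

```sql
-- Horizontal fragmentation: split rows by a predicate (here, by state)
CREATE TABLE customer_east AS
SELECT * FROM customer WHERE cus_state IN ('NY', 'NJ');
CREATE TABLE customer_west AS
SELECT * FROM customer WHERE cus_state IN ('CA', 'WA');

-- Vertical fragmentation: split columns; each fragment keeps the key
CREATE TABLE customer_ids AS
SELECT cus_num, cus_name FROM customer;
CREATE TABLE customer_finance AS
SELECT cus_num, cus_credit_limit, cus_balance FROM customer;

-- The original relation is recoverable by UNION ALL (or a join on the key)
CREATE VIEW customer_all AS
SELECT * FROM customer_east UNION ALL SELECT * FROM customer_west;
```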
56. Data Replication
Data copies are stored at multiple sites served by a
computer network
Mutual consistency rule: replicated data fragments
should be identical
Styles of replication
Push replication
Pull replication
Replication helps restore lost data
Supported databases: IBM Db2, Microsoft SQL Server, MongoDB, Oracle,
PostgreSQL
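Pull-style replication can be sketched with an Oracle materialized view that periodically refreshes a local copy from a remote master (the view name, link, and refresh interval are illustrative):

```sql
CREATE MATERIALIZED VIEW dept_copy
REFRESH COMPLETE
START WITH SYSDATE NEXT SYSDATE + 1/24  -- re-pull the copy every hour
AS SELECT * FROM scott.dept@sales.division3.example.com;
```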
57. Types of Data Replication [1/3]
Transactional replication: users receive a full initial
copy of the database and then receive updates as the
data changes.
Data is copied in real time from the publisher to the
receiving database (the subscriber) in the same order
in which the changes occur at the publisher; therefore,
transactional consistency is guaranteed.
Transactional replication is typically used in server-to-
server environments.
It does not simply copy the data changes; it
consistently and accurately replicates each change.
58. Types of Data Replication [2/3]
Snapshot replication: distributes data exactly as it
appears at a specific moment in time and does not
monitor for updates to the data. The entire snapshot
is generated and sent to subscribers. Snapshot
replication is generally used when data changes are
infrequent.
It is a bit slower than transactional replication because
each attempt moves multiple records from one end to
the other.
Snapshot replication is a good way to perform the initial
synchronization between the publisher and the subscriber.
59. Types of Data Replication [3/3]
Merge replication: data from two or more databases
is combined into a single database.
Merge replication is the most complex type of
replication because it allows both the publisher and
the subscriber to make changes to the database
independently.
Merge replication is typically used in server-to-client
environments. It allows changes to be sent from one
publisher to multiple subscribers.
63. Data Replication Scenarios
Fully replicated database: stores multiple copies of
each database fragment at multiple sites
Partially replicated database: stores multiple copies
of some database fragments at multiple sites
Unreplicated database: stores each database
fragment at a single site
64. Data Allocation Strategies
Centralized data allocation: the entire database is
stored at one site
Partitioned data allocation: the database is divided
into two or more disjoint fragments and stored at two
or more sites
Replicated data allocation: copies of one or more
database fragments are stored at several sites
65. The CAP Theorem
CAP stands for:
Consistency: every read receives the most recent write or an
error
Availability: every request receives a (non-error) response,
without the guarantee that it contains the most recent write
Partition tolerance: the system continues to operate despite
an arbitrary number of messages being dropped (or delayed)
by the network between nodes
Basically available, soft state, eventually consistent
(BASE)
Data changes are not immediate but propagate slowly
through the system until all replicas are eventually consistent
67. Key Assumptions of the Hadoop
Distributed File System
High volume
Write-once, read-many
Streaming access
Move computations to the data
Fault tolerance
73. C. J. Date’s Twelve Commandments
for Distributed Databases
Local site independence
Central site independence
Failure independence
Location transparency
Fragmentation transparency
Replication transparency
74. C. J. Date’s Twelve Commandments
for Distributed Databases
Distributed query processing
Distributed transaction processing
Hardware independence
Operating system independence
Network independence
Database independence
Editor's Notes
A homogeneous distributed database has identical software and hardware running all databases instances, and may appear through a single interface as if it were a single database.
A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.
Consider fragmenting your tables if improving at least one of the following is your goal:
Single-user response time
Concurrency
Availability
Backup-and-restore characteristics
Loading of data
In an expression-based distribution scheme, each fragment expression in a rule specifies a storage space. Each fragment expression in the rule isolates data and aids the database server in searching for rows.
SELECT * FROM hr.employees@db1.example.com;
CREATE PUBLIC DATABASE LINK sales.division3.example.com USING 'sales1';
‘foo’ is a table alias/identifier for the derived query
Distributed database management system (DDBMS): for example, the data input/output (I/O), data selection, and data validation might be performed on one computer, and a report based on that data might be created on another computer.
Autonomous − Each database is independent that functions on its own. They are integrated by a controlling application and use message passing to share data updates.
Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS co-ordinates data updates across the sites.
-----------
A heterogeneous distributed database may have different hardware, operating systems, database management systems, and even data models for different databases.
Federated − The heterogeneous database systems are independent in nature and are integrated together so that they function as a single database system.
Un-federated − The database systems employ a central coordinating module through which the databases are accessed.
A schema is a collection of logical structures of data, or schema objects. A schema is owned by a database user and has the same name as that user. Each user owns a single schema.
A schema_object is a logical data structure such as a table, index, view, synonym, procedure, package, or database link.
A global_database_name is the name that uniquely identifies a remote database. This name must be the same as the concatenation of the remote database initialization parameters DB_NAME and DB_DOMAIN, unless the parameter GLOBAL_NAMES is set to FALSE, in which case any name is acceptable.
A procedure (often called a stored procedure) is a subroutine, like a subprogram in a regular programming language, that is stored in the database.
SQL Server triggers are special stored procedures that are executed automatically in response to database object, database, and server events.
Data Replication is the process of storing data in more than one site or node. It is useful in improving the availability of data. It is simply copying data from a database from one server to another server so that all the users can share the same data without any inconsistency. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in a database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out and executed as if it were the only transaction in the system. No transaction will affect the existence of any other transaction.
At Uber, HDFS was designed as a scalable distributed file system to support thousands of nodes within a single cluster. With enough hardware, scaling to over 100 petabytes of raw storage capacity in one cluster can be easily—and quickly—achieved.