Chapter 12
Distributed Database Management
Systems
Learning Objectives
• After completing this chapter, you will be able to:
• Explain the purpose and function of distributed database management systems
(DDBMSs)
• Summarize the advantages and disadvantages of DDBMSs
• Describe the characteristics and components of DDBMSs
• Explain how database implementation is affected by different levels of data and
process distribution
• Understand how transactions are managed in a distributed database environment
• Describe how distributed database design balances performance, scalability, and
availability
• Explain the trade-offs of implementing a distributed data system
© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website
for classroom use.
The Evolution of Distributed Database Management Systems (1 of 5)
• A distributed database management system (DDBMS)
• Governs storage and processing of logically related data over interconnected
computer systems
• Distributes data and processing functions among several sites
• Centralized database management system
• Required corporate data be stored in a single central site
• Provided data access through dumb terminals
• Filled structured information needs of corporations; fell short when quickly moving
events required faster response times and equally quick access to information
The Evolution of Distributed Database Management Systems (2 of 5)
The Evolution of Distributed Database Management Systems (3 of 5)
• Changes that affected the nature of systems
• Globalization of business operations
• Increased market needs for an on-demand transaction style, based on web-based
services
• Rapid social and technological changes fueled by low-cost smart mobile devices
• Converging data realms in the digital world
• Advent of social media to reach new customers and markets
The Evolution of Distributed Database Management Systems (4 of 5)
• Database requirements in a dynamic business environment
• Rapid ad hoc data access
- Crucial in the quick-response decision-making environment
• Distributed data access
- Needed to support geographically dispersed business units
The Evolution of Distributed Database Management Systems (5 of 5)
• Factors that influenced DDBMS
• Acceptance of Internet as a platform for business
• Mobile wireless revolution
• Growth of use of “application as a service”
• Focus on mobile business intelligence
• Emphasis on Big Data analytics
• Potential centralized DBMS problems
• Performance degradation
• High costs
• Reliability problems
• Scalability problems
• Organizational rigidity
DDBMS Advantages and Disadvantages
Table 12.1: Distributed DBMS Advantages and Disadvantages
Advantages
• Data is located near the site of greatest demand. The data in a distributed database system is dispersed to match business requirements.
• Faster data access. End users often work with only the nearest stored subset of the data.
• Faster data processing. A distributed database system spreads out the system’s workload by processing data at several sites.
• Growth facilitation. New sites can be added to the network without affecting the operations of other sites.
• Improved communications. Because local sites are smaller and located closer to customers, local sites foster better communication among departments and between customers and company staff.
• Reduced operating costs. It’s more cost-effective to add nodes to a network than to update a mainframe system. Development work is done more cheaply and quickly on low-cost PCs and laptops than on mainframes.
• User-friendly interface. Client devices are usually equipped with an easy-to-use graphical user interface (GUI). The GUI simplifies training and use for end users.
• Less danger of a single-point failure. When one of the computers fails, the workload is picked up by other workstations. Data is also distributed at multiple sites.
• Processor independence. The end user can access any available copy of the data, and an end user’s request is processed by any processor at the data location.
Disadvantages
• Complexity of management and control. Applications must recognize data location and be able to stitch together data from various sites. Database administrators must have the ability to coordinate database activities to prevent database degradation due to data anomalies.
• Technological difficulty. Data integrity, transaction management, concurrency control, security, backup, recovery, and query optimization must all be addressed and resolved.
• Security. Probability of security lapses increases when data is located at multiple sites. Responsibility of data management will be shared by different people at several sites.
• Lack of standards. No standard communication protocols at the database level. Different database vendors employ different and often incompatible techniques to manage the distribution of data and processing in a DDBMS environment.
• Increased storage and infrastructure requirements. Multiple copies of data are required at different sites, thus requiring additional storage space.
• Increased training cost. Training costs are generally higher in a distributed model than they would be in a centralized model, sometimes even to the extent of offsetting operational and hardware savings.
• Higher costs. Distributed databases require duplicated infrastructure to operate, such as physical location, environment, personnel, software, and licensing.
Distributed Processing and Distributed Databases (1 of 3)
• Distributed processing: a database’s logical processing is shared among two or more physically independent sites connected via a network
• Distributed database: stores a logically related database over two or more physically independent sites connected via a computer network
• Database fragments: the parts into which a database is divided in a distributed database system
Distributed Processing and Distributed Databases (2 of 3)
Distributed Processing and Distributed Databases (3 of 3)
Characteristics of Distributed Management Systems (1 of 3)
• A DBMS must have several functions to be classified as distributed
• Application interface
• Validation
• Transformation
• Query optimization
• Mapping
• I/O interface
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management
Characteristics of Distributed Management Systems (2 of 3)
• Functions of fully distributed DBMS
• Receive the request of an application or end user
• Validate, analyze, and decompose the request
• Map request’s logical-to-physical data components
• Decompose request into several I/O operations
• Search, locate, read and validate data
• Ensure database consistency, security, and integrity
• Validate data for conditions specified by request
• Present data in required format
• Handle all necessary functions transparently to user
Characteristics of Distributed Management Systems (3 of 3)
DDBMS Components
• The DDBMS must include at least the following components:
• Computer workstations or remote devices
• Network hardware and software components
• Communications media
• Transaction processor (TP)
• Data processor (DP) or data manager (DM)
Levels of Data and Process Distribution
Table 12.2: Database Systems: Levels of Data and Process Distribution
• Single-site process, single-site data: Host DBMS
• Single-site process, multiple-site data: Not applicable (requires multiple processes)
• Multiple-site process, single-site data: File server; client/server DBMS (LAN DBMS)
• Multiple-site process, multiple-site data: Fully distributed; client/server DDBMS
Single-Site Processing, Single-Site Data (SPSD) (1 of 2)
• Characteristics
• Processing is done on a single host computer
• Data stored on host computer’s local disk
• Processing cannot be done on end user’s side
• DBMS is accessed by connected terminals
Single-Site Processing, Single-Site Data (SPSD) (2 of 2)
Multiple-Site Processing, Single-Site Data (MPSD) (1 of 2)
• Multiple processes run on different computers sharing a single data repository
• Typically requires network file server running conventional applications
- Accessed through LAN
• Client/server architecture
• Reduces network traffic
• Distributes processing
• Supports data at multiple sites
Multiple-Site Processing, Single-Site Data (MPSD) (2 of 2)
Multiple-Site Processing, Multiple-Site Data (MPMD) (1 of 2)
• Fully distributed database management system
• Supports multiple data processors and transaction processors at multiple sites
• Classifications
• Homogeneous: integrates multiple instances of the same DBMS over a network
• Heterogeneous: integrates different types of DBMSs over a network
• Fully heterogeneous: supports different DBMSs, each supporting a different data model, running under different computer systems
Multiple-Site Processing, Multiple-Site Data (MPMD) (2 of 2)
• Restrictions of DDBMS
• Remote access is provided on a read-only basis
• Restrictions on the number of remote tables that may be accessed in a single
transaction
• Restrictions on the number of distinct databases that may be accessed
• Restrictions on the database model that may be accessed
Distributed Database Transparency Features
• Minimum desirable DDBMS transparency features
• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency
Distribution Transparency
• Allows management of physically dispersed database as if centralized
• Levels: fragmentation, location, and local mapping
- Unique fragment: each row is unique, regardless of the fragment in which it is located
• Supported by distributed data dictionary (DDD) or distributed data catalog (DDC)
- DDC contains the description of the entire database as seen by the database administrator
• Distributed global schema: common database schema to translate user requests into
subqueries
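The three levels of distribution transparency can be sketched as three ways of asking for the same rows. This is a minimal illustration, not a real DDBMS interface: the `DDC` dictionary, the `EMPLOYEE` table, the fragment names `E1`/`E2`, and the site names are all assumptions made up for the example.

```python
# Hypothetical distributed data catalog (DDC): table -> fragment -> site and rows.
# Table, fragment, and site names are illustrative assumptions.
DDC = {
    "EMPLOYEE": {
        "E1": {"site": "new_york", "rows": [{"emp_id": 1, "name": "Ann"}]},
        "E2": {"site": "atlanta", "rows": [{"emp_id": 2, "name": "Bob"}]},
    }
}


def select_all(table):
    """Fragmentation transparency: the caller names only the logical table."""
    rows = []
    for fragment in DDC[table].values():
        rows.extend(fragment["rows"])
    return rows


def select_from_fragments(table, fragment_names):
    """Location transparency: the caller names fragments but not sites."""
    rows = []
    for name in fragment_names:
        rows.extend(DDC[table][name]["rows"])
    return rows


def select_from_sites(table, fragment_sites):
    """Local mapping transparency: the caller names both fragment and site."""
    rows = []
    for name, site in fragment_sites:
        fragment = DDC[table][name]
        if fragment["site"] != site:
            raise LookupError(f"fragment {name} is not stored at {site}")
        rows.extend(fragment["rows"])
    return rows
```

The more the caller has to spell out (fragment names, then sites), the lower the level of transparency the system is providing.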
Transaction Transparency
• Transaction transparency: DDBMS property that ensures database transactions
will maintain the distributed database’s integrity and consistency
• Ensures that a transaction is completed only when all database sites involved complete their part
• Distributed database systems require complex mechanisms to manage
transactions
• Ensure the database’s consistency and integrity
Distributed Requests and Distributed Transactions
• Remote request
• Single SQL statement accesses data processed by a single remote database processor
• Remote transaction
• Composed of several requests that access data at a single remote site
• Distributed transaction
• Requests data from several different remote sites on a network
• Distributed request
• Single SQL statement references data at several DP sites
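The four categories above differ only in how many statements are involved and how many sites each statement touches. The sketch below encodes that distinction; the `classify` function and its input shape (one set of site names per SQL statement) are assumptions chosen for illustration.

```python
def classify(statements):
    """Classify a unit of work by the sites its statements reference.

    `statements` holds one entry per SQL statement; each entry is the set of
    remote DP sites that statement references (hypothetical input format).
    """
    if not statements:
        raise ValueError("at least one statement is required")
    if any(len(sites) > 1 for sites in statements):
        # A single statement spans several DP sites.
        return "distributed request"
    all_sites = set().union(*statements)
    if len(all_sites) > 1:
        # Each statement touches one site, but the sites differ.
        return "distributed transaction"
    if len(statements) == 1:
        return "remote request"
    return "remote transaction"
```

For example, `classify([{"A"}, {"B"}])` yields a distributed transaction, while `classify([{"A", "B"}])` yields a distributed request.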
Distributed Concurrency Control (1 of 2)
• Concurrency control is especially important in a distributed database environment
• Multi-site, multiple-process operations are more likely to create inconsistencies and deadlocked transactions
• The solution to an inconsistent database is a two-phase commit protocol
Distributed Concurrency Control (2 of 2)
Two-Phase Commit Protocol (2PC)
• Guarantees that if a portion of a transaction operation cannot be committed, all changes made at the other sites will be undone
• Maintains a consistent database state
• Requires that each DP’s transaction log entry be written before the database fragment is updated
• DO-UNDO-REDO protocol: roll transactions back and forward with the help of the
system’s transaction log entries
• Write-ahead protocol: forces the log entry to be written to permanent storage
before actual operation takes place
• Two-phase commit protocol defines operations between coordinator and
subordinates
• Phases of implementation
- Preparation
- Final COMMIT
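The two phases can be sketched as a small in-memory simulation. This is a simplified illustration under stated assumptions, not a production protocol: the `Subordinate` class, the `can_commit` flag, and the in-memory `log` list (standing in for a transaction log on permanent storage) are hypothetical, and coordinator failure and timeouts are not modeled.

```python
class Subordinate:
    """A data processor (DP) taking part in a two-phase commit (illustrative)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: simulates a local failure
        self.log = []                 # stands in for the transaction log
        self.data = {}                # stands in for the local database fragment
        self.pending = None

    def prepare(self, update):
        # Write-ahead: the log entry is recorded before the fragment is touched.
        self.log.append(("PREPARE", update))
        if not self.can_commit:
            self.log.append(("NOT PREPARED",))
            return False
        self.pending = update
        return True

    def commit(self):
        self.log.append(("COMMIT",))
        self.data.update(self.pending)
        self.pending = None

    def abort(self):
        self.log.append(("ABORT",))
        self.pending = None


def two_phase_commit(subordinates, update):
    """Coordinator logic: commit everywhere or nowhere."""
    # Phase 1 (preparation): every subordinate votes.
    votes = [dp.prepare(update) for dp in subordinates]
    if all(votes):
        # Phase 2 (final COMMIT): all sites apply the change.
        for dp in subordinates:
            dp.commit()
        return "COMMITTED"
    # Any negative vote means no site applies the change.
    for dp in subordinates:
        dp.abort()
    return "ABORTED"
```

With one subordinate created as `Subordinate("D", can_commit=False)`, the call returns `"ABORTED"` and no site's `data` changes, which is the all-or-nothing guarantee described above.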
Performance and Failure Transparency (1 of 2)
• Performance transparency allows a DDBMS to perform as if it were a
centralized database
• Failure transparency ensures the system will operate in case of network failure
• Objective of query optimization is to minimize total costs
• Access time (I/O) cost involved in accessing data from multiple remote sites
• Communication costs associated with data transmission
• CPU time cost associated with the processing overhead
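The three cost components can be combined into a single figure for comparing candidate access plans. The sketch below assumes the costs are expressed in one common unit and simply adds them; the `Plan` class, the plan names, and any numbers passed in are hypothetical, and a real optimizer would estimate these values from statistics.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate access plan with its estimated costs (illustrative units)."""
    name: str
    io_cost: float             # access time (I/O) cost
    communication_cost: float  # cost of transmitting data between sites
    cpu_cost: float            # processing overhead

    @property
    def total_cost(self):
        return self.io_cost + self.communication_cost + self.cpu_cost


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total_cost)
```

A plan that filters rows at the remote site before shipping them usually trades a little extra CPU cost for a much lower communication cost, which is why the components are compared as a sum rather than one at a time.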
Performance and Failure Transparency (2 of 2)
• Considerations for resolving data requests in a distributed data environment
• Data distribution and data replication
- Replica transparency: DDBMS’s ability to hide multiple copies of data from the user
• Network and node availability
- Network latency: delay imposed by the amount of time required for a data packet to make a
round trip
- Network partitioning: delay imposed when nodes become suddenly unavailable due to a
network failure
Distributed Database Design
• Data fragmentation
• How to partition database into fragments
• Data replication
• Which fragments to replicate
• Data allocation
• Where to locate those fragments and replicas
Data Fragmentation
• Breaks a single object into two or more segments
• Information is stored in distributed data catalog (DDC)
• Strategies
• Horizontal fragmentation: division of a relation into subsets (fragments) of tuples
(rows)
• Vertical fragmentation: division of a relation into attribute (column) subsets
• Mixed fragmentation: combination of horizontal and vertical strategies
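The three strategies can be sketched on a small in-memory relation. The `CUSTOMER` rows and column names below are made-up sample data, and the helper functions are illustrative rather than part of any DBMS.

```python
# Hypothetical sample relation; names and values are for illustration only.
CUSTOMER = [
    {"cus_num": 10, "cus_name": "Sinex", "cus_state": "TN", "cus_balance": 1000},
    {"cus_num": 11, "cus_name": "Martin", "cus_state": "FL", "cus_balance": 0},
    {"cus_num": 12, "cus_name": "Mynux", "cus_state": "TN", "cus_balance": 250},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows (tuples) that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, attributes):
    """Keep a subset of columns; the key is repeated so rows can be rejoined."""
    return [{name: row[name] for name in (key, *attributes)} for row in rows]


def mixed_fragment(rows, predicate, key, attributes):
    """Horizontal fragmentation followed by vertical fragmentation."""
    return vertical_fragment(horizontal_fragment(rows, predicate), key, attributes)
```

For instance, `mixed_fragment(CUSTOMER, lambda r: r["cus_state"] == "TN", "cus_num", ["cus_balance"])` keeps only the Tennessee rows and only the key and balance columns.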
Data Replication (1 of 2)
• Storage of data copies at multiple sites served by a computer network
• Mutual consistency rule requires all copies of data fragments be identical
• Styles of replication
• Push replication focuses on maintaining data consistency
• Pull replication focuses on maintaining data availability and allows for temporary data
inconsistencies
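The difference between the two styles can be sketched with one originating node and its replicas. This is a toy model: the class names, the `stale` flag (standing in for an update notification), and the single stored value are assumptions for illustration.

```python
class Replica:
    """A copy of a data item held at another site (illustrative)."""

    def __init__(self, value=None):
        self.value = value
        self.stale = False  # set when the replica has been told a newer value exists


class OriginatingNode:
    """The node where an update is first applied (illustrative)."""

    def __init__(self, replicas, value=None):
        self.value = value
        self.replicas = replicas

    def update_push(self, value):
        # Push: the change is sent to every replica as part of the update.
        self.value = value
        for replica in self.replicas:
            replica.value = value
            replica.stale = False

    def update_pull(self, value):
        # Pull: replicas are only notified; each decides when to apply the change.
        self.value = value
        for replica in self.replicas:
            replica.stale = True

    def serve_pull(self, replica):
        # Called when a replica chooses to fetch the latest value.
        replica.value = self.value
        replica.stale = False
```

After `update_pull`, a replica keeps serving its old value until it calls back through `serve_pull`, which is the temporary inconsistency that pull replication accepts in exchange for availability.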
Data Replication (2 of 2)
• Data replication scenarios
• Fully replicated database: stores multiple copies of each database fragment at
multiple sites
• Partially replicated database: stores multiple copies of some database fragments at
multiple sites
• Unreplicated database: stores each database fragment at a single site
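The three scenarios can be told apart by counting how many sites hold each fragment. The function below is a small sketch; its name and the input mapping (fragment name to number of sites storing it) are assumptions for illustration.

```python
def replication_scenario(copies_per_fragment):
    """Name the scenario from a mapping of fragment name -> number of sites."""
    counts = list(copies_per_fragment.values())
    if not counts:
        raise ValueError("at least one fragment is required")
    if all(count == 1 for count in counts):
        return "unreplicated"
    if all(count > 1 for count in counts):
        return "fully replicated"
    return "partially replicated"
```

For example, `replication_scenario({"E1": 2, "E2": 1})` reports a partially replicated database because only one of the two fragments has more than one copy.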
Data Allocation
• Data allocation strategies
• Centralized data allocation: entire database stored at one site
• Partitioned data allocation: database is divided into two or more disjoint fragments
and stored at two or more sites
• Replicated data allocation: copies of one or more database fragments are stored at
several sites
The CAP Theorem (1 of 2)
• CAP stands for:
• Consistency
• Availability
• Partition tolerance
• The trade-off between consistency and availability has produced a new type of system in
which data is basically available, soft state, eventually consistent (BASE)
• Data changes are not immediate but propagate slowly through the system until all
replicas are consistent
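Eventual consistency can be pictured as a change reaching one replica at a time. The generator below is a toy model under simplifying assumptions (a fixed propagation order, one replica updated per step, no failures); the function name and input format are hypothetical.

```python
def propagate(replicas, new_value):
    """Yield the state of all replicas after each propagation step."""
    state = list(replicas)
    for index in range(len(state)):
        state[index] = new_value
        # Copy the list so earlier snapshots are not changed by later steps.
        yield list(state)
```

Iterating over `propagate(["old", "old", "old"], "new")` shows intermediate snapshots in which replicas disagree, followed by a final snapshot in which every replica holds the new value.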
The CAP Theorem (2 of 2)
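Table 12.8: Distributed Database Spectrum
• Centralized DBMS
- Consistency: High; Availability: High; Partition tolerance: N/A
- Transaction model: ACID
- Trade-off: No distributed data processing
• Relational DBMS
- Consistency: High; Availability: Relaxed; Partition tolerance: High
- Transaction model: ACID (2PC)
- Trade-off: Sacrifices availability to ensure consistency and isolation
• NoSQL DDBMS
- Consistency: Relaxed; Availability: High; Partition tolerance: High
- Transaction model: BASE
- Trade-off: Sacrifices consistency to ensure availability
• NewSQL DDBMS
- Consistency: High; Availability: High; Partition tolerance: Relaxed
- Transaction model: ACID
- Trade-off: Sacrifices partition tolerance to ensure transaction consistency and availability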
C. J. Date’s 12 Commandments for Distributed Databases
Table 12.9: C. J. Date’s 12 Commandments for Distributed Databases
• Rule 1, Local-site independence: Each local site can act as an independent, autonomous, centralized DBMS. Each site is responsible for security, concurrency control, backup, and recovery.
• Rule 2, Central-site independence: No site in the network relies on a central site or any other site. All sites have the same capabilities.
• Rule 3, Failure independence: The system is not affected by node failures. The system is in continuous operation even in the case of a node failure or an expansion of the network.
• Rule 4, Location transparency: The user does not need to know the location of data to retrieve that data.
• Rule 5, Fragmentation transparency: Data fragmentation is transparent to the user, who sees only one logical database. The user does not need to know the name of the database fragments to retrieve them.
• Rule 6, Replication transparency: The user sees only one logical database. The DDBMS transparently selects the database fragment to access. To the user, the DDBMS manages all fragments transparently.
• Rule 7, Distributed query processing: A distributed query may be executed at several different DP sites. Query optimization is performed transparently by the DDBMS.
• Rule 8, Distributed transaction processing: A transaction may update data at several different sites, and the transaction is executed transparently.
• Rule 9, Hardware independence: The system must run on any hardware platform.
• Rule 10, Operating system independence: The system must run on any operating system platform.
• Rule 11, Network independence: The system must run on any network platform.
• Rule 12, Database independence: The system must support any vendor’s database product.
Summary (1 of 2)
• A distributed database stores logically related data in two or more physically
independent sites connected via a computer network
• Distributed processing is the division of logical database processing among two
or more network nodes
• The main components of a DDBMS are the transaction processor (TP) and the
data processor (DP)
• Current database systems can be classified by the extent to which they support
processing and data distribution
• A homogeneous distributed database system integrates only one particular
type of DBMS over a computer network
• DDBMS characteristics are best described as a set of transparencies:
distribution, transaction, performance, failure, and heterogeneity
Summary (2 of 2)
• A transaction is formed by one or more database requests. An undistributed
transaction updates or requests data from a single site
• Distributed concurrency control is required in a network of distributed
databases
• A distributed DBMS evaluates every data request to find the optimum access
path in a distributed database
• The design of a distributed database must consider the fragmentation and
replication of data
• A database can be replicated over several different sites on a computer network
• The CAP theorem states that a highly distributed data system has three desirable
properties (consistency, availability, and partition tolerance) but can provide only two of them at a time
LIGA(E)11111111111111111111111111111111111111111.ppt
ssuser9bd3ba
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
Kamal Acharya
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 

Recently uploaded (20)

LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 

ch12text.pdf

  • 1. Chapter 12 Distributed Database Management Systems
  • 2. Learning Objectives
      • After completing this chapter, you will be able to:
          • Explain the purpose and function of distributed database management systems (DDBMSs)
          • Summarize the advantages and disadvantages of DDBMSs
          • Describe the characteristics and components of DDBMSs
          • Explain how database implementation is affected by different levels of data and process distribution
          • Understand how transactions are managed in a distributed database environment
          • Describe how distributed database design balances performance, scalability, and availability
          • Explain the trade-offs of implementing a distributed data system
      © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use.
  • 3. The Evolution of Distributed Database Management Systems (1 of 5)
      • A distributed database management system (DDBMS)
          • Governs storage and processing of logically related data over interconnected computer systems
          • Distributes data and processing functions among several sites
      • Centralized database management system
          • Required corporate data be stored in a single central site
          • Provided data access through dumb terminals
          • Filled structured information needs of corporations; fell short when quickly moving events required faster response times and equally quick access to information
  • 4. The Evolution of Distributed Database Management Systems (2 of 5)
  • 5. The Evolution of Distributed Database Management Systems (3 of 5)
      • Changes that affected the nature of systems
          • Globalization of business operations
          • Increased market needs for an on-demand transaction style, based on web-based services
          • Rapid social and technological changes fueled by low-cost smart mobile devices
          • Converging data realms in the digital world
          • Advent of social media to reach new customers and markets
  • 6. The Evolution of Distributed Database Management Systems (4 of 5)
      • Database requirements in a dynamic business environment
          • Rapid ad hoc data access
              - Crucial in the quick-response decision-making environment
          • Distributed data access
              - Needed to support geographically dispersed business units
  • 7. The Evolution of Distributed Database Management Systems (5 of 5)
      • Factors that influenced DDBMS
          • Acceptance of Internet as a platform for business
          • Mobile wireless revolution
          • Growth of use of “application as a service”
          • Focus on mobile business intelligence
          • Emphasis on Big Data analytics
      • Potential centralized DBMS problems
          • Performance degradation
          • High costs
          • Reliability problems
          • Scalability problems
          • Organizational rigidity
  • 8. DDBMS Advantages and Disadvantages
      Table 12.1: Distributed DBMS Advantages and Disadvantages
      • Advantages
          • Data is located near the site of greatest demand. The data in a distributed database system is dispersed to match business requirements.
          • Faster data access. End users often work with only the nearest stored subset of the data.
          • Faster data processing. A distributed database system spreads out the system’s workload by processing data at several sites.
          • Growth facilitation. New sites can be added to the network without affecting the operations of other sites.
          • Improved communications. Because local sites are smaller and located closer to customers, local sites foster better communication among departments and between customers and company staff.
          • Reduced operating costs. It’s more cost-effective to add nodes to a network than to update a mainframe system. Development work is done more cheaply and quickly on low-cost PCs and laptops than on mainframes.
          • User-friendly interface. Client devices are usually equipped with an easy-to-use graphical user interface (GUI). The GUI simplifies training and use for end users.
          • Less danger of a single-point failure. When one of the computers fails, the workload is picked up by other workstations. Data is also distributed at multiple sites.
          • Processor independence. The end user can access any available copy of the data, and an end user’s request is processed by any processor at the data location.
      • Disadvantages
          • Complexity of management and control. Applications must recognize data location and be able to stitch together data from various sites. Database administrators must have the ability to coordinate database activities to prevent database degradation due to data anomalies.
          • Technological difficulty. Data integrity, transaction management, concurrency control, security, backup, recovery, and query optimization must all be addressed and resolved.
          • Security. Probability of security lapses increases when data is located at multiple sites. Responsibility of data management will be shared by different people at several sites.
          • Lack of standards. No standard communication protocols at the database level. Different database vendors employ different and often incompatible techniques to manage the distribution of data and processing in a DDBMS environment.
          • Increased storage and infrastructure requirements. Multiple copies of data are required at different sites, thus requiring additional storage space.
          • Increased training cost. Training costs are generally higher in a distributed model than they would be in a centralized model, sometimes even to the extent of offsetting operational and hardware savings.
          • Higher costs. Distributed databases require duplicated infrastructure to operate, such as physical location, environment, personnel, software, and licensing.
  • 9. Distributed Processing and Distributed Databases (1 of 3)
      • Distributed processing: a database’s logical processing is shared among two or more physically independent sites connected via a network
      • Distributed database: stores a logically related database over two or more physically independent sites connected via a computer network
      • Database fragments: the parts that make up the database in a distributed database system
  • 10. Distributed Processing and Distributed Databases (2 of 3)
  • 11. Distributed Processing and Distributed Databases (3 of 3)
  • 12. Characteristics of Distributed Management Systems (1 of 3)
      • A DBMS must have several functions to be classified as distributed
          • Application interface
          • Validation
          • Transformation
          • Query optimization
          • Mapping
          • I/O interface
          • Formatting
          • Security
          • Backup and recovery
          • DB administration
          • Concurrency control
          • Transaction management
  • 13. Characteristics of Distributed Management Systems (2 of 3)
      • Functions of fully distributed DBMS
          • Receive the request of an application or end user
          • Validate, analyze, and decompose the request
          • Map request’s logical-to-physical data components
          • Decompose request into several I/O operations
          • Search, locate, read and validate data
          • Ensure database consistency, security, and integrity
          • Validate data for conditions specified by request
          • Present data in required format
          • Handle all necessary functions transparently to user
  • 14. Characteristics of Distributed Management Systems (3 of 3)
  • 15. DDBMS Components
      • The DDBMS must include at least the following components:
          • Computer workstations or remote devices
          • Network hardware and software components
          • Communications media
          • Transaction processor (TP)
          • Data processor (DP) or data manager (DM)
  • 16. Levels of Data and Process Distribution
      Table 12.2 Database Systems: Levels of Data and Process Distribution
      • Single-site process
          - Single-site data: Host DBMS
          - Multiple-site data: Not applicable (requires multiple processes)
      • Multiple-site process
          - Single-site data: File server; client/server DBMS (LAN DBMS)
          - Multiple-site data: Fully distributed; client/server DDBMS
  • 17. Single-Site Processing, Single-Site Data (SPSD) (1 of 2)
      • Characteristics
          • Processing is done on a single host computer
          • Data stored on host computer’s local disk
          • Processing cannot be done on end user’s side
          • DBMS is accessed by connected terminals
  • 18. Single-Site Processing, Single-Site Data (SPSD) (2 of 2)
  • 19. Multiple-Site Processing, Single-Site Data (MPSD) (1 of 2)
      • Multiple processes run on different computers sharing a single data repository
      • Typically requires network file server running conventional applications
          - Accessed through LAN
      • Client/server architecture
          • Reduces network traffic
          • Distributes processing
          • Supports data at multiple sites
  • 20. Multiple-Site Processing, Single-Site Data (MPSD) (2 of 2)
  • 21. Multiple-Site Processing, Multiple-Site Data (MPMD) (1 of 2)
      • Fully distributed database management system
          • Supports multiple data processors and transaction processors at multiple sites
      • Classifications
          • Homogeneous: integrates multiple instances of the same DBMS over a network
          • Heterogeneous: integrates different types of DBMSs over a network
          • Fully heterogeneous: supports different DBMSs, each supporting a different data model and running under different computer systems
  • 22. Multiple-Site Processing, Multiple-Site Data (MPMD) (2 of 2)
      • Restrictions of DDBMS
          • Remote access is provided on a read-only basis
          • Restrictions on the number of remote tables that may be accessed in a single transaction
          • Restrictions on the number of distinct databases that may be accessed
          • Restrictions on the database model that may be accessed
  • 23. Distributed Database Transparency Features
      • Minimum desirable DDBMS transparency features
          • Distribution transparency
          • Transaction transparency
          • Failure transparency
          • Performance transparency
          • Heterogeneity transparency
  • 24. Distribution Transparency
      • Allows management of physically dispersed database as if centralized
      • Levels: fragmentation, location, and local mapping
          - Unique fragment: each row is unique, regardless of the fragment in which it is located
      • Supported by distributed data dictionary (DDD) or distributed data catalog (DDC)
          - DDC contains the description of the entire database as seen by the database administrator
      • Distributed global schema: common database schema to translate user requests into subqueries
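The three levels of distribution transparency can be illustrated with a minimal Python sketch. It is not taken from the slides: the catalog contents, site names, and function names are hypothetical, and in-memory dictionaries stand in for remote sites. The point is how much the caller has to know at each level: only the table name (fragmentation transparency), the fragment name (location transparency), or both the fragment name and its site (local mapping transparency).

```python
# Hypothetical distributed data catalog (DDC): table -> [(fragment, site), ...]
CATALOG = {
    "EMPLOYEE": [("E1", "New York"), ("E2", "Atlanta"), ("E3", "Miami")],
}

# Stand-in for the data held at each site; a real DDBMS would reach these over a network.
SITES = {
    "New York": {"E1": [{"emp_id": 1, "name": "Ana"}]},
    "Atlanta": {"E2": [{"emp_id": 2, "name": "Ben"}]},
    "Miami": {"E3": [{"emp_id": 3, "name": "Cho"}]},
}


def query_table(table):
    """Fragmentation transparency: the caller names only the table."""
    rows = []
    for fragment, site in CATALOG[table]:
        rows.extend(SITES[site][fragment])
    return rows


def query_fragment(fragment):
    """Location transparency: the caller names the fragment but not its site."""
    for fragments in CATALOG.values():
        for name, site in fragments:
            if name == fragment:
                return list(SITES[site][name])
    raise KeyError(fragment)


def query_fragment_at(fragment, site):
    """Local mapping transparency: the caller names both fragment and site."""
    return list(SITES[site][fragment])
```

In this sketch the catalog is the only place that knows where fragments live, which is the role the slide assigns to the DDC.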
  • 25. Transaction Transparency
      • Transaction transparency: DDBMS property that ensures database transactions will maintain the distributed database’s integrity and consistency
      • Ensures transaction completed only when all database sites involved complete their part
      • Distributed database systems require complex mechanisms to manage transactions
          • Ensure the database’s consistency and integrity
  • 26. Distributed Requests and Distributed Transactions
      • Remote request
          • Single SQL statement accesses data processed by a single remote database processor
      • Remote transaction
          • Composed of several requests that access data at a single remote site
      • Distributed transaction
          • Requests data from several different remote sites on the network
      • Distributed request
          • Single SQL statement references data at several DP sites
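The four terms differ along two axes: how many SQL statements are involved and how many sites each statement touches. The following is a simplified Python sketch of that distinction, not a definition from the slides. The function name `classify` and the representation (one set of site names per statement) are assumptions made for illustration, and a transaction that mixes both kinds of statement is simply labeled by its widest statement.

```python
def classify(statements):
    """Classify a unit of work.

    statements: list of sets; each set holds the (hypothetical) site names
    that one SQL statement touches.
    """
    if not statements:
        raise ValueError("at least one statement is required")
    if any(len(sites) > 1 for sites in statements):
        # A single statement references data at several DP sites.
        return "distributed request"
    all_sites = set().union(*statements)
    if len(all_sites) > 1:
        # Several statements, each confined to one site, but different sites overall.
        return "distributed transaction"
    # Everything happens at one remote site.
    return "remote request" if len(statements) == 1 else "remote transaction"
```

For example, `classify([{"B"}, {"C"}])` describes two statements that each go to one site, while `classify([{"B", "C"}])` describes one statement that spans both.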
  • 27. Distributed Concurrency Control (1 of 2)
      • Concurrency control is especially important in a distributed database environment
          • Multi-site, multiple-process operations are more likely to create inconsistencies and deadlocked transactions
      • The solution to an inconsistent database is a two-phase commit protocol
  • 28. Distributed Concurrency Control (2 of 2)
  • 29. Two-Phase Commit Protocol (2PC)
      • Guarantees that if a portion of a transaction operation cannot be committed, all changes made at the other sites will be undone
          • Maintains a consistent database state
      • Requires each DP’s transaction log entry be written before the database fragment is updated
      • DO-UNDO-REDO protocol: rolls transactions back and forward with the help of the system’s transaction log entries
      • Write-ahead protocol: forces the log entry to be written to permanent storage before the actual operation takes place
      • Two-phase commit protocol defines operations between coordinator and subordinates
      • Phases of implementation
          - Preparation
          - Final COMMIT
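The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not a production protocol: the class `Participant`, the function `two_phase_commit`, and the `can_commit` flag are hypothetical names, a Python list stands in for each DP's transaction log, and network failures and coordinator recovery are ignored.

```python
class Participant:
    """Hypothetical subordinate data processor (DP)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether this DP is able to commit
        self.log = []                 # stand-in for the DP's transaction log
        self.data = {}                # stand-in for the local database fragment
        self.pending = None

    def prepare(self, changes):
        # Write-ahead: the log entry is recorded before the fragment is touched.
        if not self.can_commit:
            self.log.append(("NOT PREPARED", changes))
            return False
        self.log.append(("PREPARED", changes))
        self.pending = changes
        return True

    def commit(self):
        self.log.append(("COMMIT", self.pending))
        self.data.update(self.pending)
        self.pending = None

    def abort(self):
        self.log.append(("ABORT", self.pending))
        self.pending = None


def two_phase_commit(participants, changes):
    """Coordinator side: commit everywhere or nowhere."""
    # Phase 1 (preparation): every subordinate must report that it is prepared.
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        # Phase 2 (final COMMIT): all subordinates apply the change.
        for p in participants:
            p.commit()
        return "COMMITTED"
    # Any negative vote undoes the transaction at every site.
    for p in participants:
        p.abort()
    return "ABORTED"
```

If one participant is created with `can_commit=False`, no participant's `data` changes, which is the all-or-nothing guarantee the slide describes.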
  • 30. Performance and Failure Transparency (1 of 2)
      • Performance transparency allows a DDBMS to perform as if it were a centralized database
      • Failure transparency ensures the system will operate in case of network failure
      • Objective of query optimization is to minimize total costs
          • Access time (I/O) cost involved in accessing data from multiple remote sites
          • Communication costs associated with data transmission
          • CPU time cost associated with the processing overhead
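As a rough illustration of the optimization objective, the sketch below treats the total cost of a candidate plan as the sum of the three components listed above and picks the cheapest plan. The `Plan` class, the plan descriptions, and every number are invented for illustration (arbitrary cost units); real optimizers estimate these values from catalog statistics.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    io_cost: float    # access time (I/O) cost
    comm_cost: float  # communication cost of moving data between sites
    cpu_cost: float   # CPU time cost of processing

    @property
    def total(self):
        return self.io_cost + self.comm_cost + self.cpu_cost


# Two hypothetical ways to answer the same query.
plans = [
    Plan("ship whole table to local site", io_cost=4.0, comm_cost=20.0, cpu_cost=2.0),
    Plan("filter at remote site, ship result", io_cost=5.0, comm_cost=3.0, cpu_cost=3.0),
]

# The optimizer's goal in this sketch: the plan with the lowest total cost.
best = min(plans, key=lambda p: p.total)
```

With these made-up figures, the plan that filters remotely wins because its communication cost is much lower, even though its I/O and CPU costs are slightly higher.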
  • 31. Performance and Failure Transparency (2 of 2)
      • Considerations for resolving data requests in a distributed data environment
          • Data distribution and data replication
              - Replica transparency: DDBMS’s ability to hide multiple copies of data from the user
          • Network and node availability
              - Network latency: delay imposed by the amount of time required for a data packet to make a round trip
              - Network partitioning: delay imposed when nodes become suddenly unavailable due to a network failure
  • 32. Distributed Database Design
      • Data fragmentation
          • How to partition database into fragments
      • Data replication
          • Which fragments to replicate
      • Data allocation
          • Where to locate those fragments and replicas
  • 33. Data Fragmentation
      • Breaks a single object into two or more segments
      • Information is stored in distributed data catalog (DDC)
      • Strategies
          • Horizontal fragmentation: division of a relation into subsets (fragments) of tuples (rows)
          • Vertical fragmentation: division of a relation into attribute (column) subsets
          • Mixed fragmentation: combination of horizontal and vertical strategies
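The three strategies can be shown on a tiny in-memory table. The sketch below is an illustration only: the `CUSTOMER` rows, column names, and helper functions are hypothetical, and a list of dictionaries stands in for a relation. Note that each vertical fragment repeats the key column so the fragments can be joined back together.

```python
# Hypothetical relation: one dict per row.
CUSTOMER = [
    {"cus_num": 10, "cus_name": "Acme", "cus_state": "TN", "cus_limit": 3500},
    {"cus_num": 11, "cus_name": "Birch", "cus_state": "FL", "cus_limit": 6000},
    {"cus_num": 12, "cus_name": "Cedar", "cus_state": "TN", "cus_limit": 4000},
]


def horizontal(rows, predicate):
    """Horizontal fragment: a subset of rows, all columns kept."""
    return [row for row in rows if predicate(row)]


def vertical(rows, key, columns):
    """Vertical fragment: a subset of columns, with the key repeated."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Horizontal: only the Tennessee customers.
tn_rows = horizontal(CUSTOMER, lambda r: r["cus_state"] == "TN")

# Vertical: name and state for every customer.
service_view = vertical(CUSTOMER, "cus_num", ["cus_name", "cus_state"])

# Mixed: first split by rows, then by columns.
mixed = vertical(tn_rows, "cus_num", ["cus_limit"])
```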
  • 34. Data Replication (1 of 2)
      • Storage of data copies at multiple sites served by a computer network
      • Mutual consistency rule requires all copies of data fragments be identical
      • Styles of replication
          • Push replication focuses on maintaining data consistency
          • Pull replication focuses on maintaining data availability and allows for temporary data inconsistencies
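The difference between the two styles can be sketched as follows. This is a simplified in-memory illustration, assuming a hypothetical `Replica` class and helper functions; it models push replication as the originating node updating every copy at write time, and pull replication as the originating node queuing a notification that each replica applies later.

```python
class Replica:
    """Hypothetical node holding one copy of a fragment."""

    def __init__(self):
        self.data = {}
        self.outbox = []  # changes announced to peers but not yet applied by them


def push_write(origin, peers, key, value):
    """Push: the originating node sends the change to every replica right away."""
    origin.data[key] = value
    for peer in peers:
        peer.data[key] = value


def pull_write(origin, key, value):
    """Pull: the origin applies the change locally and only queues a notification."""
    origin.data[key] = value
    origin.outbox.append((key, value))


def pull_sync(origin, peer):
    """A replica applies the queued changes when it chooses to."""
    for key, value in origin.outbox:
        peer.data[key] = value
```

Between `pull_write` and `pull_sync` the two copies disagree, which is the temporary inconsistency that pull replication accepts in exchange for availability.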
  • 35. Data Replication (2 of 2) • Data replication scenarios • Fully replicated database: stores multiple copies of each database fragment at multiple sites • Partially replicated database: stores multiple copies of some database fragments at multiple sites • Unreplicated database: stores each database fragment at a single site
  • 36. Data Allocation • Data allocation strategies • Centralized data allocation: entire database stored at one site • Partitioned data allocation: database is divided into two or more disjoint fragments that are stored at two or more sites • Replicated data allocation: copies of one or more database fragments are stored at several sites
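As a rough illustration, the replication scenarios and allocation strategies can both be read off an allocation map that lists the sites holding each fragment. The map layout and the function names below are assumptions of this sketch, not part of any DDBMS catalog format.

```python
def replication_scenario(allocation):
    """Classify a {fragment: set_of_sites} map by how many copies exist."""
    if not allocation:
        raise ValueError("allocation map must contain at least one fragment")
    copies = [len(sites) for sites in allocation.values()]
    if all(c == 1 for c in copies):
        return "unreplicated"
    if all(c > 1 for c in copies):
        return "fully replicated"
    return "partially replicated"


def allocation_strategy(allocation):
    """Classify the same map as centralized, partitioned, or replicated."""
    if not allocation:
        raise ValueError("allocation map must contain at least one fragment")
    all_sites = set().union(*allocation.values())
    if len(all_sites) == 1:
        return "centralized"
    if all(len(sites) == 1 for sites in allocation.values()):
        return "partitioned"
    return "replicated"


# Hypothetical fragments E1 and E2 placed at hypothetical sites A, B, and C.
example = {"E1": {"A", "B"}, "E2": {"B"}}
scenario = replication_scenario(example)
strategy = allocation_strategy(example)
```

For the `example` map, fragment E1 has two copies and E2 has one, so the sketch reports a partially replicated database under replicated data allocation.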
  • 37. The CAP Theorem (1 of 2) • CAP stands for: • Consistency • Availability • Partition tolerance • The trade-off between consistency and availability has generated a new type of distributed data system in which data is basically available, soft state, eventually consistent (BASE) • Under BASE, data changes are not immediate but propagate slowly through the system until all replicas are consistent
  • 38. The CAP Theorem (2 of 2) • Table 12.8 Distributed Database Spectrum
    DBMS Type | Consistency | Availability | Partition Tolerance | Transaction Model | Trade-Off
    Centralized DBMS | High | High | N/A | ACID | No distributed data processing
    Relational DBMS | High | Relaxed | High | ACID (2PC) | Sacrifices availability to ensure consistency and isolation
    NoSQL DDBMS | Relaxed | High | High | BASE | Sacrifices consistency to ensure availability
    NewSQL DDBMS | High | High | Relaxed | ACID | Sacrifices partition tolerance to ensure transaction consistency and availability
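The consistency versus availability trade-off can be made concrete with a toy two-node model. Everything here is hypothetical: the `Node` class, the "CP" and "AP" mode labels, and the version-based `heal` step (a simple "highest version wins" rule) are assumptions chosen for illustration and do not describe how any specific product behaves.

```python
class Node:
    """One of two replicas; mode is "CP" (favor consistency) or "AP" (favor availability)."""

    def __init__(self, mode):
        self.mode = mode
        self.value = None
        self.version = 0
        self.peer = None
        self.partitioned = False

    def write(self, value):
        """Return True if the write is accepted, False if it is refused."""
        if self.partitioned and self.mode == "CP":
            return False  # refuse the write: availability is sacrificed
        self.value = value
        self.version += 1
        if not self.partitioned:
            # Peer reachable: keep both copies identical.
            self.peer.value = value
            self.peer.version = self.version
        return True  # during a partition in AP mode the copies now differ


def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, broken):
    a.partitioned = b.partitioned = broken


def heal(a, b):
    """Reconnect the nodes and copy the higher-version value to the other node."""
    set_partition(a, b, False)
    newer, older = (a, b) if a.version >= b.version else (b, a)
    older.value, older.version = newer.value, newer.version
```

In "CP" mode a write attempted during a partition is refused, so both copies stay identical. In "AP" mode the write succeeds locally, the copies diverge, and `heal` later brings them back together, which mirrors the eventually consistent behavior of BASE.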
  • 39. C. J. Date’s 12 Commandments for Distributed Databases • Table 12.9 C. J. Date’s 12 Commandments for Distributed Databases
    Rule Number | Rule Name | Rule Explanation
    1 | Local-site independence | Each local site can act as an independent, autonomous, centralized DBMS. Each site is responsible for security, concurrency control, backup, and recovery.
    2 | Central-site independence | No site in the network relies on a central site or any other site. All sites have the same capabilities.
    3 | Failure independence | The system is not affected by node failures. The system is in continuous operation even in the case of a node failure or an expansion of the network.
    4 | Location transparency | The user does not need to know the location of data to retrieve that data.
    5 | Fragmentation transparency | Data fragmentation is transparent to the user, who sees only one logical database. The user does not need to know the name of the database fragments to retrieve them.
    6 | Replication transparency | The user sees only one logical database. The DDBMS transparently selects the database fragment to access. To the user, the DDBMS manages all fragments transparently.
    7 | Distributed query processing | A distributed query may be executed at several different DP sites. Query optimization is performed transparently by the DDBMS.
    8 | Distributed transaction processing | A transaction may update data at several different sites, and the transaction is executed transparently.
    9 | Hardware independence | The system must run on any hardware platform.
    10 | Operating system independence | The system must run on any operating system platform.
    11 | Network independence | The system must run on any network platform.
    12 | Database independence | The system must support any vendor’s database product.
  • 40. Summary (1 of 2) • A distributed database stores logically related data in two or more physically independent sites connected via a computer network • Distributed processing is the division of logical database processing among two or more network nodes • The main components of a DDBMS are the transaction processor (TP) and the data processor (DP) • Current database systems can be classified by the extent to which they support processing and data distribution • A homogeneous distributed database system integrates only one particular type of DBMS over a computer network • DDBMS characteristics are best described as a set of transparencies: distribution, transaction, performance, failure, and heterogeneity
  • 41. Summary (2 of 2) • A transaction is formed by one or more database requests. An undistributed transaction updates or requests data from a single site • Distributed concurrency control is required in a network of distributed databases • A distributed DBMS evaluates every data request to find the optimum access path in a distributed database • The design of a distributed database must consider the fragmentation and replication of data • A database can be replicated over several different sites on a computer network • The CAP theorem states that a highly distributed data system has three desirable properties (consistency, availability, and partition tolerance) but can provide only two of them at a time