3. Objectives
1. Explain distributed databases
2. Differentiate parallel and distributed databases
3. Explain the difference between homogeneous
and heterogeneous distributed databases
4. Introduction
Distributed databases revolutionize data management by dispersing information across
interconnected sites, leveraging a computer network. This architecture
enhances scalability, fault tolerance, and overall performance. Two key
types are prevalent: Homogeneous Distributed Databases, where identical
databases collaborate seamlessly, ensuring consistency; and
Heterogeneous Distributed Databases, featuring diverse schemas and
software for flexible data management. Homogeneous systems prioritize
uniformity, while heterogeneous systems excel in accommodating various
data formats. Both approaches address modern data challenges, offering
tailored solutions for collaborative and efficient data processing.
5. What is a Distributed Database?
1. A Distributed Database is a collection of multiple interconnected
databases, which are spread physically across various locations that
communicate via a computer network.
2. Distributed databases use multiple nodes and scale horizontally to
form a distributed system. More nodes in the system provide more
computing power, offer greater availability, and remove the single
point of failure.
3. Different parts of the distributed database are stored in several
physical locations, and the processing requirements are distributed
among processors on multiple database nodes.
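One common way to spread parts of a database over several nodes is hash partitioning. The sketch below is only an illustration under assumptions not stated in the slides: the function names, the SHA-256 based scheme, and the "customer-N" keys are all hypothetical.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(keys, num_nodes):
    """Group keys by the node that would store them."""
    nodes = {i: [] for i in range(num_nodes)}
    for key in keys:
        nodes[node_for_key(key, num_nodes)].append(key)
    return nodes


# Illustrative data only: twelve made-up customer keys over three nodes.
sample_keys = [f"customer-{i}" for i in range(12)]
for node, stored in partition(sample_keys, 3).items():
    print(node, stored)
```

Adding a node changes `num_nodes`, so most keys would move with this simple modulo scheme; real systems often use techniques such as consistent hashing to limit that movement.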
6. Some features of distributed databases are:
1. Location Independency: Data is stored at multiple sites and managed
independently by a DDBMS.
2. Distributed Query Processing: Queries handled across sites, with high-
level queries transformed for efficient execution.
3. Distributed Transaction Management: Ensures consistency via commit
protocols, concurrency control, and recovery methods.
7. Some features of distributed databases are:
4. Seamless Integration: Interconnected databases create a unified, single
logical database.
5. Network Linking: All databases connected through a network for
seamless communication.
6. Transaction Processing: Executes a set of database operations as an
atomic process in distributed databases.
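The commit protocols mentioned under Distributed Transaction Management can be illustrated with two-phase commit, one widely used protocol. This is a minimal sketch, not any particular DDBMS's implementation: the `Participant` class and its methods are hypothetical, and it ignores timeouts, logging, and coordinator failure.

```python
class Participant:
    """A hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every site committed, False if all were aborted."""
    # Phase 1 (voting): ask every site to prepare.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (completion): unanimous yes, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Any no vote aborts the transaction at every site.
    for p in participants:
        p.abort()
    return False


sites = [Participant("site-a"), Participant("site-b", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])
```

Either every site commits or every site aborts, which is what makes the set of operations behave as one atomic unit across sites.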
8. Difference between parallel and distributed database
• Parallel: The processes are tightly coupled and constitute a single
database system, i.e., the parallel database is a centralized database
and data resides in a single location.
  Distributed: The sites are loosely coupled and share no physical
components, i.e., the distributed database is geographically separated,
and data is distributed across several locations.
• Parallel: Query processing and transaction management are complicated.
  Distributed: Query processing and transaction management are more
complicated.
• Parallel: The distinction between local and global transactions is not
applicable.
  Distributed: Both local and global transactions can occur in distributed
database systems.
9. Difference between parallel and distributed database
• Parallel: The data is partitioned among various disks so that it can be
retrieved faster.
  Distributed: Each site preserves a local database system for faster
processing due to the slow interconnection between sites.
• Parallel: There are 3 types of architecture: shared memory, shared disk,
and shared-nothing.
  Distributed: Distributed databases are generally a kind of shared-nothing
architecture.
• Parallel: Query optimization is more complicated.
  Distributed: Query optimization techniques may differ from site to site
and are easy to maintain.
10. Difference between parallel and distributed database
• Parallel: Data is generally not copied.
  Distributed: Data is replicated at any number of sites to improve the
performance of systems.
• Parallel: Parallel databases are generally homogeneous in nature.
  Distributed: Distributed databases may be homogeneous or heterogeneous
in nature.
• Parallel: Skew is the major issue with the increasing degree of
parallelism in parallel databases.
  Distributed: Blocking due to site failure and transparency are major
issues in distributed databases.
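Skew, named above as the major issue in parallel databases, can be made concrete by comparing the largest partition with the average partition. The sketch below assumes a simple metric (largest size divided by mean size); the function names and sample keys are hypothetical and not taken from the slides.

```python
from collections import Counter


def partition_sizes(keys, num_partitions, partition_fn):
    """Count how many keys land in each partition."""
    counts = Counter(partition_fn(k) % num_partitions for k in keys)
    return [counts.get(i, 0) for i in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition size divided by the mean; 1.0 means balanced."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Illustrative data: one "hot" key (0) appears 70 times out of 100.
skewed_keys = [0] * 70 + list(range(1, 31))
sizes = partition_sizes(skewed_keys, 4, lambda k: k)
print(sizes, skew_ratio(sizes))
```

With a ratio well above 1.0, one processor does most of the work, so adding more partitions yields little speed-up.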
11. Two Types of Distributed Database
1. Homogeneous Database
2. Heterogeneous Database
1. Homogeneous Distributed Database
A homogeneous distributed database is a network of identical
databases stored on multiple sites. The sites have the same operating system,
DDBMS, and data structure, making them easily manageable. Its properties are:
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user
requests.
• The database is accessed through a single interface as if it is a single database.
13. Characteristics of homogeneous database
• Uniformity - All nodes utilize the same DBMS software and possess identical
data schemas.
• Data Consistency - Changes made to the database on one node are
automatically propagated to other nodes, ensuring data consistency.
• Simplicity - Homogeneous databases are relatively easier to manage as the
same DBMS software is employed throughout the system.
• Two types of homogeneous database
1. Autonomous - each database is independent and functions on its own. The
databases are integrated by a controlling application and use message passing
to share data updates.
2. Non-autonomous - data is distributed across the homogeneous nodes and a
central or master DBMS coordinates data across the sites.
14. Steps to set up a homogeneous database
• Select a DBMS: Choose a DBMS that aligns with the distributed
system’s requirements, such as MySQL, PostgreSQL, or Oracle.
• Install the DBMS: Install the chosen DBMS on each node within the
distributed system.
• Design the Schema: Create a unified database schema to be shared
across all nodes.
• Establish Communication: Configure network connectivity between
the nodes to facilitate data replication and synchronization.
• Implement Replication: Set up replication mechanisms provided by
the DBMS to ensure changes made on one node are propagated to
others.
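The steps above can be sketched in miniature. This example is an assumption-laden stand-in: it uses in-memory SQLite databases in place of the MySQL, PostgreSQL, or Oracle nodes named above, and a naive loop in place of the DBMS's own replication mechanism. The `orders` table and function names are hypothetical.

```python
import sqlite3

# One unified schema shared by every node (hypothetical table).
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)"


def make_node():
    """Create one node: an in-memory SQLite database with the shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def replicated_insert(nodes, order_id, item, qty):
    """Apply the same insert on every node (naive synchronous replication)."""
    for conn in nodes:
        conn.execute(
            "INSERT INTO orders (id, item, qty) VALUES (?, ?, ?)",
            (order_id, item, qty),
        )
        conn.commit()


nodes = [make_node() for _ in range(3)]
replicated_insert(nodes, 1, "keyboard", 2)
for index, conn in enumerate(nodes):
    print(index, conn.execute("SELECT id, item, qty FROM orders").fetchall())
```

Because every node runs the same DBMS and schema, the identical statement can be applied everywhere without translation. A production setup would rely on the replication features built into the chosen DBMS rather than a loop like this.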
15. Use Case Example
• Consider a distributed e-commerce system with multiple nodes
handling customer orders, inventory management, and shipping.
Employing a homogeneous database approach, all nodes share
the same DBMS (e.g., MySQL) and adhere to a consistent schema.
When an order is placed on one node, the system automatically
synchronizes the order details and inventory updates across all
nodes, enabling real-time visibility and consistency.
16. Two Types of Distributed Database
2. Heterogeneous Distributed Database
A heterogeneous distributed database uses different schemas, operating
systems, DDBMS, and data models. In the case of a heterogeneous
distributed database, a particular site can be completely unaware of other sites,
causing limited cooperation in processing user requests. Because of this
limitation, translations are required to establish communication between sites.
Its properties are:
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs, such as relational, network,
hierarchical, or object-oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and there is limited cooperation in user requests.
18. Characteristics of heterogeneous database
• Flexibility: More flexible, accommodating various data formats and storage
systems.
• Nodes can employ different DBMS software, such as MySQL, MongoDB,
or Cassandra, based on their specific needs.
• Schema Mapping: Heterogeneous databases necessitate mapping between
different schemas to ensure interoperability between nodes.
• Data Transformation: Data might need to be transformed or translated
between different formats or encodings to maintain consistency.
19. Types of Heterogeneous database
1. Federated – The heterogeneous database systems are independent in nature
and integrated together so that they function as a single database system.
2. Un-federated – The database systems employ a central coordinating module
through which the databases are accessed.
Steps to set up a heterogeneous database
• Identify Diverse Requirements: Understand the specific needs of each node in the
distributed system and select the appropriate DBMS software accordingly.
• Define Schema Mapping: Analyze the differences in database schemas between
nodes and establish mapping rules to convert data between schemas.
• Implement Data Transformation: Develop mechanisms or scripts to transform data
from one format to another, ensuring seamless integration.
• Establish Communication: Configure network connectivity and establish
communication channels between heterogeneous nodes.
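The schema mapping and data transformation steps above can be sketched as a small translation function. Everything named here is hypothetical: the field names, the mapping rules, and the cents-to-currency conversion are illustrative assumptions, with plain dictionaries standing in for a relational row and a document-store record.

```python
# Mapping rules: relational column -> dotted path in the target document.
FIELD_MAP = {
    "order_id": "_id",
    "cust_name": "customer.name",
    "total_cents": "total",
}


def transform_value(field, value):
    """Convert a value to the target format (only one rule in this sketch)."""
    if field == "total_cents":
        return value / 100  # integer cents -> currency units
    return value


def row_to_document(row):
    """Translate one relational row (a dict) into a nested document."""
    doc = {}
    for source_field, target_path in FIELD_MAP.items():
        if source_field not in row:
            continue
        *parents, leaf = target_path.split(".")
        target = doc
        for part in parents:
            # Create intermediate sub-documents as needed.
            target = target.setdefault(part, {})
        target[leaf] = transform_value(source_field, row[source_field])
    return doc


print(row_to_document({"order_id": 42, "cust_name": "Ana", "total_cents": 1999}))
```

Keeping the mapping rules in data (`FIELD_MAP`) rather than in code makes it easier to add a new node with a different schema: only the rules change, not the translation logic.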
20. Use Case Example
• Imagine a distributed system where one node utilizes a
traditional relational database (e.g., MySQL) for order
management, while another node relies on a NoSQL database
(e.g., MongoDB) for user analytics. Employing a heterogeneous
database approach enables the two nodes to leverage their
preferred DBMS technologies while facilitating data exchange
through schema mapping and data transformation.
21. Summary
• Distributed databases provide several advantages, including improved
performance through parallel processing and horizontal scalability, enabling
organizations to handle larger workloads and datasets. They offer high
availability and fault tolerance by distributing data redundantly across multiple
nodes, ensuring continuous operation even in the face of node failures.
Geographical distribution allows for global data access, reducing latency for
users worldwide. Local autonomy empowers individual nodes to manage
resources independently, adapting to local requirements. Cost efficiency arises
from optimized resource utilization and the ease of adding new nodes
compared to upgrading centralized systems.
22. Summary
• Homogeneous environments, characterized by uniformity in components, offer
advantages such as consistency, interoperability, simplified management, and
easier troubleshooting due to standardized configurations. On the other hand,
heterogeneous environments provide flexibility, enabling organizations to
choose best-of-breed solutions, adapt to diverse needs, embrace technological
innovation, and avoid vendor lock-in. Heterogeneous environments support
scalability, making them well-suited for dynamic business scenarios. The choice
between homogeneous and heterogeneous environments depends on
organizational goals, requirements, and the desire for standardized operations
versus flexibility and innovation.
23. Assessment #1 Questions and Answers:
1. What is a distributed database, and how does it differ from a centralized database?
2. Explain the concept of "location independency" in the context of distributed databases.
3. What are some key features of distributed databases as mentioned in the text?
4. How does distributed query processing contribute to the efficiency of a distributed
database system?
5. Discuss the role of distributed transaction management in maintaining consistency in a
distributed database.
6. What is meant by "seamless integration" in the context of distributed databases, and
why is it important?
7. Differentiate between parallel and distributed databases, highlighting their respective
characteristics.
8. Why might data be replicated in a distributed database, and what benefits does
replication offer?
9. Explain the significance of network linking in a collection of interconnected databases.
10. What challenges are associated with query optimization in distributed databases?
24. Assessment #2 Enumeration:
1. List three key properties of a homogeneous distributed database.
2. Name two types of homogeneous distributed databases and briefly describe their
characteristics.
3. Outline the steps involved in setting up a homogeneous distributed database.
4. Provide two types of homogeneous distributed databases based on their level of
autonomy.
5. Name three characteristics of heterogeneous distributed databases.
6. List the steps to set up a heterogeneous distributed database.
25. Assessment #3 True/False:
7. True or False: In a homogeneous distributed database, each site is unaware of other sites
and operates independently.
8. True or False: Data consistency is automatically maintained in a homogeneous distributed
database without the need for synchronization.
9. True or False: Heterogeneous databases are less flexible than homogeneous databases in
accommodating various data formats.
10. True or False: Schema mapping is not required in a heterogeneous distributed database as
nodes can operate independently with different schemas.
11. True or False: Query processing is simpler in heterogeneous distributed databases due to
the use of the same DBMS across all nodes.
12. True or False: Federated heterogeneous databases function as independent systems and do
not operate as a single database.
13. True or False: Homogeneous distributed databases can use different DBMS software across
nodes, such as MySQL, MongoDB, or Cassandra.
14. True or False: In a heterogeneous distributed database, a central coordinating module is
necessary for database access in unfederated systems.
15. True or False: Flexibility is a characteristic of homogeneous databases, allowing nodes to
use different DBMS software based on specific needs.