This document provides an introduction to distributed databases. It defines a distributed database as a collection of logically related databases distributed over a computer network. It describes distributed computing and how distributed databases partition data across multiple computers. The document outlines different types of distributed database systems including homogeneous and heterogeneous. It also discusses distributed data storage techniques like replication, fragmentation, and allocation. Finally, it lists several advantages and objectives of distributed databases as well as some disadvantages.
Replication improves the availability of data by copying it to multiple sites.
Either a relation or a fragment can be replicated at one or more sites.
Fully redundant databases are those in which every site contains a copy of the entire database.
Depending on the availability and redundancy factor there are three types of replications:
Full replication.
No replication.
Partial replication.
A distributed database consists of multiple databases that are connected to each other and spread across different physical locations. The data stored at each location can be managed independently of the other locations, and the databases communicate over a computer network.
A distributed database is a database that is not limited to one computer system.
It resembles a database made up of two or more files located on different computers or sites, either on the same network or on entirely different networks.
Instead of storing all of the data in one database, the data is divided and stored at different locations or sites that do not share any physical component.
As in any database, the data can be accessed, managed, modified, updated, controlled, and organized.
In computing, a distributed file system (DFS) or network file system is any file system that allows access to files from multiple hosts sharing via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.
Distributed Database Introduction
TYPES OF DD:
1. HOMOGENEOUS DISTRIBUTED DATABASE
2. HETEROGENEOUS DISTRIBUTED DATABASE
Distributed DBMS Architectures
Architectural Models
Some of the common architectural models are −
● Client - Server Architecture for DDBMS
● Peer - to - Peer Architecture for DDBMS
● Multi - DBMS Architecture
Design issues of distributed system:
1. Complex nature:
A distributed database is a network of many computers at different locations that aims to provide high performance, availability, and reliability. A distributed DBMS is therefore more complex than a centralized DBMS and requires more complex software. It must also control data redundancy across sites, which adds further complexity.
2. Overall Cost:
Maintenance, procurement, hardware, network/communication, and labor costs all add up and make a distributed DBMS costlier than a centralized DBMS.
3. Security issues:
In a distributed database, along with controlling data redundancy, the security of both the data and the network is a prime concern. A network can be attacked for data theft and misuse.
4. Integrity Control:
In a large distributed database system, maintaining data consistency is important. All changes made to data at one site must be reflected at all the other sites that hold that data. The communication and processing cost of enforcing data integrity is high in a distributed DBMS.
5. Lacking Standards:
Although a distributed DBMS provides effective communication and data sharing, there are still no standard rules and protocols for converting a centralized DBMS into a large distributed DBMS. This lack of standards limits the potential of distributed DBMSs.
6. Lack of Professional Support:
Because adequate communication standards are lacking, it may not be possible to link equipment produced by different vendors into a smoothly functioning network. Thus, several good resources may not be available to the users of the network.
7. Data design complex:
Fragmentation
2. Distributed Database
(DDB):
The term combines two concepts: Distributed Computing
and Database.
Distributed Computing means that multiple computers,
interconnected via a network, work as a single unit
to perform a given task. That is, a distributed
computing system partitions a big, unmanageable
problem into smaller pieces and solves it efficiently
in a coordinated manner. The computers are loosely
coupled and do not share memory.
A database is an organised collection of data.
3. Overview
Definition
Difference between parallel and distributed DB
Types of distributed DB
Distributed Data Storage
Replication
Fragmentation
Allocation
Advantages ( Objectives ) of DDB
Disadvantages
4. Definition
A Distributed Database (DDB) is defined as a
collection of multiple logically interrelated databases
distributed over a computer network.
A Distributed Database Management System (DDBMS)
is defined as software that manages a distributed
database while making the distribution transparent to
the user. By contrast, a collection of files stored at
different nodes of a network and related to each other
only via hyperlinks, which has become a common
organisation on the internet, is not by itself a
distributed database, because it lacks common database
management functions such as uniform query and
transaction processing.
5. Difference between parallel and distributed DB
Parallel database | Distributed database
A system in which multiple processors or machines execute queries in parallel. | A collection of multiple logically interrelated databases distributed over a network.
The nodes are located at the same geographical location. | The nodes are usually located at geographically different locations.
Based on a shared-memory or shared-disk architecture, i.e. the machines share primary memory, secondary (disk) memory, or both. | Based on a shared-nothing architecture, i.e. every machine has its own primary and secondary (disk) memory, and no common memory exists.
Execution speed is quite fast. | Execution speed is comparatively slower.
Difficult to expand. | Easier to expand.
6. Types of Distributed Database Systems:
Distributed database systems are divided into:
Homogeneous
Heterogeneous
7. Homogeneous distributed
database system
All sites have identical database management
system software.
All sites are aware of one another, and agree to
cooperate in processing users’ requests.
The operating system, the data structures and the
database application used at each location must be
same or compatible.
This system appears to the user as a single system,
and it is much easier to design and manage than a
heterogeneous system.
8. Heterogeneous distributed
database system
Different sites may use different database
management system software.
The sites may not be aware of one another.
They may provide only limited facilities for
cooperation in transaction processing.
The operating system, the data structures and the
database application used at each location may be
different or incompatible.
9. Distributed Data Storage
Consider a relation r that is to be stored in the database. There
are several approaches to storing this relation in the
distributed database.
Replication: The system maintains several identical
replicas (copies) of the relation, and stores each replica at a
different site.
Fragmentation: The system partitions the relation into
several fragments and stores each fragment at a different
site.
Allocation: Each fragment, or each replica (copy) of a
fragment, must be assigned to a particular site in the
distributed system. This is called data allocation.
10. Replication
If relation r is replicated, a copy of relation r is stored in
two or more sites. In the most extreme case, we have full
replication, in which a copy is stored in every site in the
system. Replication has a number of advantages and
disadvantages:
Availability : If one of the sites containing relation r fails,
then the relation r can be found in another site.
Increased Parallelism : The more replicas of r there are,
the greater the chance that the needed data will be found in
the site where the transaction is executing.
11. Increased Overhead on Update: The system must ensure
that all replicas of a relation r are consistent; otherwise
erroneous computation may result. Thus, whenever r is
updated, the update must be propagated to all sites
containing replicas.
Replication enhances the performance of read operations
and increases the availability of data to read-only
transactions.
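The trade-off described above (cheap local reads, but every update touching every copy) can be illustrated with a minimal in-memory sketch. The class name `ReplicatedRelation`, the site names and the dictionary-based storage are assumptions made for illustration only; a real DDBMS would propagate updates over the network and under a concurrency-control protocol.

```python
class ReplicatedRelation:
    """Keeps one identical copy of a relation at every listed site (illustrative only)."""

    def __init__(self, sites):
        # Each site holds its own copy, modelled as a dict: key -> row.
        self.replicas = {site: {} for site in sites}

    def read(self, local_site, key):
        # A read is served by the local copy; no other site is contacted.
        return self.replicas[local_site].get(key)

    def write(self, key, row):
        # An update must reach every replica to keep the copies consistent.
        for copy in self.replicas.values():
            copy[key] = dict(row)
        return len(self.replicas)  # number of copies that had to be updated


# Hypothetical site names; the account row is taken from the slides' example data.
accounts = ReplicatedRelation(["site_a", "site_b", "site_c"])
copies_updated = accounts.write("A-305", {"branch": "Hillside", "balance": 500})
row_at_b = accounts.read("site_b", "A-305")
```

Here one write costs as many copy updates as there are replicas, while a read at any site needs only that site's own copy.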
12. Fragmentation
If relation r is fragmented, r is divided into a
number of fragments r1, r2, . . . , rn. These
fragments contain sufficient information to allow
reconstruction of the original relation r using
Union or Join operation on various fragments.
There are two different schemes for fragmenting a
relation:
Horizontal fragmentation
Vertical fragmentation
13. Horizontal Fragmentation
This splits the relation by assigning each tuple of r to one or more fragments.

Original relation:
Branch-name | Account-number | Balance
Hillside | A-305 | 500
Hillside | A-226 | 336
Valleyview | A-177 | 205
Valleyview | A-402 | 10000
Hillside | A-155 | 62

Fragment for Valleyview:
Branch-name | Account-number | Balance
Valleyview | A-177 | 205
Valleyview | A-402 | 10000

Fragment for Hillside:
Branch-name | Account-number | Balance
Hillside | A-305 | 500
Hillside | A-226 | 336
Hillside | A-155 | 62
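As a rough sketch of the horizontal example above, the relation can be split by a selection predicate on the branch name and rebuilt with a union. The function name `horizontal_fragment` and the tuple layout are illustrative assumptions; the data is the slide's own.

```python
# The account relation from the slide: (branch_name, account_number, balance).
account = [
    ("Hillside", "A-305", 500),
    ("Hillside", "A-226", 336),
    ("Valleyview", "A-177", 205),
    ("Valleyview", "A-402", 10000),
    ("Hillside", "A-155", 62),
]


def horizontal_fragment(relation, predicate):
    """Return the tuples of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One fragment per branch, each of which could be stored at a different site.
hillside = horizontal_fragment(account, lambda row: row[0] == "Hillside")
valleyview = horizontal_fragment(account, lambda row: row[0] == "Valleyview")

# The original relation is recovered as the union of its fragments.
reconstructed = set(hillside) | set(valleyview)
```

Because the two predicates cover every tuple, the union of the fragments contains exactly the tuples of the original relation.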
14. Vertical fragmentation
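This splits the relation into subrelations, where each subrelation is defined by a subset of the
columns of the original relation. Each fragment keeps the tuple identifier (Tup_id) so that the original relation can be rebuilt with a join.

Original relation:
Branch-name | Account-no | Cu_name | Balance | Tup_id
Hillside | A-305 | Lowman | 500 | 1
Hillside | A-226 | Camp | 336 | 2
Valleyview | A-177 | Camp | 205 | 3
Valleyview | A-402 | Khan | 10000 | 4
Hillside | A-155 | Khan | 62 | 5
Valleyview | A-408 | Khan | 1123 | 6

Fragment 1:
Branch-name | Cu_name | Tup_id
Hillside | Lowman | 1
Hillside | Camp | 2
Valleyview | Camp | 3
Valleyview | Khan | 4
Hillside | Khan | 5
Valleyview | Khan | 6

Fragment 2:
Account-no | Balance | Tup_id
A-305 | 500 | 1
A-226 | 336 | 2
A-177 | 205 | 3
A-402 | 10000 | 4
A-155 | 62 | 5
A-408 | 1123 | 6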
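The vertical example can be sketched in the same way: each fragment is a projection that keeps the tuple identifier, and the original relation is rebuilt by joining the fragments on that identifier. The helper names `vertical_fragment` and `join_on_tup_id` and the dictionary keys are illustrative assumptions; the rows come from the original relation on the slide.

```python
# The relation from the slide, one dict per tuple.
account = [
    {"branch": "Hillside", "account": "A-305", "customer": "Lowman", "balance": 500, "tup_id": 1},
    {"branch": "Hillside", "account": "A-226", "customer": "Camp", "balance": 336, "tup_id": 2},
    {"branch": "Valleyview", "account": "A-177", "customer": "Camp", "balance": 205, "tup_id": 3},
    {"branch": "Valleyview", "account": "A-402", "customer": "Khan", "balance": 10000, "tup_id": 4},
    {"branch": "Hillside", "account": "A-155", "customer": "Khan", "balance": 62, "tup_id": 5},
    {"branch": "Valleyview", "account": "A-408", "customer": "Khan", "balance": 1123, "tup_id": 6},
]


def vertical_fragment(relation, columns):
    """Project the relation onto the given columns, always keeping tup_id."""
    keep = list(columns) + ["tup_id"]
    return [{col: row[col] for col in keep} for row in relation]


def join_on_tup_id(left, right):
    """Rebuild full tuples by matching rows that share the same tup_id."""
    right_by_id = {row["tup_id"]: row for row in right}
    return [{**row, **right_by_id[row["tup_id"]]} for row in left]


customer_part = vertical_fragment(account, ["branch", "customer"])
balance_part = vertical_fragment(account, ["account", "balance"])
reconstructed = join_on_tup_id(customer_part, balance_part)
```

Without the shared `tup_id` column there would be no reliable way to pair a row of one fragment with its counterpart in the other.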
15. Allocation
The choice of sites and the degree of replication depend on the
performance and availability goals of the system.
It also depends on the types and frequencies of transactions
submitted at each site
For example:
If high availability is required, transactions can be
submitted at any site, and most transactions are retrieval
only, a fully replicated database is a good choice.
If certain transactions that access particular parts of the
database are mostly submitted at a particular site, the
corresponding set of fragments can be allocated at that site only.
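One very simple way to act on the second rule above is to place each fragment at the site that accesses it most often. The sketch below is an assumed heuristic, not a method given in the slides; the function name `allocate`, the site and fragment names, and the access counts are all hypothetical.

```python
def allocate(access_counts):
    """Map each fragment to the site that submits the most accesses to it."""
    return {
        fragment: max(by_site, key=by_site.get)
        for fragment, by_site in access_counts.items()
    }


# Hypothetical access frequencies: fragment -> {site: accesses per day}.
access_counts = {
    "hillside_accounts": {"site_hillside": 120, "site_valleyview": 8},
    "valleyview_accounts": {"site_hillside": 5, "site_valleyview": 90},
}

placement = allocate(access_counts)
```

A fuller allocation decision would also weigh update frequency, storage cost and the availability goals mentioned above, for example by replicating a fragment that several sites read heavily.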
16. Advantages ( Objectives of
Distributed Databases )
1. Management of distributed data with different levels of
transparency: A DDBMS should be distribution transparent
in the sense of hiding the details of where each file
(table, relation) is physically stored within the system.
The types of transparency are:
Distribution or Network Transparency: Freedom for the
user from the operational details of the network. It is
further divided into Location Transparency (a command is
independent of the location of the data and of the system
where it was issued) and Naming Transparency (a named
object can be accessed without additional information).
17. Replication Transparency: It makes the user unaware of
the existence of copies.
Fragmentation Transparency: It makes the user unaware
of the existence of horizontal or vertical fragments.
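The three kinds of transparency can be pictured as a catalogue lookup that the DDBMS performs on the user's behalf: the user names only the relation, and the system works out which fragments exist, where their copies live, and which copy to use. The catalogue layout, the site names and the `locate` function below are assumptions made for illustration.

```python
# Hypothetical catalogue: relation -> fragment -> sites holding a copy.
CATALOG = {
    "account": {
        "hillside_fragment": ["site_1", "site_2"],  # replicated at two sites
        "valleyview_fragment": ["site_3"],
    }
}


def locate(relation_name):
    """Return (fragment, site) pairs, choosing one copy of each fragment."""
    fragments = CATALOG[relation_name]
    # Picking the first listed copy is an arbitrary choice for this sketch.
    return [(fragment, sites[0]) for fragment, sites in fragments.items()]


# The user refers only to "account"; fragments, copies and sites stay hidden.
plan = locate("account")
```

The caller never mentions a site, a fragment or a replica, which is the sense in which location, fragmentation and replication are hidden from the user.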
18. 2. Increased reliability and
availability
These are two of the most commonly cited
potential advantages of distributed databases.
Reliability is broadly defined as the
probability that the system is running at
a certain time point
Availability is the probability that the
system is continuously available during
the time interval.
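A small worked calculation shows why replication helps here. Assuming, purely for illustration, that sites fail independently and each site is up with probability p, the chance that at least one of n replicas is reachable at a given moment is 1 - (1 - p)^n. The function name and the value p = 0.9 are hypothetical.

```python
def prob_at_least_one_up(p_site_up, n_replicas):
    """Probability that at least one replica is up, assuming independent site failures."""
    return 1 - (1 - p_site_up) ** n_replicas


# Hypothetical per-site probability of 0.9.
single_copy = prob_at_least_one_up(0.9, 1)   # one copy: about 0.9
three_copies = prob_at_least_one_up(0.9, 3)  # three copies: about 0.999
```

The independence assumption is optimistic in practice, since a network partition or shared power failure can take several sites down together.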
19. 3. Improved Performance
A distributed DBMS fragments the
database, keeping the data closer to
where it is needed most.
Data localisation reduces the contention
for CPU and input/output services and
simultaneously reduces the access delays
involved in wide area networks.
20. 4. Easier Expansion
Expanding the system, whether by adding more data,
increasing the database size, or adding more
processors, is much easier.
5. Keeping Track of Data
The ability to keep track of data distribution,
fragmentation and replication by expanding the
DDBMS catalogue.
21. 6. Distributed Query Processing
The ability to access remote sites and transmit queries
and data among the various sites via a
communication network.
7. Replicated Data Management
The ability to decide which copy of a replicated data
item to access and to maintain the consistency of the
copies of a replicated data item.
22. 8. Distributed Transaction
Management
The ability to devise execution strategies for
queries and transactions that access data
from more than one site, to synchronise
the access to distributed data, and to maintain
the integrity of the overall database.
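The slides do not name a specific protocol for this. One widely used way to keep a multi-site transaction atomic is two-phase commit, sketched below in a heavily simplified, single-process form. The `Participant` class, the site names and the absence of logging, timeouts and failure recovery are all simplifying assumptions.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every site voted yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = two_phase_commit([Participant("site_a"), Participant("site_b")])
failed_sites = [Participant("site_a"), Participant("site_b", can_commit=False)]
failed = two_phase_commit(failed_sites)
```

A single "no" vote aborts the transaction at every site, which is what keeps the overall database consistent when only some sites are able to apply their part of the work.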
23. 9. Distributed data base recovery
The ability to recover from individual
site crashes and from new types of
failures such as failure of the
communication link.
24. 10. Security
Distributed transactions must be executed with
proper management of the security of the data
and the access privileges of users.
11. Distributed Directory (Catalogue) Management
The directory contains information (metadata)
about the data in the database.
The directory may be global for the entire DDB.
25. Disadvantages
Complexity of management and control.
Increased storage and infrastructure
requirements: multiple copies of data have
to be kept at different sites, so additional
disk storage space is required.
26. The probability of a security lapse
increases when data are located at
multiple sites.
It is difficult to maintain integrity.