1. Distribution Transparency
Distribution transparency is the property of a distributed
database by which the internal details of the distribution
are hidden from users. The DDBMS designer may choose to
fragment tables, replicate the fragments, and store them at
different sites. Because users are unaware of these details,
they can use the distributed database as easily as a
centralized one.
The three dimensions of distribution transparency are:
• Location transparency
• Fragmentation transparency
• Replication transparency
Location Transparency
• Location transparency ensures that the user can query any
table(s) or fragment(s) of a table as if they were stored locally at
the user's site. The fact that the table or its fragments are stored
at a remote site in the distributed database system should be
completely hidden from the end user, as should the address of the
remote site(s) and the access mechanisms.
• To provide location transparency, the DDBMS needs access to an
up-to-date and accurate data dictionary and DDBMS directory,
which contain the details of where data is located.
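The directory lookup described above can be sketched as follows. This is a minimal illustration, not a real DDBMS: the table names, site names, and the in-memory dictionaries standing in for remote sites are all assumptions made for the example.

```python
# Hypothetical DDBMS directory: table name -> site that stores it.
DIRECTORY = {"customers": "site_a", "orders": "site_b"}

# Hypothetical per-site storage, standing in for remote databases.
SITES = {
    "site_a": {"customers": [{"id": 1, "name": "Asha"}]},
    "site_b": {"orders": [{"id": 10, "customer_id": 1}]},
}


def query(table):
    """Return all rows of a table; the caller never names a site."""
    site = DIRECTORY[table]      # location is resolved by the system, not the user
    return SITES[site][table]    # stands in for a remote fetch
```

The user calls query("orders") in the same way whether the table is local or remote; only the directory knows the difference.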
Fragmentation Transparency
• Fragmentation transparency enables users to query
any table as if it were unfragmented. It hides the fact
that the table being queried is actually a fragment or a
union of several fragments, and that the fragments are
located at different sites.
• This is somewhat similar to the use of SQL views, where
the user may not know that they are querying a view of a
table instead of the table itself.
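As a rough sketch of the idea, the function below answers a query on one logical table by taking the union of its horizontal fragments internally. The employees table, the region-based split, and the site names are hypothetical.

```python
# Hypothetical horizontal fragments of one logical "employees" table,
# split by region and held at different sites.
FRAGMENTS = {
    "site_east": [{"id": 1, "region": "east"}],
    "site_west": [{"id": 2, "region": "west"}, {"id": 3, "region": "west"}],
}


def select_employees(predicate=lambda row: True):
    """Query the logical table; the union of fragments is built internally."""
    rows = [row for fragment in FRAGMENTS.values() for row in fragment]
    return sorted((r for r in rows if predicate(r)), key=lambda r: r["id"])
```

The caller sees a single employees table and never learns how many fragments exist or where they are stored.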
Replication Transparency
Replication transparency ensures that the replication of databases
is hidden from users. It enables users to query a table
as if only a single copy of the table existed.
Replication transparency is associated with concurrency
transparency and failure transparency. Whenever a user updates a
data item, the update is reflected in all copies of the table,
but the user should not be aware of this propagation. This is
concurrency transparency. Likewise, if a site fails, the user
can still proceed with their queries using replicated copies without
any knowledge of the failure. This is failure transparency.
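The behaviour described above can be sketched with a small class that writes to every live copy and reads from any live copy. The class name, the site names, and the "down" set are assumptions for illustration; resynchronising a replica after it recovers is deliberately left out.

```python
class ReplicatedTable:
    """One logical table backed by several hypothetical replica sites."""

    def __init__(self, site_names):
        self.replicas = {name: {} for name in site_names}
        self.down = set()  # sites currently marked as failed

    def write(self, key, value):
        # The user issues one update; every live copy receives it.
        for name, copy in self.replicas.items():
            if name not in self.down:
                copy[key] = value

    def read(self, key):
        # Any live replica can answer, so a site failure stays hidden.
        for name, copy in self.replicas.items():
            if name not in self.down:
                return copy[key]
        raise RuntimeError("no replica available")
```

For example, after a write to a table replicated on two sites, marking the first site as down does not change what read returns.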
Combination of Transparencies
• In any distributed database system, the designer should
ensure that all of the transparencies above are maintained
to a considerable extent. The designer may choose to
fragment tables, replicate them, and store them at
different sites, all hidden from the end user. However,
complete distribution transparency is difficult to achieve and
requires considerable design effort.
2. Distributed Transaction
• A distributed transaction is a database
transaction in which two or more network
hosts are involved.
• Usually, the hosts provide transactional
resources, while the transaction manager is
responsible for creating and managing a
global transaction that encompasses all
operations against those resources.
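The text does not name a commit protocol. One common way for a transaction manager to coordinate a global transaction is two-phase commit, sketched below under that assumption. The Resource class and the function name are hypothetical, and real concerns such as logging, timeouts, and coordinator failure are omitted.

```python
class Resource:
    """A hypothetical transactional resource hosted on one site."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def run_global_transaction(resources):
    """Commit on every resource or on none; returns True on commit."""
    votes = [r.prepare() for r in resources]
    if all(votes):
        for r in resources:  # Phase 2: everyone voted yes, so commit everywhere
            r.commit()
        return True
    for r in resources:      # Phase 2: at least one no vote, so undo everywhere
        r.rollback()
    return False
```

If any single resource votes no in the first phase, every resource is rolled back, which is what keeps the global transaction all-or-nothing across hosts.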
A transaction has four properties, known together as ACID:
1. Atomicity
• Atomicity guarantees that either all of a transaction
happens or none of it does. Complex operations can be done as a
single unit, all or nothing, and a crash, power failure, or other
error cannot leave the data in a state in which only some of the
related changes have happened.
2. Consistency
• Consistency guarantees that the data stays
consistent: none of the constraints defined on related data will ever
be violated.
3. Isolation
• Isolation means that one transaction cannot read data from another
transaction that has not yet completed. If two transactions execute
concurrently, each one sees the world as if they were executing
sequentially, and if one needs to read data written by the other,
it has to wait until the other is finished.
4. Durability
• Durability means that once a transaction is complete, all of its
changes are guaranteed to have been recorded on a durable medium
(such as a hard disk), and the fact that the transaction has
completed is likewise recorded.
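Atomicity and consistency can be seen on a small scale with Python's built-in sqlite3 module. This is a single-site sketch, not a distributed transaction; the accounts table, the non-negative balance constraint, and the function names are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a constraint on balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move an amount between accounts atomically; returns True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the constraint was violated, so nothing was applied
```

When the debit would make a balance negative, the credit that already ran inside the same transaction is rolled back as well, so no partial transfer is ever visible.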
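3. How deadlock detection is different
for a distributed system
Deadlock detection algorithms are simplified
by maintaining a wait-for graph (WFG) and
searching it for cycles. In a distributed system
the WFG is spread across sites, so a cycle may
span several sites without appearing in any
single site's local graph.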
The different approaches to deadlock
detection are:
• Centralized Approach for Deadlock
Detection
• In this approach, a local coordinator at each
site maintains a WFG for its local resources,
and a central coordinator constructs the
union of all the individual WFGs.
The central coordinator builds this global
WFG from the information received from the
local coordinators of all sites.
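A minimal sketch of the central coordinator's job, assuming each local WFG is reported as a dictionary that maps a transaction to the set of transactions it waits for. The function names and the transaction labels are hypothetical.

```python
def merge_wfgs(local_wfgs):
    """Union the local graphs reported by the sites into one global WFG."""
    global_wfg = {}
    for wfg in local_wfgs:
        for waiter, holders in wfg.items():
            global_wfg.setdefault(waiter, set()).update(holders)
    return global_wfg


def has_cycle(wfg):
    """Depth-first search; reaching a node on the current path means a cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in wfg.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(wfg))
```

With one site reporting that T1 waits for T2 and another reporting that T2 waits for T1, neither local graph contains a cycle, but the merged graph does. That is the case a purely local detector would miss.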
Hierarchical Approach for Deadlock
Detection
The hierarchical approach overcomes drawbacks
of the centralized approach.
It uses a logical hierarchy of deadlock
detectors called controllers.
Each controller detects only those deadlocks that
involve the sites within its own part of the
hierarchy. The global WFG is distributed over a number
of different controllers in this approach.
Fully Distributed Approaches for Deadlock
Detection
In this approach, each site shares equal
responsibility for deadlock detection.
• Two kinds of algorithm are used: the first is based on
construction of the WFG, and the second is
probe-based.
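The text does not name a specific probe-based algorithm. The sketch below follows the general edge-chasing idea: a blocked transaction sends a probe along wait-for edges, and a deadlock is declared if the probe comes back to its initiator. The function name, the probe layout, and the single dictionary standing in for knowledge that would really be spread over sites are all assumptions.

```python
from collections import deque


def probe_detects_deadlock(initiator, waits_for):
    """Return True if a probe started by the initiator comes back to it.

    waits_for maps each transaction to the transactions it is blocked on;
    in a real system each entry would be known only to one site.
    """
    # A probe is (initiator, sender, receiver), forwarded along wait-for edges.
    queue = deque((initiator, initiator, nxt)
                  for nxt in waits_for.get(initiator, ()))
    forwarded = set()  # receivers that have already passed the probe on
    while queue:
        origin, _sender, receiver = queue.popleft()
        if receiver == origin:
            return True  # the probe returned: the initiator is on a cycle
        if receiver in forwarded:
            continue     # each receiver forwards the probe at most once
        forwarded.add(receiver)
        for nxt in waits_for.get(receiver, ()):
            queue.append((origin, receiver, nxt))
    return False
```

Note that a transaction which merely waits on a deadlocked one, without being part of the cycle itself, never gets its own probe back. In this sketch only a transaction on the cycle detects the deadlock.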
4. Comparison between Process and Thread

| Aspect | Process | Thread |
|---|---|---|
| Definition | An executing instance of a program is called a process. | A thread is a subset of the process. |
| Data segment | It has its own copy of the data segment of the parent process. | It has direct access to the data segment of its process. |
| Communication | Processes must use inter-process communication to communicate with sibling processes. | Threads can communicate directly with other threads of their process. |
| Overhead | Processes have considerable overhead. | Threads have much less overhead. |
| Creation | New processes require duplication of the parent process. | New threads are easily created. |
| Control | Processes can only exercise control over child processes. | Threads can exercise considerable control over threads of the same process. |
| Changes | A change in the parent process does not affect child processes. | A change in the main thread may affect the behavior of the other threads of the process. |
| Memory | Run in separate memory spaces. | Run in a shared memory space. |
| File descriptors | Most file descriptors are not shared. | File descriptors are shared. |
| File system | There is no sharing of file system context. | File system context is shared. |
| Signal | Signal handling is not shared. | Signal handling is shared. |
| Controlled by | Processes are controlled by the operating system. | Threads are controlled by the programmer within a program. |
| Dependence | Processes are independent. | Threads are dependent. |
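The memory and communication rows of the comparison can be illustrated with Python's standard library. This is only a sketch: the function names are hypothetical, and a child Python interpreter started with subprocess stands in for "another process".

```python
import subprocess
import sys
import threading


def run_in_thread(shared):
    # The thread writes straight into the list owned by its process.
    shared.append("written by thread")


def demo_thread():
    """A thread shares memory with its creator, so no IPC is needed."""
    shared = []
    t = threading.Thread(target=run_in_thread, args=(shared,))
    t.start()
    t.join()
    return shared


def demo_process():
    """A child process has its own memory; its result returns through a pipe."""
    result = subprocess.run(
        [sys.executable, "-c", "print('written by process')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The thread changes the caller's list directly, while the child process can only hand its result back through an inter-process channel (here, its standard output).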
Types of Thread
• Threads are implemented in the following two ways −
1. User Level Threads − threads managed by the user (application).
2. Kernel Level Threads − threads managed by the operating
system, acting on the kernel, the operating system core.
1. User Level Threads
In this case, thread management is done by a thread library
and the kernel is not aware of the existence of threads. The
thread library contains code for creating and destroying threads, for
passing messages and data between threads, for
scheduling thread execution, and for saving and
restoring thread contexts. The application starts with
a single thread.
Advantages -
o Thread switching does not require Kernel mode privileges.
o User level threads can run on any operating system.
o Scheduling can be application specific for user level
threads.
o User level threads are fast to create and manage.
Disadvantages -
o In a typical operating system, most system calls are blocking,
so one thread that blocks in a system call blocks the whole process.
o A multithreaded application cannot take advantage of
multiprocessing.
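The idea of a thread library that schedules and switches threads entirely in user space can be sketched with Python generators. This is an analogy rather than a real thread library: each generator plays the role of a user-level thread, and the worker and scheduler names are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A user-level 'thread': each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # the task's context is saved inside the generator object


def run_round_robin(tasks):
    """Schedule the tasks entirely in user space, one step at a time."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # restore the task's context and run one step
            ready.append(task)  # still runnable, so go to the back of the queue
        except StopIteration:
            pass                # the task finished; drop it
```

No kernel call is involved in a switch, which is why such threads are cheap. If one task made a blocking system call, though, the whole scheduler loop would stall, which is the first disadvantage listed above.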
2. Kernel Level Threads
In this case, thread management is done by the
Kernel. There is no thread management code in the
application area. Kernel threads are supported
directly by the operating system. Any application can
be programmed to be multithreaded. All of the
threads within an application are supported within a
single process.
The Kernel maintains context information for the
process as a whole and for individual threads within
the process. Scheduling by the Kernel is done on a
thread basis. The Kernel performs thread creation,
scheduling, and management in Kernel space. Kernel
threads are generally slower to create and manage
than user threads.
Advantages -
o The Kernel can simultaneously schedule multiple threads from
the same process on multiple processors.
o If one thread in a process is blocked, the Kernel can
schedule another thread of the same process.
o Kernel routines themselves can be multithreaded.
Disadvantages -
o Kernel threads are generally slower to create and manage
than user threads.
o Transfer of control from one thread to another within the
same process requires a mode switch to the Kernel.
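The second advantage, that a blocked thread does not stop the rest of the process, can be sketched with Python's threading module. In CPython on mainstream platforms each threading.Thread is backed by an operating-system thread; the function and message names below are hypothetical.

```python
import threading


def run_demo():
    """One thread blocks on an event while another thread keeps running."""
    log = []
    gate = threading.Event()

    def blocked():
        gate.wait()  # blocks this thread only
        log.append("blocked thread resumed")

    def runner():
        log.append("other thread ran")
        gate.set()   # unblock the waiting thread

    a = threading.Thread(target=blocked)
    b = threading.Thread(target=runner)
    a.start()
    b.start()
    a.join()
    b.join()
    return log
```

The first thread is started first and blocks immediately, yet the second thread is still scheduled and completes its work before releasing it.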
