DBMS UNIT V.pptx

UNIT V
ADVANCED TOPICS
Distributed Databases: Architecture, Data Storage,
Transaction Processing – Object-based Databases:
Object Database Concepts, Object-Relational features,
ODMG Object Model, ODL, OQL - XML Databases:
XML Hierarchical Model, DTD, XML Schema,
XQuery – Information Retrieval: IR Concepts,
Retrieval Models, Queries in IR systems.

Distributed Databases
 A distributed database is a set of
interconnected databases distributed
over a computer network or the internet.
 A distributed database management system
(DDBMS) manages the distributed database and
provides mechanisms that make the
distribution transparent to users.

 Features
 Databases in the collection are logically interrelated with
each other. Often they represent a single logical database.
 Data is physically stored across multiple sites.
 The processors in the sites are connected via a network.
 A distributed database is not a loosely connected file
system.

 Advantages:
 Fast data processing
 Reliability and availability
 Reduced operating cost
 Easier to expand
 Improved sharing ability and local autonomy.

 Disadvantages:
 Complex to manage and control.
 Security issues must be carefully managed.
 The system requires deadlock handling during
transaction processing.
 Need for standardization.

 Homogeneous Distributed Database:
 In this, all sites have identical database
management system software.
In such a system, local sites surrender a portion of
their autonomy in terms of their right to change
schemas or database management system software.

 Homogeneous Distributed Database:
 This software must also cooperate with other sites
in exchanging information about transactions, to
make transaction processing possible across
multiple sites.
It appears to user as a single system.

 Heterogeneous Distributed Database:
 In this, different sites may use different schemas, and
different database management system software.
 The sites may not be aware of one another, and they
may provide only limited facilities for cooperation in
transaction processing.

 Data Storage:
 Replication: System maintains multiple copies of
data, stored in different sites, for faster retrieval
and fault tolerance
 Fragmentation: Relation is partitioned into several
fragments stored in distinct sites

 Data Replication:
 The process of storing separate copies of the database
at two or more sites.
 Full Replication: Entire relation is stored at all the
sites.
 Partial Replication: Only some fragments of relation
are replicated on the sites.

 Data Replication – Advantages:
 Availability
 Parallelism
 Faster Accessing
 Fault Tolerance
 Reduction in Network Load

 Data Replication – Disadvantages:
 Increased Storage Requirements
 Increased Cost and Complexity of Data Updating

 Data Fragmentation:
 A division of relation r into fragments r1, r2,
r3…rn which contain sufficient information to
reconstruct relation r.

 Data Fragmentation – Vertical Fragmentation:
 The fields or columns of a table are grouped into
fragments.
 In order to maintain reconstructiveness, each
fragment should contain the primary key field(s) of
the table.

 Data Fragmentation – Vertical Fragmentation:
 Example: Student(RollNo, Marks, City)
 select RollNo, Marks from Student
 select RollNo, City from Student

 Data Fragmentation – Horizontal Fragmentation:
 In this approach, each tuple of r is assigned to one or
more fragments.
 If relation R is fragmented into fragments r1 and r2,
then R is reconstructed by taking the union of
r1 and r2.

 Data Fragmentation – Horizontal
Fragmentation:
 Example:
select * from Student where Marks > 50 and
City = 'chennai'
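
A minimal Python sketch of both fragmentation styles and their reconstruction. The sample rows and helper names are assumptions made for illustration; relations are modelled as lists of dictionaries.

```python
# Hypothetical sample rows for Student(RollNo, Marks, City).
student = [
    {"RollNo": 1, "Marks": 72, "City": "chennai"},
    {"RollNo": 2, "Marks": 45, "City": "chennai"},
    {"RollNo": 3, "Marks": 88, "City": "madurai"},
]

# Vertical fragmentation: each fragment keeps the primary key RollNo.
v1 = [{"RollNo": r["RollNo"], "Marks": r["Marks"]} for r in student]
v2 = [{"RollNo": r["RollNo"], "City": r["City"]} for r in student]


def join_on_key(left, right, key):
    """Rebuild rows by joining two vertical fragments on the key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


def in_h1(row):
    """Predicate of the horizontal fragment used in the example query."""
    return row["Marks"] > 50 and row["City"] == "chennai"


# Horizontal fragmentation: here every row goes to exactly one fragment.
h1 = [r for r in student if in_h1(r)]
h2 = [r for r in student if not in_h1(r)]


def union(*fragments):
    """Rebuild the relation as the union of horizontal fragments."""
    seen, rows = set(), []
    for fragment in fragments:
        for r in fragment:
            if r["RollNo"] not in seen:
                seen.add(r["RollNo"])
                rows.append(r)
    return rows


def by_roll(rows):
    """Sort rows by RollNo so two relations can be compared."""
    return sorted(rows, key=lambda r: r["RollNo"])
```

Vertical fragments are rebuilt with a join on the key, horizontal fragments with a union.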

 Transaction Processing:
 Transaction may access data at several sites
 Local and Global Transaction

 Transaction Processing – Transaction
Manager:
Maintaining a log for recovery purposes
Participating in coordinating the concurrent
execution of the transactions executing at that site

 Transaction Processing – Transaction
Coordinator:
 Starting the execution of transactions that
originate at the site.
Distributing subtransactions at appropriate sites for
execution

 Transaction Processing – Architecture:

Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol :
 The atomicity is an important property of any
transaction processing.
 Either the transaction will execute completely or it
won’t execute at all.

 Two Phase Commit Protocol:
 A transaction which executes at multiple sites
must either be committed at all the sites, or aborted
at all the sites.
 Not acceptable to have a transaction committed at
one site and aborted at another.

 Two Phase Commit Protocol – Voting Phase:

 Two Phase Commit Protocol – Decision Phase:

 Phase 1: Obtaining Decision or Voting Phase:
 Step 1: Coordinator site Ci asks all participants to
prepare to commit T.
 Ci adds the records <prepare T> to the log and writes the log
to stable storage.
 It then sends prepare T messages to all participating sites.

[Figure: coordinating site Ci writes <Prepare, T> to its log and sends <Prepare, T> to sites S2, S3 and S4.]

 Step 2: Upon receiving message, transaction
manager at participating site determines if it can
commit the transaction.

[Figure: voting phase. Two of the sites S2, S3 and S4 log <Ready,T> and reply <Ready, T> to coordinating site Ci; the third logs <No,T> and replies <abort, T>.]

 If the transaction cannot be committed, the site adds a record <no T> to its log and sends an abort T
message to Ci.
 If the T can be committed, then:
 add the record <ready T> to the log
 force all records for T to stable storage
 Send ready T message to Ci.

 Phase 2: Recording Decision Phase:
 Ci adds the decision record <commit T> or <abort
T>, to the log and forces record onto stable
storage.

 Phase 2: Recording Decision Phase:
[Figure: S2, S3 and S4 have each logged <Ready,T> and replied <Ready, T>; coordinating site Ci records <Commit, T> in its log.]

 Phase 2: Recording Decision Phase:
Ci sends a message to each participant informing it
of the decision.
 Participants take appropriate action locally.

 Phase 2: Recording Decision Phase:
[Figure: coordinating site Ci sends <Commit, T> to S2, S3 and S4, and each site records <Commit,T> in its log.]
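
The two phases described above can be sketched in Python as a single-process simulation. This is an illustration under assumptions, not a real implementation: logs are plain lists, messages are function calls, and the name `two_phase_commit` and the `can_commit` callables are invented here.

```python
def two_phase_commit(coordinator_log, participants, txn="T"):
    """Run one round of 2PC and return the coordinator's decision.

    `participants` maps a site name to a (can_commit, log) pair, where
    can_commit is a callable returning True or False.
    """
    # Phase 1 (voting): log <prepare T>, then ask every site to prepare.
    coordinator_log.append(("prepare", txn))
    votes = []
    for can_commit, log in participants.values():
        if can_commit():
            log.append(("ready", txn))  # forced to stable storage in a real system
            votes.append("ready")
        else:
            log.append(("no", txn))
            votes.append("abort")

    # Phase 2 (decision): commit only if every site voted ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append((decision, txn))
    for _can_commit, log in participants.values():
        log.append((decision, txn))  # each site acts on the decision locally
    return decision


# Usage: S3 cannot commit, so the whole transaction aborts.
ci_log = []
sites = {
    "S2": (lambda: True, []),
    "S3": (lambda: False, []),
    "S4": (lambda: True, []),
}
outcome = two_phase_commit(ci_log, sites)
```

A single negative vote is enough to abort T at every site, which is what preserves atomicity across sites.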

 Failure of Site – Failure of Participating Sites:
 When a failed participating site Sk recovers, it
examines its log to decide the fate of each
transaction that was executing at the time of the
failure.

 Log contains a <commit T> record: the site executes
redo (T).
 Log contains an <abort T> record: the site executes undo (T).
 Log contains a <ready T> record: the site must consult Ci to
determine the fate of T.
 If T committed, redo (T).
 If T aborted, undo (T).

 The log contains no control records concerning T:
this implies that Sk failed before responding to the prepare
T message from Ci.
 Since the failure of Sk precludes the sending of such a
response, Ci must abort T.
 Sk must execute undo (T).
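
The recovery rules for a participating site can be sketched as one Python function. The log format (a list of `(record, txn)` tuples) and the `ask_coordinator` callable are assumptions made for this illustration.

```python
def recover_participant(log, txn, ask_coordinator):
    """Return 'redo' or 'undo' for `txn` from a recovering site's log.

    `ask_coordinator` is a hypothetical callable that returns
    'commit' or 'abort' for the given transaction.
    """
    records = {rec for rec, t in log if t == txn}
    if "commit" in records:
        return "redo"
    if "abort" in records:
        return "undo"
    if "ready" in records:
        # The site voted ready but never learned the outcome.
        return "redo" if ask_coordinator(txn) == "commit" else "undo"
    # No <commit>, <abort> or <ready> record: the site never voted ready,
    # so the coordinator cannot have committed T.
    return "undo"
```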

 Failure of Site – Failure of Coordinator Sites:
 If an active site contains a <commit T> record in
its log, then T must be committed.
If an active site contains an <abort T> record in its
log, then T must be aborted.

 Failure of Site – Failure of Coordinator Sites:
 If some active participating site does not contain a <ready T>
record in its log, then the failed coordinator Ci cannot have
decided to commit T. The active sites can therefore abort T.
 If none of the above cases holds, then all active sites must have a
<ready T> record in their logs, but no additional control records
(such as <abort T> or <commit T>). In this case the active sites must
wait for Ci to recover to learn the decision.
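
The four coordinator-failure cases can be checked in order, as in this Python sketch. The function name and the log format (lists of `(record, txn)` tuples, one list per active site) are assumptions for illustration.

```python
def decide_after_coordinator_failure(active_logs, txn="T"):
    """Return 'commit', 'abort' or 'wait' given the logs of the active sites."""
    record_sets = [{rec for rec, t in log if t == txn} for log in active_logs]
    if any("commit" in recs for recs in record_sets):
        return "commit"
    if any("abort" in recs for recs in record_sets):
        return "abort"
    if any("ready" not in recs for recs in record_sets):
        # Some site never voted ready, so Ci cannot have decided to commit.
        return "abort"
    # Every active site is ready but none knows the outcome: blocking case.
    return "wait"
```

The final `wait` branch is the blocking problem of two-phase commit.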

 Three Phase Commit Protocol – Assumptions:
 No network partitioning
 At any point at least one site must be up
 At most k sites can fail.

 Three Phase Commit Protocol – Phase I:
 Coordinator asks all participants to prepare to
commit transaction Ti. The coordinator then makes
the decision about commit or abort based on the
response from all the participating sites.

 Three Phase Commit Protocol – Phase II:
 Coordinator makes a decision as in two-phase
commit; this is called the pre-commit decision
<Pre-commit, T>, and it is recorded at multiple
participating sites.

 Three Phase Commit Protocol – Phase III:
 Coordinator sends commit/ abort message to all
participating sites.

 If the coordinating site fails, one of
the participating sites becomes the new coordinating site and
consults the other participating sites to learn which
pre-commit records they possess.
 Using these pre-commit records, the new coordinating
site decides whether to commit or abort.

Object based Database
An object-based database provides a way
to model real-world objects and their
behavior.
 It is an alternative to the relational database
model.

Complex Data Types:
 Address can be viewed as a single string or separate
attributes for each part or composite attributes.
 Applications:
 Computer Aided Design
Hypertext database
Multimedia and image databases.

 Object Classes:
class employee {
    /* Variables */
    string name;
    string address;
    date   start_date;
    int    salary;
    /* Messages */
    int    annual_salary();
    string get_name();
    string get_address();
    int    set_address(string new_address);
    int    employment_length();
};

 Inheritance:
 An object-oriented database schema typically requires a
large number of classes.
 For example, bank employees are similar to customers.
 Need to place classes in a specialization hierarchy

 Inheritance:

 Inheritance – Pseudo Code:

 Inheritance:
 The keyword isa is used to indicate that a class is a
specialization of another class.
The specialization of a class are called subclasses.
 E.g., employee is a subclass of person; teller is a subclass
of employee. Conversely, employee is a superclass of teller.

 Inheritance:
 Code Reusability
 Substitutability: Any method of a class A can equally
well be invoked with an object belonging to any
subclass B of A.
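
A short Python sketch of the person / employee / teller hierarchy, showing reuse and substitutability. The attribute names and the assumption that `salary` is a monthly figure are illustrative only.

```python
class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def get_name(self):
        return self.name


class Employee(Person):  # employee isa person
    def __init__(self, name, address, salary):
        super().__init__(name, address)
        self.salary = salary  # assumed to be a monthly figure

    def annual_salary(self):
        return 12 * self.salary


class Teller(Employee):  # teller isa employee
    pass


def greeting(person):
    """Written against Person, but works for any subclass instance."""
    return "Hello, " + person.get_name()


teller = Teller("Asha", "Chennai", 1000)
```

`greeting` never mentions `Teller`, yet accepts one: that is substitutability. `Teller` defines no code of its own and reuses everything from its superclasses.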

 Multiple Inheritance:
 In most cases, tree-structured organization of classes is
adequate to describe applications.
 Multiple inheritance: the ability of a class to inherit variables
and methods from multiple superclasses.
 The class/subclass relationship is represented by a rooted
directed acyclic graph (DAG) in which a class may have
more than one superclass.

 Handling name conflicts: When multiple inheritance is
used, there is potential ambiguity if the same variable or
method can be inherited from more than one superclass.
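
Python supports multiple inheritance, so the name conflict can be shown directly. The class and method names below are hypothetical. Python resolves the ambiguity by the order in which superclasses are listed; other systems may instead reject the conflict or require renaming.

```python
class Student:
    def id_label(self):
        return "student id"


class Teacher:
    def id_label(self):
        return "teacher id"


class TeachingAssistant(Student, Teacher):
    """Inherits id_label from both superclasses."""

    def teacher_id_label(self):
        # Explicitly pick the Teacher version when that one is wanted.
        return Teacher.id_label(self)


ta = TeachingAssistant()
default_label = ta.id_label()            # resolved left to right: Student first
explicit_label = ta.teacher_id_label()
```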

 ODMG Object Model:
 ODMG – Object Database Management Group
 It produced the specification for object-oriented
databases.
 ODL – Object Definition Language
 OQL – Object Query Language
 OML – Object Manipulation Language

 ODL:
 Declaring Classes:
 keyword interface
 The name of the class
 The list of attributes of the class declared using keyword
attribute.

 ODL:
 Declaring Classes – Example:
 interface Student{
attribute integer RollNo;
attribute string Name;
attribute string address;
attribute string course_id;
};

 ODL:
 Declaring Relationships:
 SQL uses the foreign key concept to establish
relationships between two tables.
 ODL uses the keyword relationship to declare a relationship between
two classes.

 ODL:
 Declaring Relationships:
 interface Student{
relationship Course Stud_Course_real;
};

 ODL:
 Declaring Key:
 A key identifies each object of the class uniquely, as a key identifies a tuple in a relation.
 Use keyword key to make particular attribute a key.

 ODL:
 Declaring Keys:
 interface Student (key RollNo) {
};

 OQL:
 A query language standard for object oriented databases modeled
after SQL.
 Rules:
 All complete statements must be terminated by a semi-colon
 A list of entries in OQL is usually separated by commas but not
terminated by a comma(,).
 Strings of text are enclosed by matching quotation marks.

 OQL:
 Basic form of OQL: Select, From and Where
 Syntax: SELECT <list of values>
FROM <list of collections and variable assignments>
WHERE <condition>
 Example: SELECT Sname: p.name FROM p in People WHERE
p.age > 30
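
For readers who know Python better than OQL, the example query behaves roughly like a comprehension over a collection of objects. The `Person` class and the sample extent below are assumptions; the source does not define them.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# Hypothetical extent standing in for the People collection.
People = [Person("Anu", 42), Person("Ravi", 25), Person("Meena", 31)]

# SELECT Sname: p.name FROM p in People WHERE p.age > 30
result = [{"Sname": p.name} for p in People if p.age > 30]
```

As in OQL, `p` ranges over objects and `p.name` is a path expression on the object, not a column reference.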

 OQL:
Dot notations and Path expressions:
 ta.salary -> real
 t.students -> set of tuples of type tuple(name, fee:real)
representing students
 t.salary -> real

XML Databases
 XML - Extensible Markup Language
 XML tags identify the data and are used to store and
organize the data.
 Characteristics:
 XML is extensible
 XML carries the data, does not present it
 XML is a public standard

XML Databases
 Syntax Rules for XML Declaration
 The XML declaration is case sensitive and must begin with
"<?xml", where "xml" is written in lower-case.
 If document contains XML declaration, then it strictly
needs to be the first statement of the XML document.

XML Databases
 Element:
 XML elements can be defined as building blocks of an
XML.
 Elements can behave as containers to hold text, elements,
attributes, media objects or all of these.

XML Databases
 Element:
<element-name attribute1 attribute2>
...content
</element-name>

XML Databases
 Empty Element:
 An empty element (element with no content) has following
syntax: <name attribute1 attribute2.../>

XML Databases
Element – Example:
<?xml version = "1.0"?>
<contact-info>
<address category = "residence">
<name>XYZ</name>
<company>ABC Company</company>
<phone>1234567890</phone>
</address>
</contact-info>
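
A document like the one above can be read with Python's standard `xml.etree.ElementTree` module. This sketch only shows how elements, attributes and text map onto the parsed tree.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<contact-info>
  <address category="residence">
    <name>XYZ</name>
    <company>ABC Company</company>
    <phone>1234567890</phone>
  </address>
</contact-info>"""

root = ET.fromstring(document)        # the root element, contact-info
address = root.find("address")        # first child element named address
category = address.get("category")    # value of the category attribute
children = {child.tag: child.text for child in address}
```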

XML Databases
 Attributes:
 Attribute gives more information about XML elements.
 Attributes define properties of elements.
 An XML attribute is always a name-value pair.
<element-name attribute1="value1" attribute2="value2">
....content..
</element-name>

XML Databases
 Attributes – Example:
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>

XML Databases
 Types of XML Documents:
 Data Centric XML documents: Many small data items that
follow specific structure. These documents follow predefined
schema that defines tag names.
 Document Centric XML documents: Large amounts of text, such
as articles or books. There are very few or no structured data
elements in these documents.
 Hybrid Documents: Some parts contain structured data and other
parts are unstructured text; they may not have a
predefined schema.

XML Databases
 DTD
 DTD – Document Type Definition
 To define the basic building block of any xml document
 Using DTD, specify various elements type, attributes and
their relationships with one another.
 To specify the set of rules for structuring data in any XML
file

XML Databases
 DTD – Elements:
 The basic entity
 The elements are used for defining the tags.
 The elements typically consist of opening and closing tag.
 Ex: <body>some text</body>

XML Databases
 DTD – Attributes:
 Attributes always come in name/value pairs.
 To specify the values of the element.
 These are specified within the double quotes.
 Ex: <img src="computer.gif" />

XML Databases
 DTD – Entities:
 Entities are expanded when a document is parsed by an
XML parser.
Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "

XML Databases
 DTD – PCDATA:
 Parsed Character Data.
 PCDATA is text that WILL be parsed by a parser. The text
will be examined by the parser for entities and markup.
 Tags inside the text will be treated as markup and entities
will be expanded.
 The characters &, < and > must be written as the entity references &amp;, &lt; and &gt;.

XML Databases
 DTD – CDATA:
 Character Data.
 CDATA is text that will NOT be parsed by a parser. Tags
inside the text will NOT be treated as markup and entities
will not be expanded.
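
The difference can be observed with Python's standard XML parser. The sketch uses a CDATA section in the document body as the unparsed text; the element name `note` is made up.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the entity references are expanded.
parsed = ET.fromstring("<note>1 &lt; 2 &amp; 3 &gt; 2</note>")

# CDATA section: the content is taken literally, tags and entities included.
literal = ET.fromstring("<note><![CDATA[<b>1 &lt; 2</b>]]></note>")

parsed_text = parsed.text
literal_text = literal.text
child_count = len(literal)   # the <b> inside CDATA is not an element
```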

XML Databases
 DTD – Example:
<?xml version="1.0"?>
<page>
<title>Hello friend</title>
<content>Here is some content :)</content>
<comment>samples</comment>
</page>

XML Databases
 DTD – Example:

XML Databases
 DTD – Merits:
 To define the structural components of XML document
 Simple and Compact

XML Databases
 DTD – Demerits:
 It is not expressive enough for complex documents.
 The language that DTD uses is not XML.
 A DTD cannot define the type of data contained within
the XML document.

XML Databases
 XML Schema:
 Structure of an XML document.
 The elements and attributes that can appear in a document
 The number of (and order of) child elements
 Data types for elements and attributes
 Default and fixed values for elements and attributes
 XML Schema is an XML-based (and more powerful) alternative
to DTD

XML Databases
 XML Schema:
 Example:
 StudentSchema.xsd
 MySchema.xml

XML Databases
 XML Schema – Advantages:
 The schema provide the support for data types
 The XML schema is written in XML itself and has large number
of built in and derived types.
 Disadvantages:
 Complex to design and hard to learn
 Maintaining the schema for large and complex operations
sometimes slows down the processing of XML documents.

XML Databases
 XQuery:
 To query the XML database, to get information out of XML
databases.
 XQuery FLWOR Expressions
 For - selects a sequence of nodes
 Let - binds a sequence to a variable
 Where - filters the nodes
 Order by - sorts the nodes
 Return - what to return (gets evaluated once for every node)

XML Databases
 XQuery – Example:
 courses.xml
 Display the title elements of the courses whose fees are
greater than 5000
for $x in doc("courses.xml")/courses/course
where $x/fees>5000
return $x/title
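
The same for / where / return logic, sketched in Python for comparison. The contents of courses.xml are not shown in the source, so the sample data below is hypothetical. Unlike the XQuery, this sketch returns the title text and not the title elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of courses.xml.
courses_xml = """<courses>
  <course><title>DBMS</title><fees>6000</fees></course>
  <course><title>Networks</title><fees>4500</fees></course>
  <course><title>Compilers</title><fees>7500</fees></course>
</courses>"""

root = ET.fromstring(courses_xml)

titles = [
    course.findtext("title")                    # return $x/title
    for course in root.findall("course")        # for $x in .../courses/course
    if float(course.findtext("fees")) > 5000    # where $x/fees > 5000
]
```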

XML Databases
 XQuery – Advantages:
 Both hierarchical and tabular data can be retrieved.
 Can query tree and graph structures.
 Used to build web pages.
 Used to transform XML documents.

Information Retrieval
 Information Retrieval:
“The process of retrieving documents from a
collection in response to a query submitted by a user”

 Structured Data:
 A form of data in which the information is in a highly
organized form.
 Ex: Student table

 Unstructured Data:
 Like human language.
 It does not fit nicely into relational databases.
 Ex: Emails, Text Documents, Social media, Videos and
Images.

 Information Retrieval – Concept of Query
 User can make use of free form of search request – Query
 It is also called as keyword search.

 Characteristics of IR Systems:
 Types of Users:
 Expert User: User who is searching for specific
information that is clear in mind.
 Ex: User who wants to get the information about particular
book.
 Layperson: A user with generic information need.

 Types of Data:
 Search systems can be modified to specific types of data.

 Types of Information Need:
 Navigational Search: To find a particular piece of
information that user needs quickly.
Ex: Finding site of “Anna University”

 Informational Search: To find current information about
some topic.
 Example: Information about current News.

 Transactional Search: To reach a site in which further
interaction happen.
 Ex: Online Reservation.

Database System | IR System
Use of structured data | Use of unstructured data
Relational data model is used | Free-form query model is used
Query returns data | Search request returns a list of, or pointers to, documents that may contain the desired information
Results are based on exact matching | Results are based on approximate matching

 Modes of Interactions:
 Retrieval: Extraction of relevant information from a
repository of documents through an IR query.
 Browsing: The activity of a user visiting or
navigating through similar or related documents based
on the user’s assessment of relevance.

 Hyperlinks: To interconnect web pages and are mainly
used for browsing.
 Anchor texts: Text phrases within documents used to
label hyperlinks and are very relevant to browsing.
 Web Search: combines both activities(retrieval and
browsing)

Web Search Engine: Maintains an indexed repository
of web pages. The most relevant web pages are
returned to the user if possible in descending order of
their relevance.

 IR Processing:
 Statistical Approach:
 The documents are first analyzed and broken down into chunks
of text.
 Each word is counted and weighted for its relevance.
 These words are then compared against the query terms to determine the
degree of match.
 Based on this matching, the ranked list of documents containing
these words is presented to the user.

 IR Processing:
 Semantic Approach:
 Knowledge-based techniques of information retrieval are used.
 The syntactic, lexical, sentential, discourse-based and
pragmatic levels of language are used to prepare a knowledge base
for understanding.

 Generic IR Framework:

 Retrieval Models:
 Boolean Model:
 Documents represented as a set of terms
 Form queries using standard Boolean logic set-theoretic operators -
AND, OR and NOT.
 Based on “Exact match” with query.
 Lacks sophisticated ranking algorithms.
 Make it easy to associate meta data information and write queries that
match the contents of the documents
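
A minimal sketch of Boolean retrieval using Python sets. The three-document collection and the helper `term` are invented for illustration.

```python
# Hypothetical three-document collection.
docs = {
    1: "distributed database transaction",
    2: "xml database schema",
    3: "information retrieval query",
}

# Inverted index: term -> set of ids of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

all_docs = set(docs)


def term(t):
    """Documents containing term t (empty set if t is unknown)."""
    return index.get(t, set())


database_and_xml = term("database") & term("xml")                # AND
xml_or_query = term("xml") | term("query")                       # OR
database_not_xml = term("database") & (all_docs - term("xml"))   # AND NOT
```

Every result is a plain set: a document either matches or it does not, so there is no ranking among the matches.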

 Vector Space Model:
 An algebraic model for representing text documents.
 It provides a framework in which weighting, ranking of
retrieved documents and relevance feedback are possible.
 Several similarity functions can be used; the cosine of the angle
between the query and document vectors is commonly used.
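
A small sketch of cosine ranking in Python. It uses raw term frequencies as the vector weights, which is a simplifying assumption (practical systems usually apply weighting such as TF-IDF), and the documents are made up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = {
    "d1": "distributed database transaction database",
    "d2": "xml schema query",
}
query = Counter("database transaction".split())

scores = {name: cosine(query, Counter(text.split())) for name, text in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Unlike the Boolean model, every document gets a score, so the results can be returned in ranked order.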

 Probabilistic Model:
 A more concrete and definitive approach is taken.
 The IR system has to decide whether the documents belong to
the relevant set or non-relevant set for a query.
 The system calculates the probability that the document belongs to the
relevant set and compares that with the probability that the
document belongs to the non-relevant set.

 Semantic Model:
 The process of matching documents to a given query is based on
concept level and semantic matching instead of index term
matching.
 This allows retrieval of relevant documents that share
meaningful associations with other documents in the query result.

 Semantic Model – Level of Analysis:
 Morphological Analysis: Analyzes nouns, verbs and adjectives.
 Syntactical Analysis: Complete phrases in the document
are parsed and then analyzed.
 Semantic Analysis: Synonyms are used to resolve ambiguities in the
words.

 Types of Queries in IR Systems:
 Keywords:
 Consist of words, phrases, and other characterizations of
documents
 Queries compared to set of index keywords
 Allow use of Boolean and other operators to build a
complex query

 Keywords:
 Keywords implicitly connected by a logical AND operator
 Remove stopwords - Most commonly occurring words: a,
the, of
 IR systems do not pay attention to the ordering of these
words in the query
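
A sketch of keyword search with stopword removal and an implicit AND, in Python. The stopword list holds only the three examples named above, and the documents are hypothetical.

```python
STOPWORDS = {"a", "the", "of"}  # only the examples named above; real lists are longer


def keywords(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def keyword_search(query, documents):
    """Return ids of documents containing every query keyword (implicit AND)."""
    wanted = keywords(query)
    return [doc_id for doc_id, text in documents.items()
            if wanted <= keywords(text)]


documents = {
    1: "the architecture of a distributed database",
    2: "a database of the university",
    3: "distributed transaction processing",
}
hits = keyword_search("the database distributed", documents)
```

Because keywords are held in a set, the order of the words in the query has no effect on the result.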

 Boolean Queries:
 AND: both terms must be found
OR: either term found
NOT: record containing keyword omitted
( ): used for nesting
+: equivalent to AND
–: equivalent to AND NOT
Document retrieved if query is logically true as an exact match in the document

 Phrase queries:
 Phrase generally enclosed within double quotes
 More restricted and specific version of proximity searching

 Proximity queries:
 Accounts for how close within a record multiple terms
should be to each other
 Common option requires terms to be in the exact order
 Various operator names: NEAR, ADJ(adjacent), or AFTER

 Wildcard queries:
 Support regular expressions and pattern matching-based
searching – ‘Data*’ would retrieve data, database, datapoint,
dataset
 Involves preprocessing overhead
 Retrieval models do not directly provide support for this query
type
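
A sketch of the 'Data*' example using Python's standard `fnmatch` module for shell-style patterns. The vocabulary list is hypothetical and matching is made case-insensitive by lowercasing both sides.

```python
import fnmatch

vocabulary = ["data", "database", "datapoint", "dataset", "date", "metadata"]


def wildcard(pattern, terms):
    """Return the terms matching a shell-style pattern, ignoring case."""
    return [t for t in terms if fnmatch.fnmatchcase(t.lower(), pattern.lower())]


matches = wildcard("Data*", vocabulary)
```

Scanning the whole vocabulary like this is the overhead mentioned above; index structures for prefixes reduce it.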

 Natural Language queries:
 Few natural language search engines
Active area of research
Easier to answer questions
