Database Management Systems
UNIT V
ADVANCED TOPICS
Distributed Databases: Architecture, Data Storage,
Transaction Processing – Object-based Databases:
Object Database Concepts, Object-Relational features,
ODMG Object Model, ODL, OQL - XML Databases:
XML Hierarchical Model, DTD, XML Schema,
XQuery – Information Retrieval: IR Concepts,
Retrieval Models, Queries in IR systems.
Distributed Databases
 A distributed database is a set of
interconnected databases that is distributed
over the computer network or internet.
 A distributed database management system (DDBMS)
manages the distributed database and provides mechanisms
that make the distribution transparent to the users.
Distributed Databases
 Features
 Databases in the collection are logically interrelated with
each other. Often they represent a single logical database.
 Data is physically stored across multiple sites.
 The processors in the sites are connected via a network.
 A distributed database is not a loosely connected file
system.
Distributed Databases
 Advantages:
 Fast data processing
 Reliability and availability
 Reduced operating cost
 Easier to expand
 Improved sharing ability and local autonomy.
Distributed Databases
 Disadvantages:
 Complex to manage and control.
 The security issues must be carefully managed
 The system requires deadlock handling during
transaction processing
 Need of standardization.
Distributed Databases
 Homogeneous Distributed Database:
 In this, all sites have identical database
management system software.
In such a system, local sites surrender a portion of
their autonomy in terms of their right to change
schemas or database management system software.
Distributed Databases
 Homogeneous Distributed Database:
 This software must also cooperate with other sites
in exchanging information about transactions, to
make transaction processing possible across
multiple sites.
It appears to the user as a single system.
Distributed Databases
 Heterogeneous Distributed Database:
 In this, different sites may use different schemas, and
different database management system software.
 The sites may not be aware of one another, and they
may provide only limited facilities for cooperation in
transaction processing.
Distributed Databases
 Data Storage:
 Replication: System maintains multiple copies of
data, stored in different sites, for faster retrieval
and fault tolerance
 Fragmentation: Relation is partitioned into several
fragments stored in distinct sites
Distributed Databases
 Data Replication:
 The process of storing separate copies of the database
at two or more sites.
 Full Replication: Entire relation is stored at all the
sites.
 Partial Replication: Only some fragments of relation
are replicated on the sites.
Distributed Databases
 Data Replication – Advantages:
 Availability
 Parallelism
 Faster Accessing
 Fault Tolerance
 Reduction in Network Load
Distributed Databases
 Data Replication – Disadvantages:
 Increased Storage Requirements
 Increased Cost and Complexity of Data Updating
Distributed Databases
 Data Fragmentation:
 A division of relation r into fragments r1, r2,
r3…rn which contain sufficient information to
reconstruct relation r.
Distributed Databases
 Data Fragmentation – Vertical Fragmentation:
 The fields or columns of a table are grouped into
fragments.
 In order to maintain reconstructiveness, each
fragment should contain the primary key field(s) of
the table.
Distributed Databases
 Data Fragmentation – Vertical Fragmentation:
 Example: Student(RollNo, Marks, City)
 select RollNo, Marks from Student
 select RollNo, City from Student
(each fragment keeps the primary key RollNo)
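The reconstruction rule for vertical fragments can be sketched in Python. This is only an illustration: the sample rows and the helper names project and join_on_key are assumptions, not part of the slides. Each fragment keeps the RollNo key, so joining the fragments on that key rebuilds the original rows.

```python
# Hypothetical sample rows for Student(RollNo, Marks, City).
students = [
    {"RollNo": 1, "Marks": 72, "City": "Chennai"},
    {"RollNo": 2, "Marks": 45, "City": "Madurai"},
]


def project(rows, columns):
    """Vertical fragment: keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]


def join_on_key(left, right, key):
    """Rebuild rows by matching the two fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Both fragments carry the primary key RollNo.
marks_fragment = project(students, ["RollNo", "Marks"])
city_fragment = project(students, ["RollNo", "City"])

rebuilt = join_on_key(marks_fragment, city_fragment, "RollNo")
```

Without RollNo in both fragments there would be nothing to match on, which is why the key must be repeated in every vertical fragment.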
Distributed Databases
 Data Fragmentation – Horizontal Fragmentation:
 In this approach, each tuple of r is assigned to one or
more fragments.
 If relation R is fragmented into fragments r1 and r2,
then R is reconstructed by taking the union of these
fragments.
Distributed Databases
 Data Fragmentation – Horizontal
Fragmentation:
 Example:
Select * from student where marks>50 and
city=‘chennai’
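A small Python sketch of horizontal fragmentation, assuming hypothetical sample rows and a second fragment that holds every row the first one does not. The predicate is the one from the example query; the union of the two fragments gives back the relation.

```python
# Hypothetical sample rows for the student relation.
students = [
    {"RollNo": 1, "Marks": 72, "City": "chennai"},
    {"RollNo": 2, "Marks": 45, "City": "chennai"},
    {"RollNo": 3, "Marks": 88, "City": "madurai"},
]


def select(rows, predicate):
    """Horizontal fragment: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def in_fragment_1(row):
    # Same condition as the example query: marks > 50 and city = 'chennai'.
    return row["Marks"] > 50 and row["City"] == "chennai"


r1 = select(students, in_fragment_1)
r2 = select(students, lambda row: not in_fragment_1(row))

# r1 and r2 are disjoint here, so concatenating them is their union;
# sorting by RollNo only restores the original row order.
rebuilt = sorted(r1 + r2, key=lambda row: row["RollNo"])
```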
Distributed Databases
 Transaction Processing:
 Transaction may access data at several sites
 Local and Global Transaction
Distributed Databases
 Transaction Processing – Transaction
Manager:
Maintaining a log for recovery purposes
Participating in coordinating the concurrent
execution of the transactions executing at that site
Distributed Databases
 Transaction Processing – Transaction
Coordinator:
 Starting the execution of transactions that
originate at the site.
Distributing subtransactions at appropriate sites for
execution
Distributed Databases
 Transaction Processing – Architecture:
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol :
 Atomicity is an important property of
transaction processing.
 Either the transaction will execute completely or it
won’t execute at all.
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 A transaction which executes at multiple sites
must either be committed at all the sites, or aborted
at all the sites.
 Not acceptable to have a transaction committed at
one site and aborted at another.
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol – Voting Phase:
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol – Decision Phase:
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 1: Obtaining Decision or Voting Phase:
 Step 1: Coordinator site Ci asks all participants to
prepare to commit T.
 Ci adds the record <prepare T> to the log and writes the log
to stable storage.
 It then sends prepare T messages to all participating sites.
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 1: Obtaining Decision or Voting Phase:
[Figure: coordinating site Ci records <prepare T> in its log and sends <prepare T> to participating sites S2, S3 and S4.]
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 1: Obtaining Decision or Voting Phase:
 Step 2: Upon receiving message, transaction
manager at participating site determines if it can
commit the transaction.
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 1: Obtaining Decision or Voting Phase:
[Figure: participating sites S2, S3 and S4 reply to coordinating site Ci; two sites log <ready T> and send <ready T>, one site logs <no T> and sends <abort T>.]
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 1: Obtaining Decision or Voting Phase:
 If not, add a record <no, T> to the log and send abort
message to Ci.
 If the T can be committed, then:
 add the record <ready T> to the log
 force all records for T to stable storage
 Send ready T message to Ci.
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 2: Recording Decision Phase:
 Ci adds the decision record <commit T> or <abort
T>, to the log and forces record onto stable
storage.
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 2: Recording Decision Phase:
[Figure: S2, S3 and S4 have each logged and sent <ready T>; coordinating site Ci records <commit T> in its log.]
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 2: Recording Decision Phase:
Ci sends a message to each participant informing it
of the decision.
 Participants take appropriate action locally.
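The two phases can be sketched as a single-process Python simulation. This is a teaching sketch under stated assumptions: the Participant class, the can_commit flag and the use of plain lists as logs are hypothetical stand-ins for real sites, message passing and stable storage, and failures are not modelled.

```python
class Participant:
    """A participating site; self.log stands in for its log on stable storage."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.log = []

    def prepare(self, t):
        """Phase 1: log the vote first, then reply to the coordinator."""
        if self.can_commit:
            self.log.append(("ready", t))
            return "ready"
        self.log.append(("no", t))
        return "abort"

    def decide(self, t, decision):
        """Phase 2: record the coordinator's decision locally."""
        self.log.append((decision, t))


def two_phase_commit(t, participants, coordinator_log):
    # Phase 1: the coordinator logs <prepare T>, then collects every vote.
    coordinator_log.append(("prepare", t))
    votes = [p.prepare(t) for p in participants]

    # Phase 2: commit only if every participant voted ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append((decision, t))
    for p in participants:
        p.decide(t, decision)
    return decision
```

For example, calling two_phase_commit("T", sites, log) with three participants that can all commit returns "commit"; a single participant with can_commit=False turns the outcome into "abort" at every site.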
Distributed Databases - Transaction
Processing
 Two Phase Commit Protocol:
 Phase 2: Recording Decision Phase:
[Figure: coordinating site Ci sends <commit T> to S2, S3 and S4, and each site records <commit T> in its log.]
Distributed Databases - Transaction
Processing
 Failure of Site – Failure of Participating Sites:
 When a failed participating site recovers, it
examines its own log to decide the fate of each
transaction that was executing when the failure
occurred.
Distributed Databases - Transaction
Processing
 Failure of Site – Failure of Participating Sites:
 Log contains <commit T> record: site executes redo
(T)
Log contains <abort T> record: site executes undo (T)
Log contains <ready T> record: site must consult Ci to
determine the fate of T.
If T committed, redo (T)
If T aborted, undo (T)
Distributed Databases - Transaction
Processing
 Failure of Site – Failure of Participating Sites:
 The log contains no control records concerning T: this
implies that Sk failed before responding to the prepare
T message from Ci.
 Since the failure of Sk precludes the sending of such a
response, Ci must abort T.
 Sk must execute undo (T).
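The recovery cases for a participating site reduce to a short decision function. The sketch below assumes the log is a list of (record, transaction) pairs and that ask_coordinator is a hypothetical callback returning "commit" or "abort"; neither detail comes from the slides.

```python
def recover_participant(log, t, ask_coordinator):
    """Return the recovery action ("redo" or "undo") for transaction t."""
    records = {kind for kind, txn in log if txn == t}
    if "commit" in records:
        return "redo"
    if "abort" in records:
        return "undo"
    if "ready" in records:
        # The site voted ready but never learned the outcome: ask Ci.
        return "redo" if ask_coordinator(t) == "commit" else "undo"
    # No control record at all: the site failed before voting, so T is aborted.
    return "undo"
```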
Distributed Databases - Transaction
Processing
 Failure of Site – Failure of Coordinator Sites:
 If an active site contains a <commit T> record in
its log, then T must be committed.
If an active site contains an <abort T> record in its
log, then T must be aborted.
Distributed Databases - Transaction
Processing
 Failure of Site – Failure of Coordinator Sites:
 If some active participating site does not contain a <ready T>
record in its log, then the failed coordinator Ci cannot have
decided to commit T. Can therefore abort T.
 If none of the above cases holds, then all active sites must have a
<ready T> record in their logs, but no additional control records
(such as <abort T> or <commit T>). In this case active sites must
wait for Ci to recover, to find decision.
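The four coordinator-failure cases can be written as one function over the logs of the sites that are still active. This is a sketch only; representing each log as a list of (record, transaction) pairs and returning the string "wait" for the blocking case are assumptions made for the example.

```python
def decide_without_coordinator(active_logs, t):
    """Decide the fate of t from the logs of the active sites.

    Returns "commit", "abort", or "wait" (block until Ci recovers).
    """
    kinds_per_site = [
        {kind for kind, txn in log if txn == t} for log in active_logs
    ]
    if any("commit" in kinds for kinds in kinds_per_site):
        return "commit"
    if any("abort" in kinds for kinds in kinds_per_site):
        return "abort"
    if any("ready" not in kinds for kinds in kinds_per_site):
        # Some site never voted ready, so Ci cannot have decided to commit.
        return "abort"
    # Every active site is ready but none knows the decision.
    return "wait"
```

The last branch is the blocking problem of two phase commit: the active sites hold their locks until the coordinator comes back.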
Distributed Databases - Transaction
Processing
 Three Phase Commit Protocol:
 Assumptions:
 No network partitioning
 At any point at least one site must be up
 At most k sites can fail.
Distributed Databases - Transaction
Processing
 Three Phase Commit Protocol:
Distributed Databases - Transaction
Processing
 Three Phase Commit Protocol – Phase I:
 Coordinator asks all participants to prepare to
commit transaction Ti. The coordinator then makes
the decision about commit or abort based on the
response from all the participating sites.
Distributed Databases - Transaction
Processing
 Three Phase Commit Protocol – Phase II:
 Coordinator makes a decision as in 2Phase
Commit which is called the pre-commit decision
<Pre-commit, T>, and records it in multiple
participating sites.
Distributed Databases - Transaction
Processing
 Three Phase Commit Protocol – Phase III:
 Coordinator sends commit/ abort message to all
participating sites.
Distributed Databases - Transaction
Processing
 Three Phase Commit Protocol:
 If the coordinating site fails, one of the participating
sites becomes the new coordinating site and consults the
other participating sites to learn which of them hold the
pre-commit message.
 The new coordinating site then uses this pre-commit
information to decide whether to commit or abort.
Object based Database
The object based database provides a way
to model real world objects and their
behavior.
 It is an alternative to relational database
model.
Object based Database
Complex Data Types:
 Address can be viewed as a single string or separate
attributes for each part or composite attributes.
 Applications:
 Computer Aided Design
Hypertext database
Multimedia and image databases.
Object based Database
 Object Classes:
class employee {
/* Variables */
string name; string address; date start-date; int salary;
/* Messages */
int annual-salary(); string get-name(); string get-address();
int set-address(string new-address);
int employment-length();
};
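For comparison, the same class can be sketched in Python. Two details are assumptions made only for this example: salary is treated as a monthly amount, and employment_length returns whole years and takes the current date as a parameter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    # Variables
    name: str
    address: str
    start_date: date
    salary: int  # assumed here to be a monthly salary

    # Messages (methods)
    def annual_salary(self) -> int:
        return self.salary * 12

    def get_name(self) -> str:
        return self.name

    def get_address(self) -> str:
        return self.address

    def set_address(self, new_address: str) -> None:
        self.address = new_address

    def employment_length(self, today: date) -> int:
        """Whole years between start_date and today."""
        years = today.year - self.start_date.year
        if (today.month, today.day) < (self.start_date.month, self.start_date.day):
            years -= 1  # the anniversary has not been reached this year
        return years
```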
Object based Database
 Inheritance:
 An object-oriented database schema typically requires a
large number of classes.
 For example, bank employees are similar to customers.
 Need to place classes in a specialization hierarchy
Object based Database
 Inheritance:
Object based Database
 Inheritance – Pseudo Code:
Object based Database
 Inheritance:
 The keyword isa is used to indicate that a class is a
specialization of another class.
The specializations of a class are called subclasses.
 E.g., employee is a subclass of person; teller is a subclass
of employee. Conversely, employee is a superclass of teller.
Object based Database
 Inheritance:
 Code Reusability
 Substitutability: Any method of a class, A, can
equally well be invoked with an object belonging to any
subclass B of A.
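Substitutability is easy to see in a small Python sketch of the person, employee, teller hierarchy. The attributes salary and window and the function greeting are hypothetical names chosen for the example.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name


class Employee(Person):  # employee isa person
    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary


class Teller(Employee):  # teller isa employee
    def __init__(self, name, salary, window):
        super().__init__(name, salary)
        self.window = window


def greeting(person):
    """Written for Person, but works for any subclass object."""
    return "Hello, " + person.get_name()


message = greeting(Teller("Asha", 1000, 3))
```

greeting never mentions Teller, yet it accepts one, because a Teller inherits get_name from Person (code reusability) and can stand in wherever a Person is expected (substitutability).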
Object based Database
 Multiple Inheritance:
 In most cases, tree-structured organization of classes is
adequate to describe applications.
 Multiple inheritance: the ability of a class to inherit variables
and methods from multiple superclasses.
 The class/subclass relationship is represented by a rooted
directed acyclic graph (DAG) in which a class may have
more than one superclass.
Object based Database
 Multiple Inheritance:
Object based Database
 Multiple Inheritance:
 Handling name conflicts: When multiple inheritance is
used, there is potential ambiguity if the same variable or
method can be inherited from more than one superclass.
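The name conflict can be reproduced in Python, which is used here only as an illustration; the class names and the dept method are hypothetical. Python resolves the ambiguity by the order in which the superclasses are listed, which is one possible policy among several and not a rule stated in the slides.

```python
class Student:
    def dept(self):
        return "department where enrolled"


class Teacher:
    def dept(self):
        return "department where teaching"


class TeachingAssistant(Student, Teacher):
    """Inherits dept() from both superclasses."""

    def teacher_dept(self):
        # Name the superclass explicitly to reach the other version.
        return Teacher.dept(self)


ta = TeachingAssistant()
default_choice = ta.dept()          # Student is listed first, so its dept() wins
explicit_choice = ta.teacher_dept()
```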
Object based Database
 ODMG Object Model:
 ODMG – Object Database Management Group
 It produced the specification for object oriented
databases.
 ODL – Object Definition Language
 OQL – Object Query Language
 OML – Object Manipulation Language
Object based Database
 ODL:
 Declaring Classes:
 keyword interface
 The name of the class
 The list of attributes of the class declared using keyword
attribute.
Object based Database
 ODL:
 Declaring Classes – Example:
 interface Student{
attribute integer RollNo;
attribute string Name;
attribute string address;
attribute string course_id;
};
Object based Database
 ODL:
 Declaring Relationships:
 SQL uses the foreign key concept to establish a
relationship between two tables.
 ODL uses the keyword relationship to declare a
relationship between two classes.
Object based Database
 ODL:
 Declaring Relationships:
 interface Student{
attribute integer RollNo;
attribute string Name;
attribute string address;
attribute string course_id;
relationship Course Stud_Course_real;
};
Object based Database
 ODL:
 Declaring Key:
 To identify the tuple in the relationship.
 Use keyword key to make particular attribute a key.
Object based Database
 ODL:
 Declaring Keys:
 interface Student(key RollNo){
attribute integer RollNo;
attribute string Name;
attribute string address;
attribute string course_id;
};
Object based Database
 OQL:
 A query language standard for object oriented databases modeled
after SQL.
 Rules:
 All complete statements must be terminated by a semi-colon
 A list of entries in OQL is usually separated by commas but not
terminated by a comma(,).
 Strings of text are enclosed by matching quotation marks.
Object based Database
 OQL:
 Basic form of OQL: Select, From and Where
 Syntax: SELECT <list of values>
FROM <list of collections and variable assignments>
WHERE < condition>
SELECT Sname:p.name FROM p in People WHERE
p.age>30
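To show what the example query computes, here is a rough Python equivalent over an in-memory collection. The Person class and the sample people are assumptions for illustration; OQL itself would run against an object database.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# Hypothetical contents of the People collection.
people = [Person("Anu", 34), Person("Bala", 25), Person("Chitra", 41)]

# SELECT Sname: p.name FROM p in People WHERE p.age > 30
result = [{"Sname": p.name} for p in people if p.age > 30]
```

The range variable p plays the same role in both forms: it is bound to each member of the collection in turn, and the dot notation p.name and p.age follows attributes of the object.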
Object based Database
 OQL:
Dot notations and Path expressions:
 ta.salary -> real
 t.students -> set of tuples of type tuple(name, fee:real)
representing students
 t.salary -> real
XML Databases
 XML - Extensible Markup Language
 XML tags identify the data and are used to store and
organize the data.
 Characteristics:
 XML is extensible
 XML carries the data, does not present it
 XML is a public standard
XML Databases
 Syntax Rules for XML Declaration
 The XML declaration is case sensitive and must begin with
"<?xml", where "xml" is written in lower-case.
 If document contains XML declaration, then it strictly
needs to be the first statement of the XML document.
XML Databases
 Element:
 XML elements can be defined as building blocks of an
XML.
 Elements can behave as containers to hold text, elements,
attributes, media objects or all of these.
XML Databases
 Element:
<element-name attribute1 attribute2>
...content
</element-name>
XML Databases
 Empty Element:
 An empty element (element with no content) has following
syntax: <name attribute1 attribute2.../>
XML Databases
Element – Example:
<?xml version = "1.0"?>
<contact-info>
<address category = "residence">
<name>XYZ</name>
<company>ABC Company</company>
<phone>1234567890</phone>
</address>
</contact-info>
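A short Python sketch, using the standard library module xml.etree.ElementTree, shows how a parser sees the elements and the attribute in this document. The variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<contact-info>
  <address category="residence">
    <name>XYZ</name>
    <company>ABC Company</company>
    <phone>1234567890</phone>
  </address>
</contact-info>"""

root = ET.fromstring(document)        # the root element, contact-info
address = root.find("address")        # first child element named address

category = address.get("category")    # attribute value
phone = address.findtext("phone")     # text content of the phone element
child_tags = [child.tag for child in address]
```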
XML Databases
 Attributes:
 Attribute gives more information about XML elements.
 Attributes define properties of elements.
An XML attribute is always a name-value pair.
<element-name attribute1 attribute2 >
....content..
</element-name>
XML Databases
 Attributes – Example:
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
XML Databases
 Types of XML Documents:
 Data Centric XML documents: Many small data items that
follow specific structure. These documents follow predefined
schema that defines tag names.
 Document Centric XML documents: Large amounts of text, such
as articles or books. There are very few or no structured data
elements in these documents.
 Hybrid Documents: Some parts contain structured data and other
parts are unstructured; they may or may not have a predefined schema.
XML Databases
 DTD
 DTD – Document Type Definition
 To define the basic building block of any xml document
 Using DTD, specify the various element types, attributes and
their relationships with one another.
 To specify the set of rules for structuring data in any XML
file
XML Databases
 DTD – Elements:
 The basic entity
 The elements are used for defining the tags.
 The elements typically consist of opening and closing tag.
 Ex: <body>some text</body>
XML Databases
 DTD – Attributes:
 Attributes always come in name/value pairs.
 To specify the values of the element.
 These are specified within the double quotes.
 Ex: <img src="computer.gif" />
XML Databases
 DTD – Entities:
 Entities are expanded when a document is parsed by an
XML parser.
Entity References Character
&lt; <
&gt; >
&amp; &
&quot; "
XML Databases
 DTD – PCDATA:
 Parsed Character Data.
 PCDATA is text that WILL be parsed by a parser. The text
will be examined by the parser for entities and markup.
 Tags inside the text will be treated as markup and entities
will be expanded.
 &, <, or > - &amp; &lt; and &gt;
XML Databases
 DTD – CDATA:
 Character Data.
 CDATA is text that will NOT be parsed by a parser. Tags
inside the text will NOT be treated as markup and entities
will not be expanded.
XML Databases
 DTD – Example:
<?xml version="1.0"?>
<page>
<title>Hello friend</title>
<content>Here is some content :)</content>
<comment>samples</comment>
</page>
XML Databases
 DTD – Example:
XML Databases
 DTD – Merits:
 To define the structural components of XML document
 Simple and Compact
XML Databases
 DTD – Demerits:
 It cannot be much specific for complex documents
 The language that DTD uses is not an XML document.
 The DTD cannot define the type of data contained within
the XML document.
XML Databases
 XML Schema:
 Structure of an XML document.
 The elements and attributes that can appear in a document
 The number of (and order of) child elements
 Data types for elements and attributes
 Default and fixed values for elements and attributes
 XML Schema is an XML-based (and more powerful) alternative
to DTD
XML Databases
 XML Schema:
 Example:
 StudentSchema.xsd
 MySchema.xml
XML Databases
 XML Schema – Advantages:
 The schema provide the support for data types
 The XML schema is written in XML itself and has large number
of built in and derived types.
 Disadvantages:
 Complex to design and hard to learn
 Maintaining the schema for large and complex operations
sometimes slows down the processing of XML documents.
XML Databases
 XQuery:
 To query the XML database, to get information out of XML
databases.
 XQuery FLWOR Expressions
 For - selects a sequence of nodes
 Let - binds a sequence to a variable
 Where - filters the nodes
 Order by - sorts the nodes
 Return - what to return (gets evaluated once for every node)
XML Databases
 XQuery – Example:
 courses.xml
 display the title elements of the courses whose fees are
greater than 5000
for $x in doc("courses.xml")/courses/course
where $x/fees>5000
return $x/title
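The same for, where, return logic can be mimicked in Python with xml.etree.ElementTree. The contents of courses.xml are not shown in the slides, so the sample data below is an assumption, and the sketch returns the title text rather than the title elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of courses.xml.
courses_xml = """<courses>
  <course><title>DBMS</title><fees>6000</fees></course>
  <course><title>Networks</title><fees>4500</fees></course>
  <course><title>Compilers</title><fees>7500</fees></course>
</courses>"""

root = ET.fromstring(courses_xml)

titles = [
    course.findtext("title")                  # return $x/title
    for course in root.findall("course")      # for $x in .../courses/course
    if float(course.findtext("fees")) > 5000  # where $x/fees > 5000
]
```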
XML Databases
 XQuery – Advantages:
 Both hierarchical and tabular data can be retrieved.
 To query tree and graph structures.
 Used to build web pages.
 Used to transform XML documents.
Information Retrieval
 Information Retrieval:
“The process of retrieving documents from a
collection in response to a query submitted by a user”
Information Retrieval
 Information Retrieval:
 Structured Data:
 A form of data in which the information is in most
organized form.
 Ex: Student table
Information Retrieval
 Information Retrieval:
 Unstructured Data:
 Like human language.
 It does not fit nicely into relational databases.
 Ex: Emails, Text Documents, Social media, Videos and
Images.
Information Retrieval
 Information Retrieval – Concept of Query
 User can make use of free form of search request – Query
 It is also called as keyword search.
Information Retrieval
 Characteristics of IR Systems:
 Types of Users:
 Expert User: User who is searching for specific
information that is clear in mind.
 Ex: User who wants to get the information about particular
book.
 Layperson: A user with generic information need.
Information Retrieval
 Characteristics of IR Systems:
 Types of Data:
 Search systems can be modified to specific types of data.
Information Retrieval
 Characteristics of IR Systems:
 Types of Information Need:
 Navigational Search: To find a particular piece of
information that user needs quickly.
Ex: Finding site of “Anna University”
Information Retrieval
 Characteristics of IR Systems:
 Types of Information Need:
 Informational Search: To find current information about
some topic.
 Example: Information about current News.
Information Retrieval
 Characteristics of IR Systems:
 Types of Information Need:
 Transactional Search: To reach a site in which further
interaction happens.
 Ex: Online Reservation.
Information Retrieval
Database System                      | IR System
Use of structured data               | Use of unstructured data
Relational data model is used        | Free-form query model is used
Query returns data                   | Search request returns a list of, or pointers to, documents that may contain the desired information
Results are based on exact matching  | Results are based on approximate matching
Information Retrieval
 Modes of Interactions:
 Retrieval: Extraction of relevant information from a
repository of documents through an IR query.
 Browsing: The activity of a user visiting or
navigating through similar or related documents based
on the user’s assessment of relevance.
Information Retrieval
 Modes of Interactions:
 Hyperlinks: To interconnect web pages and are mainly
used for browsing.
 Anchor texts: Text phrases within documents used to
label hyperlinks and are very relevant to browsing.
 Web Search: combines both activities(retrieval and
browsing)
Information Retrieval
 Modes of Interactions:
Web Search Engine: Maintains an indexed repository
of web pages. The most relevant web pages are
returned to the user if possible in descending order of
their relevance.
Information Retrieval
 IR Processing:
 Statistical Approach:
 The documents are first analyzed and broken down into chunks
of text.
 Each word is counted for its relevance.
 These words are then compared against the query to test the
significant degree of match.
 Based on this matching, the ranked list of documents containing
these words is presented to the user.
Information Retrieval
 IR Processing:
 Semantic Approach:
 Knowledge-based techniques of information retrieval are used.
 The syntactic, lexical, sentential, discourse-based and
pragmatic levels of analysis are used to prepare a knowledge
base for understanding.
Information Retrieval
 Generic IR Framework:
Information Retrieval
 Retrieval Models:
 Boolean Model:
 Documents represented as a set of terms
 Form queries using standard Boolean logic set-theoretic operators -
AND, OR and NOT.
 Based on “Exact match” with query.
 Lacks sophisticated ranking algorithms.
 Make it easy to associate meta data information and write queries that
match the contents of the documents
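A minimal Python sketch of the Boolean model: each term maps to the set of documents that contain it, and AND, OR and NOT become set intersection, union and difference. The three sample documents are assumptions.

```python
# Hypothetical document collection: id -> text.
docs = {
    1: "distributed database transaction",
    2: "xml database schema",
    3: "information retrieval query",
}

# Inverted index: term -> set of ids of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)


def term(t):
    """Documents that contain term t (empty set if t is unknown)."""
    return index.get(t, set())


database_and_xml = term("database") & term("xml")     # AND
database_or_query = term("database") | term("query")  # OR
database_not_xml = term("database") - term("xml")     # AND NOT
```

Each result is a plain set with no order, which illustrates why the model is an exact match and offers no ranking of the retrieved documents.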
Information Retrieval
 Retrieval Models:
 Vector Space Model:
 An algebraic model for representing text documents.
 It provides a framework in which weighting, ranking of
retrieved documents and relevance feedback are possible.
 Similarity functions can be used to rank documents; the cosine of the
angle between the query and document vectors is commonly used.
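The cosine measure can be sketched in a few lines of Python. Raw term counts are used as weights, which is a simplifying assumption (practical systems usually use TF-IDF weights), and the sample documents are hypothetical.

```python
import math
from collections import Counter


def vector(text):
    """Term-count vector of a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # a Counter gives 0 for a missing term
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "distributed database systems",
    "xml database schema",
    "information retrieval models",
]
query = vector("database schema")

# Rank the documents by decreasing similarity to the query.
ranked = sorted(docs, key=lambda d: cosine(query, vector(d)), reverse=True)
```

Unlike the Boolean model, every document gets a score between 0 and 1, so partial matches are retrieved and ordered.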
Information Retrieval
 Retrieval Models:
 Probabilistic Model:
 A more concrete and definitive approach is taken.
 The IR system has to decide whether the documents belong to
the relevant set or non-relevant set for a query.
 The system calculates the probability that the document belongs
to the relevant set and compares that with the probability that the
document belongs to the non-relevant set.
Information Retrieval
 Retrieval Models:
 Semantic Model:
 The process of matching documents to a given query is based on
concept level and semantic matching instead of index term
matching.
 This allows retrieval of relevant documents that share
meaningful associations with other documents in the query result.
Information Retrieval
 Retrieval Models:
 Semantic Model – Level of Analysis:
 Morphological Analysis: Analyzes nouns, verbs and adjectives.
 Syntactical Analysis: Complete phrases in the document
are parsed and then analyzed.
 Semantic Analysis: To resolve the ambiguities in the
words the synonyms are used
Information Retrieval
 Types of Queries in IR Systems:
 Keywords:
 Consist of words, phrases, and other characterizations of
documents
 Queries compared to set of index keywords
 Allow use of Boolean and other operators to build a
complex query
Information Retrieval
 Types of Queries in IR Systems:
 Keywords:
 Keywords implicitly connected by a logical AND operator
 Remove stopwords - Most commonly occurring words: a,
the, of
 IR systems do not pay attention to the ordering of these
words in the query
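Keyword matching with stopword removal and an implicit AND can be sketched as follows. The stopword list is limited to the three words named above, and the function names are hypothetical.

```python
STOPWORDS = {"a", "the", "of"}


def keywords(text):
    """Set of words in the text with stopwords removed; order is discarded."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def matches(query, document):
    """Implicit AND: every query keyword must occur in the document."""
    return keywords(query) <= keywords(document)


doc = "transactions in a distributed database"
hit = matches("the database of transactions", doc)
miss = matches("database retrieval", doc)
```

Because the keywords are held in a set, "database transactions" and "transactions database" behave as the same query.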
Information Retrieval
 Types of Queries in IR Systems:
 Boolean Queries:
 AND: both terms must be found
 OR: either term may be found
 NOT: records containing the keyword are omitted
 ( ): used for nesting
 +: equivalent to AND
 –: equivalent to AND NOT
 Document retrieved if the query is logically true as an exact match in the document
Information Retrieval
 Types of Queries in IR Systems:
 Phrase queries:
 Phrase generally enclosed within double quotes
 More restricted and specific version of proximity searching
Information Retrieval
 Types of Queries in IR Systems:
 Proximity queries:
 Accounts for how close within a record multiple terms
should be to each other
 Common option requires terms to be in the exact order
 Various operator names: NEAR, ADJ(adjacent), or AFTER
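A proximity test can be sketched by comparing word positions. The function name near and its parameters max_gap and ordered are assumptions for this example; real systems expose the idea through operators such as NEAR or ADJ.

```python
def near(text, first, second, max_gap, ordered=False):
    """True if `first` and `second` occur within max_gap words of each other.

    With ordered=True, `first` must appear before `second`.
    """
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    for i in pos_first:
        for j in pos_second:
            distance = j - i
            if ordered and distance <= 0:
                continue
            if 0 < abs(distance) <= max_gap:
                return True
    return False


text = "information retrieval from a distributed database"
adjacent = near(text, "distributed", "database", 1)
wrong_order = near(text, "database", "distributed", 1, ordered=True)
too_far = near(text, "information", "database", 2)
```

With max_gap=1 and ordered=True the test behaves like a two-word phrase query, which is why phrase search can be seen as a restricted form of proximity search.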
Information Retrieval
 Types of Queries in IR Systems:
 Wildcard queries:
 Support regular expressions and pattern matching-based
searching – ‘Data*’ would retrieve data, database, datapoint,
dataset
 Involves preprocessing overhead
 Retrieval models do not directly provide support for this query
type
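One way to handle a trailing wildcard is to expand the pattern against the index vocabulary before running an ordinary keyword query. The sketch below uses fnmatchcase from Python's standard library; the vocabulary is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary of index terms (all lower case).
vocabulary = ["data", "database", "datapoint", "dataset", "date", "metadata"]


def expand(pattern, terms):
    """Index terms matching a shell-style wildcard pattern, ignoring case."""
    return [t for t in terms if fnmatchcase(t, pattern.lower())]


expanded = expand("Data*", vocabulary)
```

Scanning the whole vocabulary for every pattern is the preprocessing overhead mentioned above; the expanded terms can then be combined with OR in a normal query.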
Information Retrieval
 Types of Queries in IR Systems:
 Natural Language queries:
 Few natural language search engines
Active area of research
Easier to answer questions
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 

Recently uploaded (20)

Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 

DBMS UNIT V.pptx

  • 2. UNIT V ADVANCED TOPICS Distributed Databases: Architecture, Data Storage, Transaction Processing – Object-based Databases: Object Database Concepts, Object-Relational features, ODMG Object Model, ODL, OQL - XML Databases: XML Hierarchical Model, DTD, XML Schema, XQuery – Information Retrieval: IR Concepts, Retrieval Models, Queries in IR systems.
  • 3. Distributed Databases  A distributed database is a set of interconnected databases distributed over a computer network or the internet.  A distributed database management system (DDBMS) manages the distributed database and provides mechanisms that make the distribution transparent to users.
  • 4. Distributed Databases  Features  Databases in the collection are logically interrelated with each other. Often they represent a single logical database.  Data is physically stored across multiple sites.  The processors in the sites are connected via a network.  A distributed database is not a loosely connected file system.
  • 5. Distributed Databases  Advantages:  Fast data processing  Reliability and availability  Reduced operating cost  Easier to expand  Improved sharing ability and local autonomy.
  • 6. Distributed Databases  Disadvantages:  Complex to manage and control.  Security issues must be carefully managed.  The system requires deadlock handling during transaction processing.  Need for standardization.
  • 7. Distributed Databases  Homogeneous Distributed Database:  In this, all sites have identical database management system software. In such a system, local sites surrender a portion of their autonomy in terms of their right to change schemas or database management system software.
  • 8. Distributed Databases  Homogeneous Distributed Database:  This software must also cooperate with other sites in exchanging information about transactions, to make transaction processing possible across multiple sites. It appears to user as a single system.
  • 9. Distributed Databases  Heterogeneous Distributed Database:  In this, different sites may use different schemas, and different database management system software.  The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.
  • 10. Distributed Databases  Data Storage:  Replication: System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance  Fragmentation: Relation is partitioned into several fragments stored in distinct sites
  • 11. Distributed Databases  Data Replication:  The process of storing separate copies of the database at two or more sites.  Full Replication: Entire relation is stored at all the sites.  Partial Replication: Only some fragments of relation are replicated on the sites.
  • 12. Distributed Databases  Data Replication – Advantages:  Availability  Parallelism  Faster Accessing  Fault Tolerance  Reduction in Network Load
  • 13. Distributed Databases  Data Replication – Disadvantages:  Increased Storage Requirements  Increased Cost and Complexity of Data Updating
  • 14. Distributed Databases  Data Fragmentation:  A division of relation r into fragments r1, r2, r3…rn which contain sufficient information to reconstruct relation r.
  • 15. Distributed Databases  Data Fragmentation – Vertical Fragmentation:  The fields or columns of a table are grouped into fragments.  In order to maintain reconstructiveness, each fragment should contain the primary key field(s) of the table.
  • 16. Distributed Databases  Data Fragmentation – Vertical Fragmentation:  Example: Student(RollNo, Marks, City), assuming RollNo is the primary key  select RollNo, Marks from Student  select RollNo, City from Student
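The reconstruction rule for vertical fragments can be sketched in Python. This is a minimal illustration, not a DBMS feature: the sample rows, the helper names `project` and `join_on_key`, and the choice of RollNo as primary key are all assumptions made for the example. Each fragment keeps the key so that a join on RollNo rebuilds the original relation.

```python
# Hypothetical sample rows for Student(RollNo, Marks, City).
# RollNo is assumed to be the primary key.
student = [
    {"RollNo": 1, "Marks": 72, "City": "Chennai"},
    {"RollNo": 2, "Marks": 45, "City": "Madurai"},
    {"RollNo": 3, "Marks": 88, "City": "Chennai"},
]


def project(rows, columns):
    """Keep only the named columns of each row (relational projection)."""
    return [{c: row[c] for c in columns} for row in rows]


def join_on_key(left, right, key):
    """Join two fragments on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Each vertical fragment carries the primary key RollNo.
frag_marks = project(student, ["RollNo", "Marks"])
frag_city = project(student, ["RollNo", "City"])

# Joining the fragments on the key gives back the original relation.
reconstructed = join_on_key(frag_marks, frag_city, "RollNo")
print(reconstructed == student)
```

Without RollNo in a fragment, there would be no column on which to match a City value to its Marks value, which is why the key must be repeated.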
  • 17. Distributed Databases  Data Fragmentation – Horizontal Fragmentation:  In this approach, each tuple of r is assigned to one or more fragments.  If relation R is fragmented into r1 and r2, then R is reconstructed by taking the union of r1 and r2.
  • 18. Distributed Databases  Data Fragmentation – Horizontal Fragmentation:  Example: Select * from student where marks>50 and city=‘chennai’
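Horizontal fragmentation and its reconstruction by union can be sketched the same way. The sample rows and the helper names `select` and `union` are hypothetical; the first fragment uses the predicate from the example query, and the second fragment is assumed to hold every remaining tuple.

```python
# Hypothetical sample rows for Student(RollNo, Marks, City).
student = [
    {"RollNo": 1, "Marks": 72, "City": "Chennai"},
    {"RollNo": 2, "Marks": 45, "City": "Madurai"},
    {"RollNo": 3, "Marks": 88, "City": "Chennai"},
]


def select(rows, predicate):
    """Keep only the rows that satisfy the predicate (relational selection)."""
    return [row for row in rows if predicate(row)]


def in_fragment_1(row):
    """Predicate from the example: marks > 50 and city = 'chennai'."""
    return row["Marks"] > 50 and row["City"] == "Chennai"


frag_1 = select(student, in_fragment_1)
frag_2 = select(student, lambda row: not in_fragment_1(row))


def union(*fragments):
    """Set union of fragments, removing duplicates by primary key."""
    by_key = {}
    for fragment in fragments:
        for row in fragment:
            by_key[row["RollNo"]] = row
    return sorted(by_key.values(), key=lambda row: row["RollNo"])


reconstructed = union(frag_1, frag_2)
print(reconstructed == student)
```

The union removes duplicates, so the same sketch still works if a tuple is assigned to more than one fragment.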
  • 19. Distributed Databases  Transaction Processing:  Transaction may access data at several sites  Local and Global Transaction
  • 20. Distributed Databases  Transaction Processing – Transaction Manager: Maintaining a log for recovery purposes Participating in coordinating the concurrent execution of the transactions executing at that site
  • 21. Distributed Databases  Transaction Processing – Transaction Coordinator:  Starting the execution of transactions that originate at the site.  Distributing subtransactions to appropriate sites for execution.
  • 22. Distributed Databases  Transaction Processing – Architecture:
  • 23. Distributed Databases - Transaction Processing  Two Phase Commit Protocol :  The atomicity is an important property of any transaction processing.  Either the transaction will execute completely or it won’t execute at all.
  • 24. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  A transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.  Not acceptable to have a transaction committed at one site and aborted at another.
  • 25. Distributed Databases - Transaction Processing  Two Phase Commit Protocol – Voting Phase:
  • 26. Distributed Databases - Transaction Processing  Two Phase Commit Protocol – Decision Phase:
  • 27. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 1: Obtaining Decision or Voting Phase:  Step 1: Coordinator site Ci asks all participants to prepare to commit T.  Ci adds the record <prepare T> to the log and writes the log to stable storage.  It then sends a prepare T message to all participating sites.
  • 28. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 1: Obtaining Decision or Voting Phase: [Figure: coordinating site Ci writes <Prepare, T> to its log and sends <Prepare, T> to sites S2, S3, and S4.]
  • 29. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 1: Obtaining Decision or Voting Phase:  Step 2: Upon receiving message, transaction manager at participating site determines if it can commit the transaction.
  • 30. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 1: Obtaining Decision or Voting Phase: [Figure: of the participating sites S2, S3, and S4, two log <Ready, T> and reply <Ready, T> to coordinating site Ci; one logs <No, T> and replies <abort, T>.]
  • 31. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 1: Obtaining Decision or Voting Phase:  If the transaction cannot be committed, add a record <no T> to the log and send an abort T message to Ci.  If T can be committed, then:  add the record <ready T> to the log  force all records for T to stable storage  send a ready T message to Ci.
  • 32. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 2: Recording Decision Phase:  Ci adds the decision record <commit T> or <abort T>, to the log and forces record onto stable storage.
  • 33. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 2: Recording Decision Phase: [Figure: S2, S3, and S4 have each logged <Ready, T> and replied <Ready, T>; coordinating site Ci records <Commit, T>.]
  • 34. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 2: Recording Decision Phase: Ci sends a message to each participant informing it of the decision.  Participants take appropriate action locally.
  • 35. Distributed Databases - Transaction Processing  Two Phase Commit Protocol:  Phase 2: Recording Decision Phase: [Figure: coordinating site Ci, with <Commit, T> in its log, sends <Commit, T> to S2, S3, and S4, and each site records <Commit, T>.]
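The two phases described above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: there is no network, no failure, and no real stable storage, and the class names `Participant` and `Coordinator`, the `can_commit` flag, and the list-based logs are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_commit: bool            # hypothetical flag: can this site commit T?
    log: list = field(default_factory=list)

    def on_prepare(self, txn):
        """Phase 1, step 2: log the vote, then send it to the coordinator."""
        if self.can_commit:
            self.log.append(f"<ready {txn}>")
            return "ready"
        self.log.append(f"<no {txn}>")
        return "abort"

    def on_decision(self, txn, decision):
        """Phase 2: record the coordinator's decision locally."""
        self.log.append(f"<{decision} {txn}>")


@dataclass
class Coordinator:
    participants: list
    log: list = field(default_factory=list)

    def run(self, txn):
        # Phase 1: log <prepare T>, then collect one vote per participant.
        self.log.append(f"<prepare {txn}>")
        votes = [p.on_prepare(txn) for p in self.participants]

        # Phase 2: commit only if every participant voted ready.
        decision = "commit" if all(v == "ready" for v in votes) else "abort"
        self.log.append(f"<{decision} {txn}>")
        for p in self.participants:
            p.on_decision(txn, decision)
        return decision


all_ready = Coordinator(
    [Participant("S2", True), Participant("S3", True), Participant("S4", True)]
)
result_ok = all_ready.run("T")

one_no = Coordinator(
    [Participant("S2", True), Participant("S3", False), Participant("S4", True)]
)
result_fail = one_no.run("T")

print(result_ok, result_fail)
```

The sketch appends the decision to the coordinator's log before notifying any participant, mirroring the rule that the decision record is forced to stable storage first.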
  • 36. Distributed Databases - Transaction Processing  Failure of Site – Failure of Participating Sites:  If a participating site Si fails, then when it recovers it examines its own log entries to decide how to handle the transaction.
  • 37. Distributed Databases - Transaction Processing  Failure of Site – Failure of Participating Sites:  Log contains <commit T> record: site executes redo (T).  Log contains <abort T> record: site executes undo (T).  Log contains <ready T> record: site must consult Ci to determine the fate of T. If T committed, redo (T); if T aborted, undo (T).
  • 38. Distributed Databases - Transaction Processing  Failure of Site – Failure of Participating Sites:  The log contains no control records concerning T: this implies that Sk failed before responding to the prepare T message from Ci.  Since the failure of Sk precludes the sending of such a response, Ci must abort T.  Sk must execute undo (T).
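The recovery cases for a participating site reduce to a lookup on its own log. The following sketch assumes log records are plain strings and that the coordinator can be reached through a hypothetical callback `ask_coordinator`; a real site would have to block or retry if the coordinator were unavailable.

```python
def recovery_action(log, txn, ask_coordinator):
    """Return 'redo' or 'undo' for txn after a participant restarts.

    log             -- list of control records written by this site
    ask_coordinator -- hypothetical callback returning 'commit' or 'abort'
    """
    if f"<commit {txn}>" in log:
        return "redo"
    if f"<abort {txn}>" in log:
        return "undo"
    if f"<ready {txn}>" in log:
        # The site voted ready but never learned the outcome.
        return "redo" if ask_coordinator(txn) == "commit" else "undo"
    # No control records: the site failed before voting, so T was aborted.
    return "undo"


print(recovery_action(["<ready T>"], "T", lambda txn: "commit"))
print(recovery_action([], "T", lambda txn: "commit"))
```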
  • 39. Distributed Databases - Transaction Processing  Failure of Site – Failure of Coordinator Sites:  If an active site contains a <commit T> record in its log, then T must be committed. If an active site contains an <abort T> record in its log, then T must be aborted.
  • 40. Distributed Databases - Transaction Processing  Failure of Site – Failure of Coordinator Sites:  If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T. T can therefore be aborted.  If none of the above cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case active sites must wait for Ci to recover to learn the decision.
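The four coordinator-failure cases can be checked in order against the logs of the sites that are still active. The function name and the string-based log records below are assumptions for illustration; the final case returns "wait" to show the blocking behaviour of two-phase commit.

```python
def decide_without_coordinator(active_logs, txn):
    """Decide the fate of txn from the logs of the active participating sites.

    active_logs -- one list of control records per active site
    Returns 'commit', 'abort', or 'wait' (blocked until Ci recovers).
    """
    logs = list(active_logs)
    if any(f"<commit {txn}>" in log for log in logs):
        return "commit"
    if any(f"<abort {txn}>" in log for log in logs):
        return "abort"
    if any(f"<ready {txn}>" not in log for log in logs):
        # Some site never voted ready, so Ci cannot have decided to commit.
        return "abort"
    # Every active site is ready but none knows the decision.
    return "wait"


print(decide_without_coordinator([["<ready T>"], ["<ready T>"]], "T"))
```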
  • 41. Distributed Databases - Transaction Processing  Three Phase Commit Protocol:  Assumptions:  No network partitioning.  At any point, at least one site must be up.  At most k sites can fail.
  • 42. Distributed Databases - Transaction Processing  Three Phase Commit Protocol:
  • 43. Distributed Databases - Transaction Processing  Three Phase Commit Protocol – Phase I:  Coordinator asks all participants to prepare to commit transaction Ti. The coordinator then makes the decision about commit or abort based on the response from all the participating sites.
  • 44. Distributed Databases - Transaction Processing  Three Phase Commit Protocol – Phase II:  Coordinator makes a decision as in two-phase commit, called the pre-commit decision <Pre-commit, T>, and records it at multiple participating sites.
  • 45. Distributed Databases - Transaction Processing  Three Phase Commit Protocol – Phase III:  Coordinator sends commit/ abort message to all participating sites.
  • 46. Distributed Databases - Transaction Processing  Three Phase Commit Protocol:  If the coordinating site fails, one of the participating sites becomes the new coordinating site and consults the other participating sites to learn which pre-commit messages they possess.  Using this pre-commit information, the new coordinating site decides whether to commit or abort.
  • 47. Object based Database  The object based database provides a way to model real-world objects and their behavior.  It is an alternative to the relational database model.
  • 48. Object based Database  Complex Data Types:  An address can be viewed as a single string, as separate attributes for each part, or as a composite attribute.  Applications:  Computer Aided Design  Hypertext databases  Multimedia and image databases.
  • 49. Object based Database  Object Classes: class employee { /* Variables */ string name; string address; date start-date; int salary; /* Messages */ int annual-salary(); string get-name(); string get-address(); int set-address(string new-address); int employment-length(); };
  • 50. Object based Database  Inheritance:  An object-oriented database schema typically requires a large number of classes.  For example, bank employees are similar to customers.  Need to place classes in a specialization hierarchy
  • 52. Object based Database  Inheritance – Pseudo Code:
  • 53. Object based Database  Inheritance:  The keyword isa is used to indicate that a class is a specialization of another class. The specializations of a class are called subclasses.  E.g., employee is a subclass of person; teller is a subclass of employee. Conversely, employee is a superclass of teller.
  • 54. Object based Database  Inheritance:  Code Reusability  Substitutability: Any method of a class A can equally well be invoked with an object belonging to any subclass B of A.
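The person, employee, teller hierarchy and the substitutability rule can be illustrated in Python, whose subclass syntax plays the role of the isa keyword. The attribute names, the `window_number` field, and the assumption that salary is a monthly figure are all hypothetical choices made for this sketch.

```python
class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def get_name(self):
        return self.name


class Employee(Person):          # employee isa person
    def __init__(self, name, address, salary):
        super().__init__(name, address)
        self.salary = salary     # assumed to be a monthly amount

    def annual_salary(self):
        return self.salary * 12


class Teller(Employee):          # teller isa employee
    def __init__(self, name, address, salary, window_number):
        super().__init__(name, address, salary)
        self.window_number = window_number


def greeting(person):
    """Written for Person, but works for any subclass (substitutability)."""
    return f"Hello, {person.get_name()}"


teller = Teller("Asha", "Chennai", 30000, 4)
print(greeting(teller))          # a Teller is accepted where a Person is expected
print(teller.annual_salary())    # method reused from Employee
```

`Teller` defines neither `get_name` nor `annual_salary`; both are inherited, which is the code reusability the slide refers to.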
  • 55. Object based Database  Multiple Inheritance:  In most cases, tree-structured organization of classes is adequate to describe applications.  Multiple inheritance: the ability of class to inherit variables and methods from multiple superclasses.  The class/subclass relationship is represented by a rooted directed acyclic graph (DAG) in which a class may have more than one superclass.
  • 56. Object based Database  Multiple Inheritance:
  • 57. Object based Database  Multiple Inheritance:  Handling name conflicts: When multiple inheritance is used, there is potential ambiguity if the same variable or method can be inherited from more than one superclass.
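A name conflict under multiple inheritance can be reproduced in a few lines. The classes below are hypothetical. Note that the resolution shown is Python's own rule (the first listed superclass wins through the method resolution order); other systems may instead flag an error or require the conflicting names to be renamed.

```python
class Employee:
    def pay(self):
        return "salary"


class Student:
    def pay(self):
        return "stipend"


class TeachingAssistant(Employee, Student):
    """Inherits pay() from both superclasses: a name conflict."""


class ExplicitTeachingAssistant(Employee, Student):
    def pay(self):
        # Resolve the ambiguity explicitly by naming the superclass.
        return Student.pay(self)


# Python picks Employee.pay because Employee is listed first.
print(TeachingAssistant().pay())
print(ExplicitTeachingAssistant().pay())
print([cls.__name__ for cls in TeachingAssistant.__mro__])
```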
  • 58. Object based Database  ODMG Object Model:  ODMG – Object Database Management Group  It produced the specification for object-oriented databases, including:  ODL – Object Definition Language  OQL – Object Query Language  OML – Object Manipulation Language
  • 59. Object based Database  ODL:  Declaring Classes:  keyword interface  The name of the class  The list of attributes of the class declared using keyword attribute.
  • 60. Object based Database  ODL:  Declaring Classes – Example:  interface Student{ attribute integer RollNo; attribute string Name; attribute string address; attribute string course_id; };
  • 61. Object based Database  ODL:  Declaring Relationships:  SQL uses the foreign key concept to establish relationships between two tables.  ODL uses the keyword relationship to declare a relationship between two classes.
  • 62. Object based Database  ODL:  Declaring Relationships:  interface Student{ attribute integer RollNo; attribute string Name; attribute string address; attribute string course_id; relationship Course Stud_Course_real; };
  • 63. Object based Database  ODL:  Declaring Keys:  A key identifies each tuple uniquely.  Use the keyword key to make a particular attribute a key.
  • 64. Object based Database  ODL:  Declaring Keys:  interface Student(Key RollNo){ attribute integer RollNo; attribute string Name; attribute string address; attribute string course_id; };
  • 65. Object based Database  OQL:  A query language standard for object oriented databases modeled after SQL.  Rules:  All complete statements must be terminated by a semi-colon  A list of entries in OQL is usually separated by commas but not terminated by a comma(,).  Strings of text are enclosed by matching quotation marks.
  • 66. Object based Database  OQL:  Basic form of OQL: Select, From and Where  Syntax: SELECT <list of values> FROM <list of collections and variable assignments> WHERE <condition>  Example: SELECT Sname: p.name FROM p in People WHERE p.age > 30
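For readers more used to programming languages than to OQL, the example query reads much like a comprehension over a collection of objects. The sketch below is a Python analogy only, not OQL itself; the `Person` class and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# Hypothetical contents of the People collection.
people = [Person("Ravi", 42), Person("Meena", 25), Person("Kumar", 31)]

# SELECT Sname: p.name FROM p in People WHERE p.age > 30
result = [
    {"Sname": p.name}    # SELECT: build one struct per qualifying object
    for p in people      # FROM: p ranges over the collection
    if p.age > 30        # WHERE: filter condition
]
print(result)
```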
  • 67. Object based Database  OQL: Dot notations and Path expressions:  ta.salary -> real  t.students -> set of tuples of type tuple(name, fee:real) representing students  t.salary -> real
  • 68. XML Databases  XML - Extensible Markup Language  XML tags identify the data and are used to store and organize the data.  Characteristics:  XML is extensible  XML carries the data, does not present it  XML is a public standard
  • 69. XML Databases  Syntax Rules for XML Declaration  The XML declaration is case sensitive and must begin with "<?xml", where "xml" is written in lower-case.  If a document contains an XML declaration, it must be the first statement of the XML document.
  • 70. XML Databases  Element:  XML elements can be defined as building blocks of an XML.  Elements can behave as containers to hold text, elements, attributes, media objects or all of these.
  • 71. XML Databases  Element: <element-name attribute1 attribute2> ...content... </element-name>
  • 72. XML Databases  Empty Element:  An empty element (element with no content) has following syntax: <name attribute1 attribute2.../>
  • 73. XML Databases  Element – Example: <?xml version = "1.0"?> <contact-info> <address category = "residence"> <name>XYZ</name> <company>ABC Company</company> <phone>1234567890</phone> </address> </contact-info>
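To see how elements, nested elements, and attributes in this example map onto a tree, the document can be parsed with Python's standard `xml.etree.ElementTree` module. The sketch embeds a copy of the contact-info document as a string; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<contact-info>
  <address category="residence">
    <name>XYZ</name>
    <company>ABC Company</company>
    <phone>1234567890</phone>
  </address>
</contact-info>"""

root = ET.fromstring(document)          # root element: contact-info
address = root.find("address")          # child element acting as a container

print(root.tag)                         # name of the root element
print(address.get("category"))          # attribute value (name-value pair)
print([child.tag for child in address]) # elements nested inside address
print(address.findtext("phone"))        # text content of a leaf element
```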
  • 74. XML Databases  Attributes:  Attributes give more information about XML elements.  Attributes define properties of elements. An XML attribute is always a name-value pair. <element-name attribute1 attribute2> ...content... </element-name>
  • 75. XML Databases  Attributes – Example: <garden> <plants category = "flowers" /> <plants category = "shrubs"> </plants> </garden>
  • 76. XML Databases  Types of XML Documents:  Data Centric XML documents: Many small data items that follow a specific structure. These documents follow a predefined schema that defines the tag names.  Document Centric XML documents: Large amounts of text, such as articles or books. There are very few or no structured data elements in these documents.  Hybrid Documents: Some parts contain structured data and other parts are unstructured text; they may or may not have a predefined schema.
  • 77. XML Databases  DTD  DTD – Document Type Definition  Defines the basic building blocks of an XML document.  A DTD specifies the element types, their attributes, and their relationships with one another.  It gives the set of rules for structuring data in an XML file.
  • 78. XML Databases  DTD – Elements:  The basic entity  The elements are used for defining the tags.  The elements typically consist of opening and closing tag.  Ex: <body>some text</body>
  • 79. XML Databases  DTD – Attributes:  Attributes always come in name/value pairs.  To specify the values of the element.  These are specified within the double quotes.  Ex: <img src="computer.gif" />
  • 80. XML Databases  DTD – Entities:  Entities are expanded when a document is parsed by an XML parser.  Entity references and the characters they stand for: &lt; (<), &gt; (>), &amp; (&), &quot; (").
  • 81. XML Databases  DTD – PCDATA:  Parsed Character Data.  PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup.  Tags inside the text will be treated as markup and entities will be expanded.  The characters &, <, and > must therefore be written as the entity references &amp;, &lt;, and &gt;.
  • 82. XML Databases  DTD – CDATA:  Character Data.  CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
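The difference between parsed and unparsed character data can be observed with Python's standard XML parser. This sketch uses a CDATA section inside a document to stand in for unparsed text; the `note` element and its content are made up for the example.

```python
import xml.etree.ElementTree as ET

# Parsed character data: entities are expanded and <b> is treated as markup.
parsed = ET.fromstring("<note>5 &lt; 7 &amp; <b>bold</b></note>")

# CDATA section: the same characters are kept as literal text.
literal = ET.fromstring("<note><![CDATA[5 < 7 & <b>bold</b>]]></note>")

print(repr(parsed.text))    # text before the <b> child, entities expanded
print(len(parsed))          # number of child elements found in the parsed text
print(repr(literal.text))   # whole CDATA content, tags not interpreted
print(len(literal))         # no child elements were created
```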
  • 83. XML Databases  DTD – Example: <?xml version="1.0"?> <page> <title>Hello friend</title> <content>Here is some content :)</content> <comment>samples</comment> </page>
  • 84. XML Databases  DTD – Example:
  • 85. XML Databases  DTD – Merits:  To define the structural components of XML document  Simple and Compact
  • 86. XML Databases  DTD – Demerits:  It cannot be very specific for complex documents.  The language that DTD uses is not XML.  The DTD cannot define the type of data contained within the XML document.
  • 87. XML Databases  XML Schema:  Structure of an XML document.  The elements and attributes that can appear in a document  The number of (and order of) child elements  Data types for elements and attributes  Default and fixed values for elements and attributes  XML Schema is an XML-based (and more powerful) alternative to DTD
  • 88. XML Databases  XML Schema:  Example:  StudentSchema.xsd  MySchema.xml
  • 89. XML Databases  XML Schema – Advantages:  The schema provides support for data types.  The XML schema is written in XML itself and has a large number of built-in and derived types.  Disadvantages:  Complex to design and hard to learn.  Maintaining the schema for large and complex operations sometimes slows down the processing of the XML document.
  • 90. XML Databases  XQuery:  Used to query an XML database and get information out of it.  XQuery FLWOR Expressions  For - selects a sequence of nodes  Let - binds a sequence to a variable  Where - filters the nodes  Order by - sorts the nodes  Return - what to return (gets evaluated once for every node)
  • 91. XML Databases  XQuery – Example:  courses.xml  Display the title elements of the courses whose fees are greater than 5000: for $x in doc("courses.xml")/courses/course where $x/fees>5000 return $x/title
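The for, where, and return clauses of this query map closely onto a Python comprehension over an element tree. The following is an analogy rather than an XQuery engine; the content of courses.xml is not given in the slides, so the document below is a hypothetical stand-in embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical content for courses.xml.
courses_xml = """<courses>
  <course><title>DBMS</title><fees>6000</fees></course>
  <course><title>Networks</title><fees>4500</fees></course>
  <course><title>Compilers</title><fees>7500</fees></course>
</courses>"""

root = ET.fromstring(courses_xml)       # plays the role of doc("courses.xml")/courses

titles = [
    course.findtext("title")                    # return $x/title
    for course in root.findall("course")        # for $x in .../course
    if float(course.findtext("fees")) > 5000    # where $x/fees > 5000
]
print(titles)
```

One difference to keep in mind: the XQuery expression returns the title elements themselves, while this sketch collects only their text.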
  • 92. XML Databases  XQuery – Advantages:  Both hierarchical and tabular data can be retrieved.  Can query tree and graph structures.  Used to build web pages.  Used to transform XML documents.
  • 93. Information Retrieval  Information Retrieval: “The process of retrieving documents from a collection in response to a query submitted by a user”
  • 94. Information Retrieval  Information Retrieval:  Structured Data:  Data in which the information is in its most organized form.  Ex: Student table
  • 95. Information Retrieval  Information Retrieval:  Unstructured Data:  Like human language.  It does not fit nicely into relational databases.  Ex: Emails, Text Documents, Social media, Videos and Images.
  • 96. Information Retrieval  Information Retrieval – Concept of Query  The user issues a free-form search request, called a query.  It is also called keyword search.
  • 97. Information Retrieval  Characteristics of IR Systems:  Types of Users:  Expert User: User who is searching for specific information that is clear in mind.  Ex: User who wants to get the information about particular book.  Layperson: A user with generic information need.
  • 98. Information Retrieval  Characteristics of IR Systems:  Types of Data:  Search systems can be tailored to specific types of data.
  • 99. Information Retrieval  Characteristics of IR Systems:  Types of Information Need:  Navigational Search: To find a particular piece of information that user needs quickly. Ex: Finding site of “Anna University”
  • 100. Information Retrieval  Characteristics of IR Systems:  Types of Information Need:  Informational Search: To find current information about some topic.  Example: Information about current News.
  • 101. Information Retrieval  Characteristics of IR Systems:  Types of Information Need:  Transactional Search: To reach a site where further interaction happens.  Ex: Online Reservation.
  • 102. Information Retrieval  Database System vs. IR System:
      Database System | IR System
      Use of structured data | Use of unstructured data
      Relational data model is used | Free-form query model is used
      Query returns data | Search request returns a list of, or pointers to, documents that may contain the desired information
      Results are based on exact matching | Results are based on approximate matching
  • 103. Information Retrieval  Modes of Interactions:  Retrieval: Extraction of relevant information from a repository of documents through an IR query.  Browsing: The activity of a user visiting or navigating through similar or related documents based on the user’s assessment of relevance.
  • 104. Information Retrieval  Modes of Interactions:  Hyperlinks: To interconnect web pages and are mainly used for browsing.  Anchor texts: Text phrases within documents used to label hyperlinks and are very relevant to browsing.  Web Search: combines both activities(retrieval and browsing)
  • 105. Information Retrieval  Modes of Interactions: Web Search Engine: Maintains an indexed repository of web pages. The most relevant web pages are returned to the user if possible in descending order of their relevance.
  • 106. Information Retrieval  IR Processing:  Statistical Approach:  The documents are first analyzed and broken down into chunks of text.  Each word is counted and weighted for its relevance.  These words are then compared against the query to measure the degree of match.  Based on this matching, a ranked list of documents containing these words is presented to the user.
  • 107. Information Retrieval. IR Processing: Semantic Approach. Knowledge-based techniques of information retrieval are used. The syntactical, lexical, sentential, discourse-based, and pragmatic levels of language are used to build the knowledge base for understanding.
  • 109. Information Retrieval. Retrieval Models: Boolean Model. Documents are represented as sets of terms. Queries are formed with the standard set-theoretic Boolean operators AND, OR, and NOT. Retrieval is based on an exact match with the query. The model lacks sophisticated ranking algorithms. It makes it easy to associate metadata and to write queries that match the contents of the documents.
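Because the Boolean model treats each document as a set of terms, the three operators map directly onto set operations. The sketch below assumes a tiny in-memory inverted index; `build_index`, `docs`, and the sample documents are hypothetical names chosen for illustration.

```python
def build_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

# Hypothetical sample collection.
documents = {
    1: "distributed database transaction",
    2: "xml database schema",
    3: "information retrieval query",
    4: "xml query language",
}
index = build_index(documents)
all_docs = set(documents)

def docs(term):
    """Documents containing `term` (empty set if the term is unknown)."""
    return index.get(term, set())

xml_and_database = docs("xml") & docs("database")   # AND -> intersection
xml_or_retrieval = docs("xml") | docs("retrieval")  # OR  -> union
not_database = all_docs - docs("database")          # NOT -> complement
query_not_xml = docs("query") - docs("xml")         # AND NOT -> difference
```

Every result is an unordered set: a document either matches or it does not, which is why the model offers no ranking.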
  • 110. Information Retrieval. Retrieval Models: Vector Space Model. An algebraic model for representing text documents. It provides a framework in which weighting, ranking of retrieved documents, and relevance feedback are possible. Various similarity functions can be used; the cosine of the angle between the query vector and the document vector is commonly used.
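The cosine measure can be illustrated with a short Python sketch. It assumes raw term counts as vector weights (real systems often use other weighting schemes such as TF-IDF); `term_vector`, `cosine_similarity`, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a sparse vector of term counts."""
    return Counter(text.lower().split())

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * vec_b[term] for term, weight in vec_a.items() if term in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical sample collection.
documents = {
    "d1": "xml database schema",
    "d2": "xml query language xquery",
    "d3": "distributed transaction processing",
}
query = term_vector("xml query")

# Rank documents by descending similarity to the query.
ranked = sorted(
    ((cosine_similarity(query, term_vector(text)), doc_id)
     for doc_id, text in documents.items()),
    reverse=True,
)
```

Unlike the Boolean model, every document receives a graded score, so partially matching documents such as d1 still appear in the ranking, below the better match d2.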
  • 111. Information Retrieval. Retrieval Models: Probabilistic Model. A more concrete and definitive approach is taken. The IR system has to decide whether each document belongs to the relevant set or the non-relevant set for a query. To do so, it calculates the probability that the document belongs to the relevant set and compares it with the probability that the document belongs to the non-relevant set.
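The comparison described above can be written as a decision rule. The notation is an assumption made for this illustration: R and NR denote the relevant and non-relevant sets, d a document, and q the query.

```latex
\[
  d \text{ is retrieved for } q
  \quad\Longleftrightarrow\quad
  P(R \mid d, q) > P(NR \mid d, q)
\]
\[
  \text{equivalently,}\qquad
  O(R \mid d, q) = \frac{P(R \mid d, q)}{P(NR \mid d, q)} > 1 .
\]
```

Since the two probabilities sum to 1, the rule amounts to requiring a probability of relevance above 0.5, and sorting documents by the odds O gives a ranked result list.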
  • 112. Information Retrieval. Retrieval Models: Semantic Model. The process of matching documents to a given query is based on concept-level and semantic matching instead of index term matching. This allows retrieval of relevant documents that share meaningful associations with other documents in the query result.
  • 113. Information Retrieval. Retrieval Models: Semantic Model, Levels of Analysis. Morphological Analysis: nouns, verbs, and adjectives are analyzed. Syntactical Analysis: complete phrases in the document are parsed and then analyzed. Semantic Analysis: synonyms are used to resolve ambiguities in the words.
  • 114. Information Retrieval. Types of Queries in IR Systems: Keywords. Consist of words, phrases, and other characterizations of documents. Queries are compared to the set of index keywords. Boolean and other operators can be used to build a complex query.
  • 115. Information Retrieval. Types of Queries in IR Systems: Keywords. Keywords are implicitly connected by a logical AND operator. Stopwords, the most commonly occurring words (a, the, of), are removed. IR systems do not pay attention to the ordering of these words in the query.
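The keyword-query behavior above (stopword removal, implicit AND, no attention to word order) can be sketched as follows. The stopword list holds only the three examples from the slide, and `keyword_terms` and `matches` are hypothetical helper names.

```python
# Only the three example stopwords from the slide; real lists are much longer.
STOPWORDS = {"a", "the", "of"}

def keyword_terms(query):
    """Drop stopwords and return the remaining keywords as an unordered set."""
    return {word for word in query.lower().split() if word not in STOPWORDS}

def matches(query, document):
    """Implicit AND: every remaining keyword must occur in the document."""
    doc_terms = set(document.lower().split())
    return keyword_terms(query) <= doc_terms

document = "retrieval model for text"
same_terms = keyword_terms("the model of retrieval") == keyword_terms("retrieval model")
hit = matches("the model of retrieval", document)
miss = matches("vector model", document)
```

Using a set makes the order of the keywords irrelevant, and the subset test `<=` expresses the implicit AND.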
  • 116. Information Retrieval. Types of Queries in IR Systems: Boolean Queries. AND: both terms must be found. OR: either term found. NOT: records containing the keyword are omitted. ( ): used for nesting. + and - Boolean operators: + is equivalent to AND, - is equivalent to AND NOT. A document is retrieved if the query is logically true as an exact match in the document.
  • 117. Information Retrieval. Types of Queries in IR Systems: Phrase queries. A phrase is generally enclosed within double quotes. A more restricted and specific version of proximity searching.
  • 118. Information Retrieval. Types of Queries in IR Systems: Proximity queries. Account for how close multiple terms should be to each other within a record. A common option requires the terms to be in the exact order. Various operator names: NEAR, ADJ (adjacent), or AFTER.
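Proximity and phrase matching can both be sketched with token positions. The code below is a minimal illustration under stated assumptions: distance is measured in tokens, and `positions`, `near`, and `phrase` are hypothetical functions, not the operators of any particular IR system.

```python
def positions(tokens, term):
    """All positions at which `term` occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == term]

def near(text, first, second, max_distance, ordered=False):
    """True if the two terms occur within `max_distance` tokens of each other.

    With ordered=True, `first` must also come before `second`.
    """
    tokens = text.lower().split()
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= max_distance:
                return True
    return False

def phrase(text, first, second):
    """A two-word phrase is the most restrictive case: adjacent and in order."""
    return near(text, first, second, max_distance=1, ordered=True)

text = "object database concepts and the relational database model"
within_five = near(text, "object", "relational", 5)
within_three = near(text, "object", "relational", 3)
is_phrase = phrase(text, "relational", "database")
reversed_phrase = phrase(text, "database", "relational")
```

Writing `phrase` as a call to `near` mirrors the statement that a phrase query is a more restricted version of proximity searching.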
  • 119. Information Retrieval. Types of Queries in IR Systems: Wildcard queries. Support regular expressions and pattern-matching-based searching; for example, ‘Data*’ would retrieve data, database, datapoint, dataset. Involve preprocessing overhead. Retrieval models do not directly provide support for this query type.
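One way to picture the preprocessing step is to expand the wildcard pattern against the index vocabulary before the retrieval model sees the query. The sketch below uses Python's standard `fnmatch` module; `expand_wildcard` and the sample vocabulary are hypothetical.

```python
from fnmatch import fnmatchcase

def expand_wildcard(pattern, vocabulary):
    """Return the index terms that match a shell-style wildcard pattern."""
    pattern = pattern.lower()  # index terms are assumed to be lowercase
    return sorted(term for term in vocabulary if fnmatchcase(term, pattern))

# Hypothetical index vocabulary.
vocabulary = ["data", "database", "datapoint", "dataset", "date", "metadata"]
expanded = expand_wildcard("Data*", vocabulary)
print(expanded)
```

The expanded terms could then be combined into an ordinary keyword or Boolean query, which fits the point that retrieval models do not handle wildcards directly.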
  • 120. Information Retrieval. Types of Queries in IR Systems: Natural Language queries. Few natural language search engines exist. An active area of research. Easier to answer questions.