Complete database
A database is a collection of data that is related by some aspect. Data is a collection of facts and figures that can be processed to produce information. The name of a student, her age, class and subjects can be counted as data for recording purposes.
Mostly, data represents recordable facts. Data aids in producing information, which is based on facts. For example, if we have data about the marks obtained by all students, we can then draw conclusions about the toppers and the average marks.
A database management system (DBMS) stores data in such a way that it is easier to retrieve and manipulate, and it helps to produce information.
Characteristics
Traditionally, data was organized in file formats. DBMS was an entirely new concept then, and all the research was done to make it overcome the deficiencies of the traditional style of data management. A modern DBMS has the following characteristics:
Real-world entity: A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses their behavior and attributes too. For example, a school database may use student as an entity and age as its attribute.
Relation-based tables: A DBMS allows entities and the relations among them to form tables. This eases the concept of data saving. A user can understand the architecture of a database just by looking at the table names.
Isolation of data and application: A database system is entirely different from its data. The database is said to be the active entity, whereas the data is said to be the passive one, on which the database works and which it organizes. A DBMS also stores metadata, which is data about data, to ease its own processes.
Less redundancy: A DBMS follows the rules of normalization, which split a relation when any of its attributes has redundancy in values. Normalization, which is itself a mathematically rich and scientific process, reduces the redundancy of the entire database as much as possible.
Consistency: A DBMS can keep the database in a consistent state, whereas earlier forms of data-storing applications, such as file-processing systems, do not guarantee this. Consistency is a state where every relation in the database remains consistent. There exist methods and techniques that can detect an attempt to leave the database in an inconsistent state.
Query language: A DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many different filtering options as he or she wants. This was not possible with traditional file-processing systems.
ACID properties: A DBMS follows the concept of ACID properties, which stands for Atomicity, Consistency, Isolation and Durability. These concepts are applied to transactions, which manipulate data in the database. ACID properties keep the database in a healthy state in a multi-transactional environment and in case of failure.
Multiuser and concurrent access: A DBMS supports a multi-user environment and allows users to access and manipulate data in parallel. There are restrictions on transactions when they attempt to handle the same data item, but users are always unaware of them.
Multiple views: A DBMS offers multiple views for different users. A user in the sales department will have a different view of the database than a person working in the production department. This enables users to have a focused view of the database according to their requirements.
Security: Features like multiple views offer security to some extent, since users are unable to access the data of other users and departments. A DBMS offers methods to impose constraints while entering data into the database and while retrieving it at a later stage. A DBMS offers many different levels of security features, which enable multiple users to have different views with different features. For example, not only can a user in the sales department be prevented from seeing the data of the purchase department, but how much of the sales department's data he can see can also be managed. Because the data is not stored as ordinary files that can be opened directly, as in a traditional file system, but is reached only through the DBMS, it is much harder for an intruder to get at it.
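Two of these characteristics, atomic transactions and per-department views, can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a DBMS reference implementation: the `accounts` table, its columns and `sales_view` are hypothetical, and the `CHECK` constraint stands in for a consistency rule. A transfer that would violate the rule is rolled back completely instead of being half-applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint stands in for a consistency rule.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, dept TEXT, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [("alice", "sales", 100), ("bob", "purchase", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed on the second UPDATE; the first one was rolled back

failed = transfer(conn, "bob", "alice", 80)    # bob has only 50, so nothing may change
succeeded = transfer(conn, "alice", "bob", 30)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))

# A view gives the sales department its own window: no dept column, no other departments.
conn.execute("CREATE VIEW sales_view AS SELECT name, balance FROM accounts WHERE dept = 'sales'")
sales_rows = conn.execute("SELECT * FROM sales_view").fetchall()
print(balances, sales_rows)
```

After both calls, `balances` is `{'alice': 70, 'bob': 80}`: the failed transfer left no trace.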
Users
A DBMS is used by various users for various purposes. Some may be involved in retrieving data and some in backing it up. Some of them are described as follows:
Administrators: A group of users maintains the DBMS and is responsible for administering the database. They are responsible for looking after its usage and deciding by whom it should be used. They create user access and apply limitations to maintain isolation and enforce security. Administrators also look after DBMS resources such as the system license, the required software applications and tools, and other hardware-related maintenance.
Designers: This is the group of people who actually work on designing the database. The actual database starts with requirement analysis, followed by a good design process. These people keep a close watch on what data should be kept and in what format. They identify and design the whole set of entities, relations, constraints and views.
End users: This group contains the people who actually take advantage of the database system. End users can be simple viewers who pay attention to logs or market rates, or they can be as sophisticated as business analysts who make the most of it.
DBMS - Architecture
The design of a database management system highly depends on its architecture. It can be centralized, decentralized or hierarchical. DBMS architecture can be seen as single-tier or multi-tier. An n-tier architecture divides the whole system into n related but independent modules, which can be independently modified, altered, changed or replaced.
In 1-tier architecture, the DBMS is the only entity: the user works directly on the DBMS and uses it. Any changes done here are applied directly to the DBMS itself. It does not provide handy tools for end users; database designers and programmers normally prefer to use the single-tier architecture.
If the architecture of the DBMS is 2-tier, then there must be some application that uses the DBMS. Programmers use 2-tier architecture, where they access the DBMS by means of an application. Here the application tier is entirely independent of the database in terms of operation, design and programming.
3-tier architecture
The most widely used architecture is the 3-tier architecture. It separates its tiers from each other on the basis of users. It is described as follows:
Database (Data) Tier: At this tier, only the database resides. The database, along with its query processing languages, sits in layer 3 of the 3-tier architecture. It also contains all relations and their constraints.
Application (Middle) Tier: At this tier reside the application server and the programs that access the database. For a user, this application tier works as an abstracted view of the database; users are unaware of any existence of the database beyond the application. For the database tier, the application tier is its user, and the database tier is not aware of any other user beyond the application tier. This tier works as a mediator between the two.
User (Presentation) Tier: An end user sits on this tier. From a user's aspect, this tier is everything; he or she does not know about any existence or form of the database beyond this layer. At this layer, multiple views of the database can be provided by the application. All views are generated by applications, which reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are independent and can be changed independently.
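The separation of the three tiers can be sketched in a few lines of Python. This is an illustrative sketch only: the class names `DataTier` and `ApplicationTier`, the `render` function and the `student` table are hypothetical, and an in-memory SQLite database plays the data tier. The point is that only the data tier touches the database, only the application tier knows SQL, and the presentation tier sees nothing but the application's answers.

```python
import sqlite3

class DataTier:
    """Layer 3: the only code that talks to the database (in-memory SQLite here)."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
        self.conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                              [(1, "Mira", 15), (2, "Ravi", 16)])

    def query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()

class ApplicationTier:
    """Middle tier: the database's only user; it holds all the SQL."""
    def __init__(self, data):
        self.data = data

    def student_names(self):
        return [name for (name,) in self.data.query("SELECT name FROM student ORDER BY roll")]

def render(app):
    """Presentation tier: formats what the application returns; no SQL here."""
    return "\n".join(f"- {name}" for name in app.student_names())

print(render(ApplicationTier(DataTier())))
```

Replacing SQLite with another database would change only `DataTier`, which is the kind of independent modification the text describes.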
DBMS - Data Models
A data model tells how the logical structure of a database is modeled. Data models are fundamental entities for introducing abstraction in a DBMS. Data models define how data is connected to each other and how it will be processed and stored inside the system.
The very first data models could be flat data models, where all the data used to be kept in the same plane. Because these earlier data models were not so scientific, they were prone to introducing lots of duplication and update anomalies.
Entity-Relationship Model
The Entity-Relationship model is based on the notion of real-world entities and the relationships among them. While formulating a real-world scenario into the database model, the ER model creates entity sets, relationship sets, general attributes and constraints.
The ER model is best used for the conceptual design of a database.
The ER model is based on:
Entities and their attributes
Relationships among entities
These concepts are explained below.
Entity
An entity in the ER model is a real-world entity that has some properties called attributes. Every attribute is defined by its set of values, called its domain.
For example, in a school database, a student is considered an entity. A student has various attributes such as name, age and class.
Relationship
The logical association among entities is called a relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities.
Mapping cardinalities:
one to one
one to many
many to one
many to many
Relational Model
The most popular data model in DBMS is the Relational Model. It is a more scientific model than the others. This model is based on first-order predicate logic and defines a table as an n-ary relation.
The main highlights of this model are:
Data is stored in tables called relations.
Relations can be normalized.
In normalized relations, the values saved are atomic values.
Each row in a relation contains a unique value.
Each column in a relation contains values from the same domain.
DBMS - Data Schemas
Database schema
A database schema is the skeleton structure of the database, and it represents the logical view of the entire database. It tells how the data is organized and how the relations among them are associated. It formulates all the database constraints that are to be put on the data in relations, which reside in the database.
A database schema defines its entities and the relationships among them. A database schema is a descriptive detail of the database, which can be depicted by means of schema diagrams. All these activities are done by the database designer to help programmers understand all aspects of the database.
A database schema can be divided broadly into two categories:
Physical Database Schema: This schema pertains to the actual storage of data and its form of storage, such as files and indices. It defines how the data will be stored in secondary storage.
Logical Database Schema: This defines all the logical constraints that need to be applied on the data stored. It defines tables, views and integrity constraints.
Database Instance
It is important that we distinguish these two terms individually. A database schema is the skeleton of the database. It is designed when the database does not exist at all, and it is very hard to make any changes to it once the database is operational. A database schema does not contain any data or information.
A database instance is a state of the operational database with data at any given time. It is a snapshot of the database. Database instances tend to change with time. A DBMS ensures that every instance (state) is a valid state by keeping up with all the validations, constraints and conditions that the database designers have imposed or that are expected from the DBMS itself.
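The schema/instance distinction can be seen directly in SQLite: the `CREATE TABLE` statement is the schema, and each query result taken at a point in time is a snapshot of the instance. This is a minimal sketch; the `student` table and the helper `instance()` are hypothetical. The last insert is rejected because it would put the database into a state the schema does not allow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: defined once, before any data exists.
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT NOT NULL, "
             "age INTEGER CHECK (age >= 0))")

def instance(conn):
    """Snapshot of the current state of the student relation (the instance)."""
    return conn.execute("SELECT * FROM student ORDER BY roll").fetchall()

snapshot_1 = instance(conn)   # empty instance under the same schema
conn.execute("INSERT INTO student VALUES (1, 'Mira', 15)")
snapshot_2 = instance(conn)   # the instance changed; the schema did not

try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', -3)")  # would be an invalid state
except sqlite3.IntegrityError:
    pass                      # the DBMS refuses it
snapshot_3 = instance(conn)
print(snapshot_1, snapshot_2, snapshot_3)
```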
DBMS - Data Independence
If a database system is not multi-layered, it will be very hard to make any changes in the database system. Database systems are designed in multiple layers, as we learnt earlier.
Data Independence:
There is a lot of data in a database management system other than the user's data. A DBMS comprises three kinds of schemas, which are in turn data about data (metadata). Metadata is also stored along with the database, and once stored it is hard to modify. But as a DBMS expands, it needs to change over time to satisfy the requirements of its users. If all this data were highly dependent, such changes would become tedious and highly complex.
Data about data is itself divided in a layered architecture, so that when we change data at one layer, it does not affect the data at a different level. This data is independent but mapped onto each other.
Logical Data Independence
Logical data is data about the database; that is, it stores information about how data is managed inside. For example, a table (relation) stored in the database and all the constraints applied on that relation.
Logical data independence is the ability to change the logical schema, for example a table's format, without having to change the user views and applications that rely on it.
Physical Data Independence
All schemas are logical, and the actual data is stored in bit format on the disk. Physical data independence is the power to change the physical data without impacting the schema or logical data.
For example, if we want to change or upgrade the storage system itself, say by using SSDs instead of hard disks, this should not have any impact on the logical data or schemas.
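Logical data independence, in the usual sense of letting the logical schema change without changing applications, can be sketched with a view. In this illustration the names `student`, `person`, `contact`, `student_info` and `APP_QUERY` are hypothetical. The application's only query reads the view. Later the table is split in two and only the view is redefined, so the application query and its result stay the same.

```python
import sqlite3

APP_QUERY = "SELECT name, phone FROM student_info ORDER BY roll"  # the application's only SQL

conn = sqlite3.connect(":memory:")
# Logical schema, version 1: one table behind the view.
conn.executescript("""
    CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO student VALUES (1, 'Mira', '5550001');
    CREATE VIEW student_info AS SELECT roll, name, phone FROM student;
""")
before = conn.execute(APP_QUERY).fetchall()

# Logical schema, version 2: the table is split in two; only the view definition changes.
conn.executescript("""
    CREATE TABLE person  (roll INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (roll INTEGER PRIMARY KEY, phone TEXT);
    INSERT INTO person  SELECT roll, name  FROM student;
    INSERT INTO contact SELECT roll, phone FROM student;
    DROP VIEW student_info;
    DROP TABLE student;
    CREATE VIEW student_info AS
        SELECT p.roll, p.name, c.phone FROM person p JOIN contact c ON c.roll = p.roll;
""")
after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```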
ER Model Basic Concepts
The entity-relationship model defines the conceptual view of a database. It works around real-world entities and the associations among them. At the view level, the ER model is considered a good option for designing databases.
Entity
An entity is a real-world thing, either animate or inanimate, that can be easily identified and distinguished. For example, in a school database, students, teachers, classes and courses offered can be considered entities. All entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes share similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have name, class and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be a numeric value; it has to be alphabetic. A student's age cannot be negative, etc.
Types of attributes:
Simple attribute:
Simple attributes are atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.
Composite attribute:
Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first_name and last_name.
Derived attribute:
Derived attributes are attributes that do not exist physically in the database; their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved in the database; instead, it can be derived. As another example, age can be derived from date_of_birth.
Single-valued attribute:
Single-valued attributes contain a single value. For example: Social_Security_Number.
Multi-valued attribute:
Multi-valued attributes may contain more than one value. For example, a person can have more than one phone number, email address, etc.
These attribute types can come together in ways like:
simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
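The attribute types above can be mirrored in a small Python data class. This is a sketch for intuition only; the classes `Name` and `Student` and the sample values are hypothetical. `roll_number` is simple and single-valued, `name` is composite, `phone_numbers` is multi-valued, and `age()` is a derived attribute: it is computed from `date_of_birth` rather than stored.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                      # composite attribute: made of simple parts
    first_name: str
    last_name: str

@dataclass
class Student:
    roll_number: int             # simple, single-valued
    name: Name                   # composite, single-valued
    date_of_birth: date          # stored attribute
    phone_numbers: list[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month,
                                                    self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

s = Student(1, Name("Mira", "Rao"), date(2008, 6, 15), ["5550001", "5550002"])
print(s.age(date(2024, 6, 14)), s.age(date(2024, 6, 15)))   # one day before and on the birthday
```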
Entity Set and Keys
A key is an attribute or collection of attributes that uniquely identifies an entity among an entity set.
For example, the roll_number of a student makes her/him identifiable among students.
Super Key: A set of attributes (one or more) that collectively identifies an entity in an entity set.
Candidate Key: A minimal super key is called a candidate key; that is, a super key of which no proper subset is a super key. An entity set may have more than one candidate key.
Primary Key: One of the candidate keys, chosen by the database designer to uniquely identify the entity set.
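The super key and candidate key definitions translate directly into a brute-force search. This is an illustrative sketch with hypothetical sample rows. One caveat: a key is a property of the schema that must hold for every possible instance, so checking a single instance like this only shows which attribute sets could serve as keys for that data. Subsets are tried in order of size, so any super key that contains an already-found smaller key is skipped as non-minimal.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs is a super key (for these rows) if no two rows agree on all of them."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, attributes):
    """Minimal super keys: super keys with no proper subset that is also a super key."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if is_superkey(rows, combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

students = [   # hypothetical instance
    {"roll_number": 1, "name": "Mira", "class": "10A", "email": "mira@example.com"},
    {"roll_number": 2, "name": "Ravi", "class": "10A", "email": "ravi@example.com"},
    {"roll_number": 3, "name": "Mira", "class": "10B", "email": "mira.b@example.com"},
]
attrs = ["roll_number", "name", "class", "email"]
print(candidate_keys(students, attrs))
```

Any of the three candidates found here could be chosen as the primary key; `roll_number` is the natural choice.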
Relationship
The association among entities is called a relationship. For example, an employee entity has the relation works_at with department. Another example is a student who enrolls in some course. Here, Works_at and Enrolls are called relationships.
Relationship Set:
A set of relationships of a similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.
Degree of relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities:
Cardinality defines the number of entities in one entity set that can be associated with the number of entities of the other set via a relationship set.
One-to-one: One entity from entity set A can be associated with at most one entity of entity set B, and vice versa.
One-to-many: One entity from entity set A can be associated with more than one entity of entity set B, but an entity from entity set B can be associated with at most one entity of A.
Many-to-one: More than one entity from entity set A can be associated with at most one entity of entity set B, but one entity from entity set B can be associated with more than one entity from entity set A.
Many-to-many: One entity from A can be associated with more than one entity from B, and vice versa.
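The four mapping cardinalities can be told apart by checking, in each direction, whether any entity is linked to more than one partner. This is a minimal sketch with hypothetical entity identifiers. As with keys, the real cardinality is a constraint on every instance, so classifying one set of pairs only describes that particular data.

```python
def cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs, A on the left."""
    a_to_b, b_to_a = {}, {}
    for a, b in pairs:
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    a_many = any(len(bs) > 1 for bs in a_to_b.values())    # some A linked to several Bs
    b_many = any(len(srcs) > 1 for srcs in b_to_a.values())  # some B linked to several As
    return {(False, False): "one-to-one",
            (True, False): "one-to-many",
            (False, True): "many-to-one",
            (True, True): "many-to-many"}[(a_many, b_many)]

print(cardinality([("d1", "e1"), ("d1", "e2"), ("d2", "e3")]))  # department -> employees
print(cardinality([("s1", "c1"), ("s1", "c2"), ("s2", "c1")]))  # students <-> courses
```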
ER Diagram Representation
Now we shall learn how the ER model is represented by means of an ER diagram. Every object, such as an entity, the attributes of an entity, a relationship set and the attributes of a relationship set, can be represented by the tools of an ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.
Attributes
Attributes are properties of entities. Attributes are represented by means of ellipses. Every ellipse represents one attribute and is directly connected to its entity (rectangle).
If the attributes are composite, they are further divided in a tree-like structure. Every node is then connected to its attribute. That is, composite attributes are represented by ellipses that are connected with an ellipse.
Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written in the diamond box. All entities (rectangles) participating in a relationship are connected to it by a line.
Binary relationship and cardinality
A relationship in which two entities participate is called a binary relationship. Cardinality is the number of instances of an entity from a relation that can be associated with the relation.
One-to-one
When only one instance of an entity is associated with the relationship, it is marked as '1'. In a one-to-one relationship, only one instance of each entity should be associated with the relationship.
One-to-many
When more than one instance of an entity is associated with the relationship, it is marked as 'N'. In a one-to-many relationship, only one instance of the entity on the left and more than one instance of the entity on the right can be associated with the relationship.
Many-to-one
When more than one instance of an entity is associated with the relationship, it is marked as 'N'. In a many-to-one relationship, more than one instance of the entity on the left and only one instance of the entity on the right can be associated with the relationship.
Participation Constraints
Total Participation: Each entity in the entity set is involved in the relationship. Total participation is represented by double lines.
Partial Participation: Not all entities are involved in the relationship. Partial participation is represented by a single line.
Generalization, Aggregation
The ER model has the power of expressing database entities in a conceptual hierarchical manner such that, as the hierarchy goes up, it generalizes the view of entities, and as we go deeper in the hierarchy, it gives us the details of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized view. For example, a particular student named Mira can be generalized along with all the students; the entity shall be student, and further, a student is a person. The reverse is called specialization, where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entity contains the properties of all the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Specialization
Specialization is a process that is the opposite of generalization, as mentioned above. In specialization, a group of entities is divided into sub-groups based on their characteristics. Take a group Person, for example. A person has a name, date of birth, gender, etc. These properties are common to all persons, that is, all human beings. But in a company, a person can be identified as an employee, employer, customer or vendor, based on the role they play in the company.
Similarly, in a school database, a person can be specialized as a teacher, student or staff member, based on the role they play in the school as entities.
Inheritance
For example, the attributes of a person, such as name, age and gender, can be inherited by lower-level entities such as student and teacher.
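Class inheritance in Python gives a close analogy to this ER hierarchy. It is an analogy, not a database mapping, and the class and field names are hypothetical: `Person` is the higher-level (generalized) entity, and `Student` and `Teacher` are specializations that inherit `name`, `age` and `gender` and add their own attributes.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:                 # higher-level (generalized) entity
    name: str
    age: int
    gender: str

@dataclass
class Student(Person):        # lower-level (specialized) entity
    roll_number: int

@dataclass
class Teacher(Person):
    subject: str

mira = Student(name="Mira", age=15, gender="F", roll_number=7)
teacher_fields = [f.name for f in fields(Teacher)]   # inherited fields come first
print(isinstance(mira, Person), teacher_fields)
```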
DBMS Codd's Rules
Dr Edgar F. Codd did extensive research on the relational model of database systems and came up with twelve rules of his own, which, according to him, a database must obey in order to be a true relational database.
These rules can be applied to a database system that is capable of managing its stored data using only its relational capabilities. This is a foundation rule, which provides a base for applying the other rules.
Rule 1: Information rule
This rule states that all information (data) stored in the database must be a value of some table cell. Everything in a database must be stored in table format. This information can be user data or metadata.
Rule 2: Guaranteed Access rule
This rule states that every single data element (value) is guaranteed to be logically accessible with a combination of table name, primary key (row value) and attribute name (column value). No other means, such as pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL values
This rule states that NULL values in the database must be given systematic treatment, as a NULL may have several meanings; that is, NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
Rule 4: Active online catalog
This rule states that the structure description of the whole database must be stored in an online catalog, known as the data dictionary, which can be accessed by authorized users. Users can use the same query language to access the catalog that they use to access the database itself.
Rule 5: Comprehensive data sub-language rule
This rule states that a database must support a language with linear syntax that is capable of data definition, data manipulation and transaction management operations. The database can be accessed by means of this language only, either directly or by means of some application. If the database can be accessed or manipulated in some way without the help of this language, it is a violation.
Rule 6: View updating rule
This rule states that all views of the database that can theoretically be updated must also be updatable by the system.
Rule 7: High-level insert, update and delete rule
This rule states that the database must support high-level insertion, update and deletion. These must not be limited to a single row; that is, the database must also support union, intersection and minus operations to yield sets of data records.
Rule 8: Physical data independence
This rule states that the application should not have any concern about how the data is physically stored. Also, any change in its physical structure must not have any impact on the application.
Rule 9: Logical data independence
This rule states that the logical data must be independent of its user's view (application). Any change in the logical data must not imply any change in the application using it. For example, if two tables are merged or one is split into two different tables, the change should have no impact on the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity independence
This rule states that the database must be independent of the application using it. All its integrity constraints can be independently modified without the need for any change in the application. This rule makes the database independent of the front-end application and its interface.
Rule 11: Distribution independence
This rule states that the end user must not be able to see that the data is distributed over various locations. To the user, the data must appear to be located at one site only. This rule is regarded as a foundation of distributed database systems.
Rule 12: Non-subversion rule
This rule states that if a system has an interface that provides access to low-level records, this interface must not be able to subvert the system and bypass security and integrity constraints.
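Rule 4 (the active online catalog) can be observed in SQLite, which keeps its schema in a catalog table named `sqlite_master` (also readable as `sqlite_schema` in newer versions) that is queried with ordinary SQL. This sketch is only an illustration of that one rule, not a claim that SQLite satisfies all twelve; the `article` table and `titles` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE VIEW titles AS SELECT title FROM article")

# The catalog is itself a table, read with the same SELECT used for user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(catalog)
```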
Relational Data Model
The relational data model is the primary data model, used widely around the world for data storage and processing. This model is simple and has all the properties and capabilities required to process data with storage efficiency.
Concepts
Tables: In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent the attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), its attributes and their names.
Relation key: One or more attributes that can identify each row in the relation (table) uniquely are called the relation key.
Attribute domain: Every attribute has some predefined value scope, known as the attribute domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints. There are three main integrity constraints:
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints:
There must be at least one minimal subset of attributes in the relation that can identify a tuple uniquely. This minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, these are called candidate keys.
Key constraints enforce that:
in a relation with a key attribute, no two tuples can have identical values for the key attributes;
a key attribute cannot have NULL values.
Key constraints are also referred to as entity constraints.
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same kind of constraint is employed on the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero, and a telephone number cannot contain characters other than the digits 0-9.
Referential Integrity Constraints
Referential integrity constraints work on the concept of foreign keys. A key attribute of a relation can be referred to in another relation, where it is called a foreign key.
The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, that key element must exist.
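All three kinds of integrity constraint can be declared in SQL and watched in action through SQLite. This is a minimal sketch with hypothetical `department` and `employee` tables and a hypothetical helper `try_insert`. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and the primary key is declared `NOT NULL` explicitly because SQLite does not imply it for non-integer keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  TEXT NOT NULL PRIMARY KEY,             -- key constraint: unique, not NULL
        age     INTEGER CHECK (age > 0),                -- domain constraint
        dept_id INTEGER REFERENCES department(dept_id)  -- referential integrity
    );
    INSERT INTO department VALUES (1, 'Sales');
""")

def try_insert(row):
    """Return 'ok' if the row satisfies every constraint, else 'rejected'."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("e1", 30, 1)),   # valid
    try_insert(("e1", 40, 1)),   # duplicate key value
    try_insert((None, 25, 1)),   # NULL key
    try_insert(("e2", -5, 1)),   # age outside its domain
    try_insert(("e3", 28, 99)),  # department 99 does not exist
]
print(results)
```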
Relational Algebra
Relational database systems are expected to be equipped with a query language that can assist their users in querying the database instances. This way, users are empowered and can populate the results as required. There are two kinds of query languages: relational algebra and relational calculus.
Relational algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It uses operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation, and intermediate results are also considered relations.
Fundamental operations of relational algebra:
Select
Project
Union
Set difference
Cartesian product
Rename
These are defined briefly as follows:
Select Operation (σ)
Selects the tuples of a relation that satisfy the given predicate.
Notation: σ p (r)
where p stands for the selection predicate and r stands for the relation. p is a propositional logic formula, which may use connectors like and, or and not. These terms may use relational operators like =, ≠, ≥, <, >, ≤.
For example:
σ subject = "database" (Books)
Output: Selects tuples from Books where the subject is 'database'.
σ subject = "database" and price = 450 (Books)
Output: Selects tuples from Books where the subject is 'database' and the price is 450.
σ subject = "database" and price < 450 or year > 2010 (Books)
Output: Selects tuples from Books where the subject is 'database' and the price is less than 450, or where the publication year is greater than 2010, that is, published after 2010.
Project Operation (∏)
Projects the listed column(s) of a relation.
Notation: ∏ A1, A2, ..., An (r)
where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example:
∏ subject, author (Books)
Selects and projects the columns named subject and author from the relation Books.
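Selection and projection can be sketched over a relation stored as a list of Python dicts. This is an illustrative sketch; the `Books` rows and the functions `select` and `project` are hypothetical. The predicate is an ordinary Python function, and, as in σ, `and` binds tighter than `or`. `project` drops duplicate rows because a relation is a set.

```python
# A relation as a list of tuples keyed by attribute name; the sample data is hypothetical.
Books = [
    {"title": "Intro to DB", "subject": "database", "author": "Rao",  "price": 450, "year": 2009},
    {"title": "SQL Recipes", "subject": "database", "author": "Shah", "price": 300, "year": 2012},
    {"title": "Networks",    "subject": "network",  "author": "Rao",  "price": 500, "year": 2015},
]

def select(predicate, r):
    """σ p (r): the tuples of r that satisfy predicate p."""
    return [t for t in r if predicate(t)]

def project(attrs, r):
    """∏ attrs (r): keep only attrs, dropping duplicate rows since a relation is a set."""
    seen, out = set(), []
    for t in r:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

db_books = select(lambda t: t["subject"] == "database", Books)
cheap_or_new = select(lambda t: t["subject"] == "database" and t["price"] < 450
                      or t["year"] > 2010, Books)
print([t["title"] for t in cheap_or_new], project(["author"], Books))
```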
Union Operation (∪)
The union operation performs a binary union between two given relations and is defined as:
r ∪ s = { t | t ∈ r or t ∈ s }
Notation: r ∪ s
where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold:
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output: Projects the names of the authors who have written either a book or an article or both.
Set Difference (−)
The result of the set difference operation is the set of tuples that are present in one relation but not in the second relation.
Notation: r − s
Finds all tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output: Yields the names of the authors who have written books but not articles.
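With relations held as Python sets of tuples, union and set difference map onto the built-in `|` and `-` operators, and duplicates disappear automatically. This is a minimal sketch; the author sets are hypothetical stand-ins for ∏ author (Books) and ∏ author (Articles). The compatibility check here looks only at arity (number of attributes); it does not check that the attribute domains are compatible.

```python
# Relations as sets of tuples; the sample authors are hypothetical.
books_authors = {("Rao",), ("Shah",), ("Iyer",)}   # ∏ author (Books)
articles_authors = {("Shah",), ("Khan",)}          # ∏ author (Articles)

def union(r, s):
    """r ∪ s, after an arity check (domain compatibility is not checked here)."""
    if r and s and len(next(iter(r))) != len(next(iter(s))):
        raise ValueError("relations are not union-compatible")
    return r | s          # set union also removes duplicate tuples

def difference(r, s):
    """r − s: the tuples in r that are not in s."""
    return r - s

print(sorted(union(books_authors, articles_authors)))
print(sorted(difference(books_authors, articles_authors)))   # book authors with no articles
```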
Cartesian Product (×)
Combines the information of two different relations into one.
Notation: r × s
where r and s are relations, and their output is defined as:
r × s = { q t | q ∈ r and t ∈ s }
σ author = 'tutorialspoint' (Books × Articles)
Output: Yields a relation that shows all books and articles written by tutorialspoint.
Rename Operation (ρ)
The results of relational algebra are also relations, but without any name. The rename operation allows us to rename the output relation. It is denoted with the small Greek letter rho (ρ).
Notation: ρ x (E)
where the result of expression E is saved with the name x.
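The Cartesian product concatenates every tuple of one relation with every tuple of the other, and a selection on the product keeps the combinations of interest. This is a sketch with hypothetical data. The `rename` function is also hypothetical: it just attaches a name and attribute names to an otherwise unnamed result, which is the role ρ plays. Because the product has two author columns, the selection tests both of them.

```python
from itertools import product

def cartesian(r, s):
    """r × s: every tuple of r concatenated with every tuple of s."""
    return {q + t for q, t in product(r, s)}

def rename(name, attrs, r):
    """ρ x (E), sketched: attach a name and attribute names to an unnamed result."""
    return {"name": name, "attributes": tuple(attrs), "tuples": r}

books = {("B1", "tutorialspoint"), ("B2", "rao")}   # (book_id, author), hypothetical
articles = {("A1", "tutorialspoint")}              # (article_id, author), hypothetical

pairs = cartesian(books, articles)                 # 2 × 1 = 2 tuples of 4 attributes each
by_tp = {t for t in pairs if t[1] == "tutorialspoint" and t[3] == "tutorialspoint"}
result = rename("tp_works", ["book_id", "book_author", "article_id", "article_author"], by_tp)
print(result)
```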
Additional operations are:
Set intersection
Assignment
Natural join
Relational Calculus
In contrast with relational algebra, relational calculus is a non-procedural query language; that is, it tells what to do but never explains how to do it.
Relational calculus exists in two forms:
Tuple relational calculus (TRC)
The filtering variable ranges over tuples.
Notation: { T | Condition }
Returns all tuples T that satisfy the condition.
For example:
{ T.name | Author(T) AND T.article = 'database' }
Output: Returns tuples with 'name' from Author who has written an article on 'database'.
TRC can also be quantified. We can use the existential (∃) and universal (∀) quantifiers.
For example:
{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
Output: The query will yield the same result as the previous one.
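Python set comprehensions read almost like tuple relational calculus: they state which tuples qualify, not how to find them. In this sketch the `Authors` data is hypothetical. `any()` plays the role of ∃ and `all()` the role of ∀. R is restricted to names that actually occur in `Authors`, which keeps the query finite (a "safe" expression).

```python
# Hypothetical Authors relation: each tuple is (name, article).
Authors = [
    {"name": "Rao",  "article": "database"},
    {"name": "Rao",  "article": "networks"},
    {"name": "Shah", "article": "networks"},
    {"name": "Iyer", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
q1 = {t["name"] for t in Authors if t["article"] == "database"}

# { R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
names = {t["name"] for t in Authors}
q2 = {r for r in names
      if any(t["article"] == "database" and t["name"] == r for t in Authors)}

# ∀: authors all of whose articles are on 'database'
q3 = {r for r in names
      if all(t["article"] == "database" for t in Authors if t["name"] == r)}
print(sorted(q1), sorted(q2), sorted(q3))
```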
Domain relational calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in TRC, mentioned above).
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
where a1, a2, ... are attributes and P stands for formulae built from the inner attributes.
For example:
{ <article, page, subject> | <article, page, subject> ∈ TutorialsPoint ∧ subject = 'database' }
Output: Yields Article, Page and Subject from the relation TutorialsPoint, where Subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational operators.
The expressive power of (safe) tuple relational calculus and domain relational calculus is equivalent to that of relational algebra.
ER to Relational Model
The ER model, when conceptualized into diagrams, gives a good overview of entity relationships, which is easier to understand. ER diagrams can be mapped to a relational schema; that is, it is possible to create a relational schema using an ER diagram. Though we cannot import all the ER constraints into the relational model, an approximate schema can be generated.
There is more than one process and algorithm available to convert ER diagrams into relational schemas. Some of them are automated and some are manual. We focus here on mapping the diagram contents to relational basics.
ER diagrams mainly comprise:
Entities and their attributes
Relationships, which are associations among entities.
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Process (Algorithm):
Create a table for each entity.
The entity's attributes should become fields of the table, with their respective data types.
Declare the primary key.
We use all the above features of the ER model in order to create classes of objects in object-oriented programming. This makes it easier for the programmer to concentrate on what she is programming. Details of entities are generally hidden from the user; this process is known as abstraction.
One of the important features of generalization and specialization is inheritance; that is, the attributes of higher-level entities are inherited by the lower-level entities.
Mapping Relationship
A relationship is an association among entities.
Mapping Process (Algorithm):
Create a table for the relationship.
Add the primary keys of all participating entities as fields of the table, with their respective data types.
If the relationship has any attributes, add each attribute as a field of the table.
Declare a primary key composed of all the primary keys of the participating entities.
Declare all foreign key constraints.
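The entity and relationship mapping steps can be followed literally in SQL. This sketch runs in SQLite, and its tables and columns are hypothetical. The entities Student and Course each become a table with a primary key. The relationship Enrolls becomes a table holding both primary keys, its own attribute `grade`, a composite primary key, and a foreign key for each participating entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # needed for SQLite to enforce REFERENCES
conn.executescript("""
    -- Entity Student: one table, attributes become columns, primary key declared.
    CREATE TABLE student (
        roll_number INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Entity Course.
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- Relationship Enrolls: both primary keys, its own attribute, composite key, foreign keys.
    CREATE TABLE enrolls (
        roll_number INTEGER REFERENCES student(roll_number),
        course_id   TEXT    REFERENCES course(course_id),
        grade       TEXT,
        PRIMARY KEY (roll_number, course_id)
    );
    INSERT INTO student VALUES (1, 'Mira');
    INSERT INTO course  VALUES ('DB101', 'Databases');
    INSERT INTO enrolls VALUES (1, 'DB101', 'A');
""")
rows = conn.execute(
    "SELECT s.name, c.title, e.grade FROM enrolls e "
    "JOIN student s USING (roll_number) JOIN course c USING (course_id)"
).fetchall()
print(rows)
```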
Mapping Weak Entity Sets
A weak entity set is one that does not have any primary key associated with it.
Mapping Process (Algorithm):
Create a table for the weak entity set.
Add all its attributes to the table as fields.
Add the primary key of the identifying entity set.
Declare all foreign key constraints.
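Here is a weak entity mapped in SQLite. The tables are hypothetical: Dependent is weak and identified through its owner, Employee. The sketch also declares a composite primary key made of the owner's key and the weak entity's partial key (`dep_name`). That step is a common convention which the algorithm above leaves implicit. `ON DELETE CASCADE` is one way to make dependents disappear with their owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Weak entity Dependent: its own attributes plus the owner's primary key.
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT    NOT NULL,          -- partial key: unique only per employee
        kinship  TEXT,
        PRIMARY KEY (emp_id, dep_name)
    );
    INSERT INTO employee  VALUES (10, 'Asha');
    INSERT INTO dependent VALUES (10, 'Kiran', 'son');
""")
conn.execute("DELETE FROM employee WHERE emp_id = 10")        # remove the owner ...
remaining = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]  # ... dependents go too
print(remaining)
```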
Mapping hierarchical entities
ER specialization or generalization comes in the form of hierarchical entity sets.
Mapping process (Algorithm):
Create tables for all higher-level entities
Create tables for lower-level entities
Add primary keys of higher-level entities in the table of lower-level entities
In lower-level tables, add all other attributes of lower-level entities.
Declare the primary key of the higher-level table as the primary key for the lower-level table
Declare foreign key constraints.
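A similar sqlite3 sketch for the hierarchy steps; person as the higher-level entity and teacher/student as lower-level entities are hypothetical names. The lower-level tables reuse the higher-level key as both primary key and foreign key, and a join recovers the inherited attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Higher-level entity
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Lower-level entities: the higher-level key is their primary key and a
-- foreign key; only the specialised attributes are added
CREATE TABLE teacher (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    subject   TEXT
);
CREATE TABLE student (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    std       INTEGER
);
""")
conn.execute("INSERT INTO person VALUES (1, 'Alex'), (2, 'Maria')")
conn.execute("INSERT INTO teacher VALUES (1, 'Math')")
conn.execute("INSERT INTO student VALUES (2, 11)")

# Inherited attributes (name) are recovered by joining on the shared key
teachers = conn.execute("""
    SELECT p.name, t.subject
    FROM person AS p JOIN teacher AS t ON p.person_id = t.person_id
""").fetchall()
print(teachers)  # [('Alex', 'Math')]
```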
SQL Overview
SQL is a programming language for relational databases. It is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major distributions of RDBMS.
SQL comprises both data definition and data manipulation languages. Using the data definition properties of SQL, one can design and modify the database schema, whereas the data manipulation properties allow SQL to store and retrieve data from the database.
Data Definition Language
SQL uses the following set of commands to define the database schema:
CREATE
Creates new databases, tables and views in an RDBMS.
For example:
Create database tutorialspoint;
Create table article;
Create view for_students;
DROP
The Drop command deletes views, tables and databases from an RDBMS.
Drop object_type object_name;
For example:
Drop database tutorialspoint;
Drop table article;
Drop view for_students;
ALTER
Modifies the database schema.
Alter object_type object_name parameters;
For example:
Alter table article add subject varchar;
This command adds an attribute to the relation article with the name subject, of string type.
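A runnable version of these DDL commands, assuming SQLite through Python's sqlite3 module. Two dialect differences matter: SQLite has no CREATE DATABASE (a database is simply the file you connect to), and a view needs a defining AS SELECT query. The columns title and author are hypothetical.

```python
import sqlite3

# SQLite has no CREATE DATABASE: a database is a file (or ":memory:")
# that is created when you first connect to it.
conn = sqlite3.connect(":memory:")

# CREATE: a table (hypothetical columns) and a view over it
conn.execute("CREATE TABLE article (title TEXT, author TEXT)")
conn.execute("CREATE VIEW for_students AS SELECT title FROM article")

# ALTER: add attribute 'subject' of string type to relation article
conn.execute("ALTER TABLE article ADD COLUMN subject VARCHAR(50)")

# PRAGMA table_info yields one row per column; field 1 is the column name
columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]
print(columns)  # ['title', 'author', 'subject']

# DROP: remove the view, then the table
conn.execute("DROP VIEW for_students")
conn.execute("DROP TABLE article")
remaining = conn.execute("SELECT name FROM sqlite_master").fetchall()
print(remaining)  # []
```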
Data Manipulation Language
SQL is equipped with a data manipulation language. DML modifies the database instance by inserting, updating and deleting its data. DML is responsible for all data modification in databases. SQL contains the following set of commands in the DML section:
SELECT/FROM/WHERE
INSERT INTO/VALUES
UPDATE/SET/WHERE
DELETE FROM/WHERE
These basic constructs allow database programmers and users to enter data and information into the database and retrieve it efficiently using a number of filter options.
SELECT/FROM/WHERE
SELECT
This is one of the fundamental query commands of SQL. It is similar to the projection operation of relational algebra. It selects the attributes based on the condition described by the WHERE clause.
FROM
This clause takes a relation name as an argument from which attributes are to be selected/projected. In case more than one relation name is given, this clause corresponds to the Cartesian product.
WHERE
This clause defines the predicate or conditions which must match in order to qualify the tuples whose attributes are to be projected.
For example:
Select author_name
From book_author
Where age > 50;
This command will project the names of authors from the book_author relation whose age is greater than 50.
INSERT INTO/VALUES
This command is used for inserting values into rows of a table (relation).
Syntax is
INSERT INTO table (column1 [, column2, column3 ...]) VALUES (value1 [, value2, value3 ...])
Or
INSERT INTO table VALUES (value1 [, value2, ...])
For Example:
INSERT INTO tutorialspoint (Author, Subject) VALUES ('anonymous', 'computers');
UPDATE/SET/WHERE
This command is used for updating or modifying values of columns of a table (relation).
Syntax is
UPDATE table_name SET column_name = value [, column_name = value ...] [WHERE condition]
For example:
UPDATE tutorialspoint SET Author = 'webmaster' WHERE Author = 'anonymous';
DELETE FROM/WHERE
This command is used for removing one or more rows from a table (relation).
Syntax is
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM tutorialspoint
WHERE Author = 'unknown';
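The four DML constructs, run against SQLite via Python's sqlite3 module as a sketch. The table name and the first row come from the examples above; the second row ('unknown', 'music') is hypothetical, added so the DELETE has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tutorialspoint (Author TEXT, Subject TEXT)")

# INSERT INTO ... VALUES; '?' placeholders let the driver quote values
conn.execute("INSERT INTO tutorialspoint (Author, Subject) VALUES (?, ?)",
             ("anonymous", "computers"))
conn.execute("INSERT INTO tutorialspoint VALUES (?, ?)", ("unknown", "music"))

# UPDATE ... SET ... WHERE
conn.execute("UPDATE tutorialspoint SET Author = 'webmaster' WHERE Author = 'anonymous'")

# DELETE FROM ... WHERE
conn.execute("DELETE FROM tutorialspoint WHERE Author = 'unknown'")

# SELECT ... FROM ... WHERE
rows = conn.execute(
    "SELECT Author, Subject FROM tutorialspoint WHERE Subject = 'computers'"
).fetchall()
print(rows)  # [('webmaster', 'computers')]
```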
Database Normalization
Functional Dependency
Functional dependency (FD) is a set of constraints between two sets of attributes in a relation. Functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→), that is X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies, then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of functional dependencies.
Reflexive rule: If alpha is a set of attributes and beta is a subset of alpha, then alpha → beta holds.
Augmentation rule: If a → b holds and y is an attribute set, then ay → by also holds. That is, adding attributes to dependencies does not change the basic dependencies.
Transitivity rule: Same as the transitive rule in algebra, if a → b holds and b → c holds, then a → c also holds. a → b is read as "a functionally determines b".
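Computing F+ in full is expensive, so a common practical check (not part of the text above) is the attribute closure X+: repeatedly add the right-hand side of any FD whose left-hand side is already covered. X → Y is in F+ exactly when Y ⊆ X+. A minimal Python sketch, with FDs given as pairs of sets; the example FDs are borrowed from the Student_detail discussion later in this section.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, a list of (lhs, rhs) pairs of sets.

    Repeatedly applies 'if lhs is covered, add rhs', which is what
    reflexivity, augmentation and transitivity yield together.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [({"Stu_ID"}, {"Zip"}), ({"Zip"}, {"City"})]
print(sorted(closure({"Stu_ID"}, F)))   # ['City', 'Stu_ID', 'Zip']
print(implies(F, {"Stu_ID"}, {"City"}))  # True (by transitivity)
print(implies(F, {"Zip"}, {"Stu_ID"}))   # False
```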
Trivial Functional Dependency
Trivial: If an FD X → Y holds where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
Non-trivial: If an FD X → Y holds where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial: If an FD X → Y holds where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
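The three categories can be checked mechanically. This small sketch (the function name is hypothetical) returns the most specific label; note that every completely non-trivial FD is also non-trivial.

```python
def classify_fd(x, y):
    """Classify the FD X -> Y by how the attribute sets X and Y overlap."""
    x, y = set(x), set(y)
    if y <= x:
        return "trivial"
    if not (x & y):
        return "completely non-trivial"
    return "non-trivial"


print(classify_fd({"A", "B"}, {"A"}))       # trivial
print(classify_fd({"A", "B"}, {"B", "C"}))  # non-trivial
print(classify_fd({"A"}, {"C"}))            # completely non-trivial
```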
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for the database itself. Managing a database with anomalies is next to impossible.
Update anomalies: If data items are scattered and are not linked to each other properly, then there may be instances when we try to update one data item that has copies of it scattered at several places; a few instances of it get updated properly while a few are left with their old values. This leaves the database in an inconsistent state.
Deletion anomalies: We tried to delete a record, but parts of it were left undeleted because, unknown to us, the data is also saved somewhere else.
Insert anomalies: We tried to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent state, free from any kind of anomaly.
First Normal Form:
This is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. Values in an atomic domain are indivisible units.
[Image: Unorganized relation]
We re-arrange the relation (table) as below to convert it to First Normal Form:
[Image: Relation in 1NF]
Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form:
Before we learn about second normal form, we need to understand the following:
Prime attribute: An attribute which is part of a candidate key (prime key) is a prime attribute.
Non-prime attribute: An attribute which is not part of any candidate key is said to be a non-prime attribute.
Second normal form says that every non-prime attribute should be fully functionally dependent on the prime key attributes. That is, if X → A holds, where X is the prime key and A is a non-prime attribute, then there should not be any proper subset Y of X for which Y → A also holds.
[Image: Relation not in 2NF]
We see here in the Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent upon both and not on any of the prime key attributes individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.
[Image: Relation in 2NF]
We broke the relation in two as depicted in the above picture, so there exists no partial dependency.
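The partial-dependency check behind 2NF can be sketched in Python: for every proper subset of the key, compute its attribute closure and see which non-prime attributes it already determines. The FDs encode the Student_Project example (Stu_ID → Stu_Name, Proj_ID → Proj_Name); the sketch assumes a single candidate key is supplied.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets lhs -> rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """(subset, attribute) pairs where a proper subset of the key already
    determines a non-prime attribute, i.e. the 2NF violations."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            found += [(subset, a) for a in sorted(non_prime) if a in determined]
    return found


F = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]
violations = partial_dependencies({"Stu_ID", "Proj_ID"}, {"Stu_Name", "Proj_Name"}, F)
print(violations)  # [(('Proj_ID',), 'Proj_Name'), (('Stu_ID',), 'Stu_Name')]
```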
Third Normal Form:
For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must be satisfied:
No non-prime attribute is transitively dependent on a prime key attribute.
For any non-trivial functional dependency X → A, either
X is a superkey, or
A is a prime attribute.
[Image: Relation not in 3NF]
We find that in the above depicted Student_detail relation, Stu_ID is the key and the only prime key attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Neither is Zip a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
[Image: Relation in 3NF]
We broke the relation into two relations, as depicted above, to bring it into 3NF.
Boyce-Codd Normal Form:
BCNF is a stricter extension of Third Normal Form. BCNF states that
For any non-trivial functional dependency X → A, X must be a super-key.
In the above depicted picture, Stu_ID is the super-key in the Student_Detail relation and Zip is the super-key in the ZipCodes relation. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
confirm that both relations are in BCNF.
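A minimal BCNF test in Python: a non-trivial FD X → A inside a relation violates BCNF when the closure of X does not cover the whole relation. As a simplification, only the listed FDs are checked; a complete test of a decomposed relation would also consider dependencies projected from F+. The sets encode the Student_detail/ZipCodes example above (variable names hypothetical).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets lhs -> rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs lying inside `relation` whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        if lhs <= relation and rhs <= relation and not rhs <= lhs:
            if not relation <= closure(lhs, fds):
                bad.append((sorted(lhs), sorted(rhs)))
    return bad


F = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]
original = {"Stu_ID", "Stu_Name", "Zip", "City"}  # before the split
student_detail = {"Stu_ID", "Stu_Name", "Zip"}    # after the split
zip_codes = {"Zip", "City"}

print(bcnf_violations(original, F))        # [(['Zip'], ['City'])]
print(bcnf_violations(student_detail, F))  # []
print(bcnf_violations(zip_codes, F))       # []
```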
Database Joins
We understand the benefits of the Cartesian product of two relations, which gives us all the possible tuples that are paired together. But the Cartesian product might not be feasible for huge relations where the number of tuples is in the thousands and the attributes of both relations are considerably large.
Join is a combination of a Cartesian product followed by a selection process. A join operation pairs two tuples from different relations if and only if the given join condition is satisfied.
The following sections briefly describe the join types:
Theta (θ) join
θ in Theta join is the join condition. Theta joins combine tuples from different relations provided they satisfy the theta condition.
Notation:
$R_1 \bowtie_{\theta} R_2$
R1 and R2 are relations with their attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that no attribute matches, that is R1 ∩ R2 = Φ. Here θ is a condition in the form of a set of conditions C.
Theta join can use all kinds of comparison operators.
Student
SID    Name     Std
101    Alex     10
102    Maria    11
[Table: Student Relation]
Subjects
Class    Subject
10       Math
10       English
11       Music
11       Sports
[Table: Subjects Relation]
Student_Detail = $\text{STUDENT} \bowtie_{\text{Student.Std} = \text{Subject.Class}} \text{SUBJECT}$
Student_detail
SID    Name     Std    Class    Subject
101    Alex     10     10       Math
101    Alex     10     10       English
102    Maria    11     11       Music
102    Maria    11     11       Sports
[Table: Output of theta join]
Equi-Join
When a Theta join uses only the equality comparison operator, it is said to be an Equi-Join. The above example corresponds to an equi-join.
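A theta join is literally a Cartesian product followed by a selection, which a short Python sketch makes explicit. Relations are lists of dicts here (a representation chosen for illustration), and theta is any Python predicate over a pair of tuples.

```python
from itertools import product

student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]


def theta_join(r1, r2, theta):
    """Cartesian product followed by selection with predicate theta.
    Assumes the relations share no attribute names (R1 ∩ R2 = Φ)."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2) if theta(t1, t2)]


# Equi-join: theta uses only '='
student_detail = theta_join(student, subjects, lambda s, c: s["Std"] == c["Class"])
for row in student_detail:
    print(row["SID"], row["Name"], row["Std"], row["Class"], row["Subject"])

# Any comparison operator is allowed, e.g. '<'
later = theta_join(student, subjects, lambda s, c: s["Std"] < c["Class"])
print(len(later))  # 2
```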
Natural Join (⋈)
Natural join does not use any explicit comparison operator; it implicitly compares the common attributes for equality. It does not concatenate the way the Cartesian product does. Instead, a Natural Join can only be performed if there is at least one common attribute between the relations. Those attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of the attributes in both relations are the same.
Courses
CID     Course         Dept
CS01    Database       CS
ME01    Mechanics      ME
EE01    Electronics    EE
[Table: Relation Courses]
HoD
Dept    Head
CS      Alex
ME      Maya
EE      Mira
[Table: Relation HoD]
Courses ⋈ HoD
Dept    CID     Course         Head
CS      CS01    Database       Alex
ME      ME01    Mechanics      Maya
EE      EE01    Electronics    Mira
[Table: Relation Courses ⋈ HoD]
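A natural join equates every attribute name the two relations share, as in this Python sketch over the Courses and HoD relations (lists of dicts; it assumes every tuple of a relation has the same attribute names).

```python
courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [
    {"Dept": "CS", "Head": "Alex"},
    {"Dept": "ME", "Head": "Maya"},
    {"Dept": "EE", "Head": "Mira"},
]


def natural_join(r1, r2):
    """Pair tuples that agree on every shared attribute name;
    each shared attribute appears only once in the result."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


result = natural_join(courses, hod)
for row in result:
    print(row["Dept"], row["CID"], row["Course"], row["Head"])
# CS CS01 Database Alex
# ME ME01 Mechanics Maya
# EE EE01 Electronics Mira
```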
Outer Joins
All joins mentioned above, that is Theta Join, Equi Join and Natural Join, are called inner joins. An inner join process includes only tuples with matching attributes; the rest are discarded in the resulting relation. There exist methods by which all tuples of any relation are included in the resulting relation.
There are three kinds of outer joins:
Left outer join (R ⟕ S)
All tuples of the Left relation, R, are included in the resulting relation, and if there exist tuples in R without any matching tuple in S, then the S-attributes of the resulting relation are made NULL.
Left
A      B
100    Database
101    Mechanics
102    Electronics
[Table: Left Relation]
Right
A      B
100    Alex
102    Maya
104    Mira
[Table: Right Relation]
Left ⟕ Right
A      B              C      D
100    Database       100    Alex
101    Mechanics      ---    ---
102    Electronics    102    Maya
[Table: Left outer join output]
Right outer join (R ⟖ S)
All tuples of the Right relation, S, are included in the resulting relation, and if there exist tuples in S without any matching tuple in R, then the R-attributes of the resulting relation are made NULL.
Left ⟖ Right
A      B              C      D
100    Database       100    Alex
102    Electronics    102    Maya
---    ---            104    Mira
[Table: Right outer join output]
Full outer join (R ⟗ S)
All tuples of both participating relations are included in the resulting relation, and if there are no matching tuples for both relations, their respective unmatched attributes are made NULL.
Left ⟗ Right
A      B              C      D
100    Database       100    Alex
101    Mechanics      ---    ---
102    Electronics    102    Maya
---    ---            104    Mira
[Table: Full outer join output]
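All three outer joins can be produced by one routine that remembers which tuples found a partner, sketched here in Python. The join condition Left.A = Right.A is an assumption read off the output tables, and Python's None stands in for NULL.

```python
left = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]
right = [(100, "Alex"), (102, "Maya"), (104, "Mira")]
NULL = None  # stands in for SQL NULL (shown as --- in the tables)


def join_on_a(r, s, keep_left=False, keep_right=False):
    """Join on r.A = s.A, producing (A, B, C, D) rows.
    keep_left / keep_right pad unmatched tuples with NULL (outer joins)."""
    rows, matched = [], set()
    for a, b in r:
        found = False
        for j, (c, d) in enumerate(s):
            if a == c:
                rows.append((a, b, c, d))
                matched.add(j)
                found = True
        if keep_left and not found:
            rows.append((a, b, NULL, NULL))
    if keep_right:
        for j, (c, d) in enumerate(s):
            if j not in matched:
                rows.append((NULL, NULL, c, d))
    return rows


inner = join_on_a(left, right)
left_outer = join_on_a(left, right, keep_left=True)
right_outer = join_on_a(left, right, keep_right=True)
full_outer = join_on_a(left, right, keep_left=True, keep_right=True)
for row in full_outer:
    print(row)
# (100, 'Database', 100, 'Alex')
# (101, 'Mechanics', None, None)
# (102, 'Electronics', 102, 'Maya')
# (None, None, 104, 'Mira')
```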
DBMS - Storage System
Databases are stored in file formats, which contain records. At the physical level, the actual data is stored in electromagnetic format on some device capable of storing it for a longer amount of time. These storage devices can be broadly categorized into three types:
Primary Storage: The memory storage which is directly accessible by the CPU comes under this category. The CPU's internal memory (registers), fast memory (cache) and main memory (RAM) are directly accessible to the CPU, as they are all placed on the motherboard or CPU chipset. This storage is typically very small, ultra fast and volatile. It needs a continuous power supply in order to maintain its state, i.e. in case of power failure all data is lost.
Secondary Storage: The need to store data for a longer amount of time and to retain it even after the power supply is interrupted gave birth to secondary data storage. All memory devices which are not part of the CPU chipset or motherboard come under this category. Broadly, magnetic disks, all optical disks (DVD, CD, etc.), flash drives and magnetic tapes are not directly accessible by the CPU. Hard disk drives, which contain the operating system and are generally not removed from the computers, are considered secondary storage, and all others are called tertiary storage.
Tertiary Storage: The third level in the memory hierarchy is called tertiary storage. It is used to store huge amounts of data. Because this storage is external to the computer system, it is the slowest in speed. These storage devices are mostly used to back up the entire system. Optical disks and magnetic tapes are widely used as tertiary storage.
Memory Hierarchy
A computer system has a well-defined hierarchy of memory. The CPU has inbuilt registers, which save the data being operated on. The computer system has main memory, which is also directly accessible by the CPU. Because the access time of main memory and the CPU speed vary a lot, cache memory is introduced to minimize the loss. Cache memory contains the most recently used data and data which may be referred to by the CPU in the near future.
The memory with the fastest access is the costliest one, and this is the very reason for the hierarchy of the memory system. Larger storage offers slower speed but can store a huge amount of data compared to CPU registers or cache memory, and it is less expensive.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present-day computer systems. They are called magnetic disks because they use the concept of magnetization to store information. Hard disks consist of metal disks coated with magnetizable material. These disks are placed vertically on a spindle. A read/write head moves in between the disks and is used to magnetize or de-magnetize the spot under it. A magnetized spot can be recognized as 0 (zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
RAID
Exponential growth in technology evolved the concept of larger secondary storage media. To meet this requirement, RAID was introduced. RAID stands for Redundant Array of Independent Disks, which is a technology to connect multiple secondary storage devices and use them as a single storage medium.
RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.
RAID 0: In this level, a striped array of disks is implemented. The data is broken down into blocks and all blocks are distributed among all disks. Each disk receives a block of data to write/read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in Level 0.
RAID 1: This level uses mirroring techniques. When data is sent to the RAID controller, it sends a copy of the data to all disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of failure.
RAID 2: This level records the Error Correction Code using Hamming distance for its data, striped on different disks. The data is striped as in level 0, but at bit level: each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Because of its complex structure and high cost, RAID 2 is not commercially available.
RAID 3: This level also stripes the data onto multiple disks in the array. The parity bit generated for a data word is stored on a different disk. This technique allows it to survive a single disk failure, and a single disk failure does not impact the throughput.
RAID 4: In this level, an entire block of data is written onto data disks, and then the parity is generated and stored on a different disk. The prime difference between levels 3 and 4 is that level 3 uses byte-level striping whereas level 4 uses block-level striping. Both levels 3 and 4 require at least 3 disks to implement RAID.
RAID 5: This level also writes whole data blocks onto different disks, but the parity generated for the data block stripe is not stored on a different dedicated disk; instead, it is distributed among all the data disks.
RAID 6: This level is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among the disks. Two parities provide additional fault tolerance. This level requires at least 4 disk drives to be implemented.
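RAID levels 3 to 6 rely on parity: the parity block is the bitwise XOR of the data blocks in a stripe, so any single lost block equals the XOR of the survivors. A Python sketch; the block contents and the RAID 5 parity-rotation formula are hypothetical (real controllers use several different layouts).

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks: this is the parity block."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three hypothetical data blocks striped over three data disks
data = [b"DBMS", b"RAID", b"disk"]
parity = xor_blocks(data)

# Disk 1 fails: XOR of the surviving blocks and the parity rebuilds it
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'RAID'


def raid5_parity_disk(stripe, n_disks):
    """One simple rotation (assumed): parity moves back one disk per stripe."""
    return (n_disks - 1 - stripe) % n_disks


print([raid5_parity_disk(s, 4) for s in range(5)])  # [3, 2, 1, 0, 3]
```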
DBMS - File Structure
Related data and information is stored collectively in file formats. A file is a sequence of records stored in binary format. A disk drive is formatted into several blocks, which are capable of storing records. File records are mapped onto those disk blocks.
File Organization
The method of mapping file records to disk blocks defines file organization, i.e. how the file records are organized. The following are the types of file organization:
Heap File Organization: When a file is created using the Heap File Organization mechanism, the operating system allocates a memory area to that file without any further accounting details. File records can be placed anywhere in that memory area. It is the responsibility of the software to manage the records. A heap file does not support any ordering, sequencing or indexing on its own.
Sequential File Organization: Every file record contains a data field (attribute) to uniquely identify that record. In the sequential file organization mechanism, records are placed in the file in some sequential order based on the unique key field or search key. Practically, it is not possible to store all the records sequentially in physical form.
Hash File Organization: This mechanism uses a hash function computation on some field of the records. As we know, a file is a collection of records, which has to be mapped onto some block of the disk space allocated to it. This mapping is defined by the hash computation. The output of the hash determines the location of the disk block where the records may exist.
Clustered File Organization: Clustered file organization is not considered good for large databases. In this mechanism, related records from one or more relations are kept in the same disk block, that is, the ordering of records is not based on the primary key or search key. This organization helps to retrieve data easily based on a particular join condition. For queries other than that particular join condition on which the data is stored, all queries become more expensive.
File Operations
Operations on database files can be broadly classified into two categories:
Update Operations
Retrieval Operations
Update operations change the data values by insertion, deletion or update. Retrieval operations, on the other hand, do not alter the data but retrieve it after optional conditional filtering. In both types of operations, selection plays a significant role. Other than creation and deletion of a file, there are several operations which can be done on files.
Open: A file can be opened in one of two modes, read mode or write mode. In read mode, the operating system does not allow anyone to alter data; it is solely for reading purposes. Files opened in read mode can be shared among several entities. The other mode is write mode, in which data modification is allowed. Files opened in write mode can also be read but cannot be shared.
Locate: Every file has a file pointer, which tells the current position where the data is to be read or written. This pointer can be adjusted accordingly. Using the find (seek) operation, it can be moved forward or backward.
Read: By default, when files are opened in read mode, the file pointer points to the beginning of the file. There are options where the user can tell the operating system where to locate the file pointer at the time of opening the file. The data immediately following the file pointer is read.
Write: The user can select to open files in write mode, which enables them to edit the content of the file. It can be deletion, insertion or modification. The file pointer can be located at the time of opening or can be dynamically changed if the operating system allows doing so.
Close: This is also the most important operation from the operating system's point of view. When a request to close a file is generated, the operating system removes all the locks (if in shared mode), saves the content of data (if altered) to the secondary storage media, and releases all the buffers and file handlers associated with the file.
The organization of data content inside the file plays a major role here. Seeking or locating the file pointer to the desired record inside the file behaves differently depending on whether the file has records arranged sequentially or clustered, and so on.
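The open/locate/read/write/close cycle maps onto Python's built-in file API, sketched below with fixed-length 8-byte records (a hypothetical layout; the file name students.dat is also hypothetical). Python's open() does not itself enforce the read-shared/write-exclusive locking described above; such sharing rules are OS-dependent.

```python
import os
import tempfile

RECORD = 8  # hypothetical fixed record length in bytes
path = os.path.join(tempfile.mkdtemp(), "students.dat")

# Open in write mode and store three fixed-length records
with open(path, "wb") as f:
    for name in (b"Alex", b"Maria", b"Maya"):
        f.write(name.ljust(RECORD))  # pad each record with spaces to 8 bytes

# Open for reading and writing
with open(path, "r+b") as f:
    f.seek(1 * RECORD)                # locate: move the file pointer to record 1
    second = f.read(RECORD).rstrip()  # read the record under the pointer
    f.seek(2 * RECORD)
    f.write(b"Mira".ljust(RECORD))    # write: overwrite record 2 in place
    position = f.tell()               # pointer now sits just past record 2
# leaving the with-block closes the file and flushes buffered data

with open(path, "rb") as f:
    records = [f.read(RECORD).rstrip() for _ in range(3)]

print(second, position, records)  # b'Maria' 24 [b'Alex', b'Maria', b'Mira']
```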
DBMS - Indexing
We know that information in the DBMS files is stored in the form of records. Every record is equipped with some key field, which helps it to be recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from database files based on some attributes on which the indexing has been done. Indexing in database systems is similar to the one we see in books.
Indexing is defined based on its indexing attributes. Indexing can be one of the following types:
Primary Index: If an index is built on the ordering 'key-field' of a file, it is called a Primary Index. Generally, it is the primary key of the relation.
Secondary Index: If an index is built on a non-ordering field of a file, it is called a Secondary Index.
Clustering Index: If an index is built on an ordering non-key field of a file, it is called a Clustering Index.
The ordering field is the field on which the records of the file are ordered. It can be different from the primary or candidate key of a file.
Ordered Indexing is of two types:
Dense Index
Sparse Index
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store the index records themselves. An index record contains the search key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search for a record, we first proceed by the index record and reach the actual location of the data. If the data we are looking for is not where we directly reach by following the index, the system starts a sequential search until the desired data is found.
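A sketch of both index types over a small sorted file, in Python. The block size of three records and the keys are hypothetical; the sparse index keeps only each block's first key and falls back to a sequential scan inside the block, using the standard bisect module to find the right entry.

```python
import bisect

# A sequential file: records sorted on the search key, 3 records per block
records = [(k, f"row{k}") for k in range(10, 130, 10)]  # keys 10..120
BLOCK = 3
blocks = [records[i:i + BLOCK] for i in range(0, len(records), BLOCK)]

# Dense index: one entry per search-key value -> (block number, slot)
dense = {key: (b, s) for b, blk in enumerate(blocks) for s, (key, _) in enumerate(blk)}

# Sparse index: one entry per block, holding that block's first key
sparse_keys = [blk[0][0] for blk in blocks]


def sparse_lookup(key):
    """Follow the last index entry <= key, then scan that block sequentially."""
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    for k, value in blocks[i]:
        if k == key:
            return value
    return None


blk_no, slot = dense[70]
print(blocks[blk_no][slot][1])        # row70 (direct hit via the dense index)
print(sparse_lookup(70))              # row70 (entry 70 -> block 2, then scan)
print(sparse_lookup(75))              # None (no such record)
print(len(dense), len(sparse_keys))   # 12 4
```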
Multilevel Index
Index records comprise search-key values and data pointers. The index itself is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep the index records in the main memory so that the search can speed up. If a single-level index is used, then a large index cannot be kept in memory as a whole, and this leads to multiple disk accesses.
A multi-level index helps in breaking down the index into several smaller indices in order to make the outermost level so small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.
B+ Tree
A B+ tree is a multi-level index format in the form of a balanced search tree (a multiway tree, not a binary one). As mentioned earlier, single-level index records become large as the database size grows, which also degrades performance.
All leaf nodes of a B+ tree denote actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced. Additionally, all leaf nodes are linked using a linked list, which lets a B+ tree support random access as well as sequential access.
Structure of B+ tree
Every leaf node is at an equal distance from the root node. A B+ tree is of order n, where n is fixed for every B+ tree.
Internal nodes:
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, internal nodes contain n pointers.
Leaf nodes:
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, leaf nodes contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to the next leaf node, forming a linked list.
B+ tree insertion
B+ trees are filled from the bottom, and each entry is inserted at a leaf node.
If a leaf node overflows:
Split the node into two parts
Partition at i = ⌊(m+1)/2⌋
The first i entries are stored in one node
The rest of the entries (i+1 onwards) are moved to a new node
The i-th key is duplicated in the parent of the leaf
If a non-leaf node overflows:
Split the node into two parts
Partition the node at i = ⌈(m+1)/2⌉
Entries up to i are kept in one node
The rest of the entries are moved to a new node
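The insertion rules can be sketched as a small in-memory B+ tree in Python. Assumptions: MAX_KEYS (playing the role of m) is 3, keys are distinct, and a separator in an internal node is the largest key of its left subtree, which is what copying the i-th key of a split leaf into the parent produces. For a non-leaf split, the sketch moves the i-th key up to the parent rather than keeping it in either half, a detail the steps above leave open. Deletion is omitted.

```python
from bisect import bisect_left, insort

MAX_KEYS = 3  # hypothetical: a node overflows when it holds 4 keys


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # internal nodes only
        self.next = None    # leaf nodes only: link to the next leaf


class BPlusTree:
    def __init__(self):
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:  # the root split, so the tree grows by one level
            sep, right = split
            root = Node(leaf=False)
            root.keys, root.children = [sep], [self.root, right]
            self.root = root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= MAX_KEYS:
                return None
            i = (MAX_KEYS + 1) // 2          # i = floor((m+1)/2)
            right = Node(leaf=True)
            right.keys = node.keys[i:]       # entries i+1 onwards move out
            node.keys = node.keys[:i]        # the first i entries stay
            right.next, node.next = node.next, right
            return node.keys[-1], right      # the i-th key is copied up
        pos = bisect_left(node.keys, key)    # keys <= separator go left
        split = self._insert(node.children[pos], key)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(pos, sep)
        node.children.insert(pos + 1, child)
        if len(node.keys) <= MAX_KEYS:
            return None
        i = (MAX_KEYS + 2) // 2              # i = ceil((m+1)/2)
        up = node.keys[i - 1]                # the i-th key moves up
        right = Node(leaf=False)
        right.keys, right.children = node.keys[i:], node.children[i:]
        node.keys, node.children = node.keys[:i - 1], node.children[:i]
        return up, right

    def contains(self, key):
        node = self.root
        while not node.leaf:
            node = node.children[bisect_left(node.keys, key)]
        return key in node.keys

    def leaves(self):
        """Read all keys by walking the linked list of leaves."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        keys = []
        while node:
            keys.extend(node.keys)
            node = node.next
        return keys


tree = BPlusTree()
for k in range(1, 11):
    tree.insert(k)
print(tree.root.keys)  # [4]
print(tree.leaves())   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```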
B+ tree deletion
B+ tree entries are deleted at the leaf nodes.
The target entry is searched and deleted.
If it is in an internal node, delete it and replace it with the entry from the left position.
After deletion, underflow is tested.
If underflow occurs:
Distribute entries from the nodes left to it.
If distribution from the left is not possible:
Distribute from the nodes right to it.
If distribution from the left and right is not possible:
Merge the node with the nodes left and right to it.
DBMS - Hashing
For a huge database structure, it is sometimes not feasible to search the index through all its levels and then reach the destination data block to retrieve the desired data. Hashing is an effective technique to calculate the direct location of a data record on the disk without using an index structure.
It uses a function, called a hash function, which generates an address when called with the search key as a parameter. The hash function computes the location of the desired data on the disk.
Hash Organization
Bucket: A hash file stores data in bucket format. A bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.
Hash Function: A hash function h is a mapping function that maps the set of all search keys K to the address where the actual records are placed. It is a function from search keys to bucket addresses.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, then it shall generate only 4 values (0 to 3). The output address shall always be the same for that function. The number of buckets provided remains the same at all times.
Operation:
Insertion: When a record is required to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search: When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored.
Delete: This is simply a search followed by a deletion operation.
Bucket Overflow:
The condition of bucket overflow is known as collision. This is a fatal state for any static hash function. In this case, one of the following remedies can be used:
Overflow Chaining: When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called Closed Hashing.
Linear Probing: When the hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
For a hash function to work efficiently and effectively, the following must hold:
Distribution of records should be uniform
Distribution should be random instead of following any ordering
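Both overflow remedies, sketched in Python with the mod-4 hash function from the text. The bucket capacity of two records is a hypothetical stand-in for a disk block, and the early stop in probe_search is only valid because this sketch never deletes.

```python
N_BUCKETS = 4    # static: the number of buckets never changes
BUCKET_SIZE = 2  # hypothetical: records per bucket (one disk block)


def h(key):
    return key % N_BUCKETS  # the mod-4 hash: addresses 0 to 3


# Overflow chaining (closed hashing): every address owns a chain of buckets
chains = [[[]] for _ in range(N_BUCKETS)]


def chain_insert(key):
    chain = chains[h(key)]
    if len(chain[-1]) == BUCKET_SIZE:
        chain.append([])  # allocate a new bucket, linked after the full one
    chain[-1].append(key)


def chain_search(key):
    return any(key in bucket for bucket in chains[h(key)])


# Linear probing (open hashing): a fixed set of buckets, no chains
table = [[] for _ in range(N_BUCKETS)]


def probe_insert(key):
    for step in range(N_BUCKETS):
        addr = (h(key) + step) % N_BUCKETS  # try h(K), then the next buckets
        if len(table[addr]) < BUCKET_SIZE:
            table[addr].append(key)
            return addr
    raise OverflowError("every bucket is full")


def probe_search(key):
    for step in range(N_BUCKETS):
        bucket = table[(h(key) + step) % N_BUCKETS]
        if key in bucket:
            return True
        if len(bucket) < BUCKET_SIZE:  # the key would have been placed here
            return False
    return False


for k in (1, 5, 9, 13):  # all four keys hash to address 1
    chain_insert(k)
    probe_insert(k)
print(chains[1])  # [[1, 5], [9, 13]]
print(table)      # [[], [1, 5], [9, 13], []]
```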
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible hashing.
The hash function, in dynamic hashing, is made to produce a large number of values, and only a few are used initially.
Organization
The prefix of the entire hash value is taken as the hash index. Only a portion of the hash value is used for computing bucket addresses. Every hash index has a depth value n, which tells how many bits are used for computing the hash function. These n bits can address 2ⁿ buckets. When all these bits are consumed, that is, all buckets are full, the depth value is increased by one and twice the buckets are allocated.
Operation
Querying: Look at the depth value of the hash index and use those bits to compute the bucket address.
Update: Perform a query as above and update the data.
Deletion: Perform a query to locate the desired data and delete the data.
Insertion: Compute the address of the bucket.
If the bucket is already full:
Add more buckets
Add an additional bit to the hash value
Re-compute the hash function
Else:
Add the data to the bucket
If all buckets are full, perform the remedies of static hashing.
Hashing is not favorable when the data is organized in some ordering and queries require a range of data. When data is discrete and random, hashing performs best.
Hashing algorithms and implementations have higher complexity than indexing. All hash operations are done in constant time (on average, assuming few collisions).
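A compact extendible-hashing sketch in Python that follows the prefix scheme above. Assumptions: 8-bit hash values, buckets holding two records, and a directory of 2ⁿ pointers (n = global depth) where several entries may share one bucket. Adding a bit doubles the directory, not the number of buckets; only the full bucket is split.

```python
HASH_BITS = 8    # hypothetical: hash values are 8-bit
BUCKET_SIZE = 2  # hypothetical: records per bucket


def h(key):
    return key % (1 << HASH_BITS)


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: prefix bits shared by its keys
        self.keys = []


class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]  # 2 ** global_depth entries

    def _index(self, key):
        # the first global_depth bits (the prefix) of the hash value
        return h(key) >> (HASH_BITS - self.global_depth)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_SIZE:
            bucket.keys.append(key)
            return
        if bucket.depth == HASH_BITS:
            raise OverflowError("no hash bits left: use static-hashing remedies")
        if bucket.depth == self.global_depth:
            # use one more bit: every directory entry is duplicated
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        # split the full bucket on its next prefix bit
        zero, one = Bucket(bucket.depth + 1), Bucket(bucket.depth + 1)
        shift = self.global_depth - bucket.depth - 1
        for i, b in enumerate(self.directory):
            if b is bucket:
                self.directory[i] = one if (i >> shift) & 1 else zero
        for k in bucket.keys:  # re-compute the address of the old records
            self.directory[self._index(k)].keys.append(k)
        self.insert(key)  # retry; this may split again


table = ExtendibleHash()
for k in (1, 64, 128, 192, 32, 160):
    table.insert(k)
print(table.global_depth)                 # 2
print([b.keys for b in table.directory])  # [[1, 32], [64], [128, 160], [192]]
```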
DBMS - Transaction
A transaction can be defined as a group of tasks. A single task is the minimum processing unit of work, which cannot be divided further.
An example of a transaction can be the bank accounts of two users, say A and B. When a bank employee transfers an amount of Rs. 500 from A's account to B's account, a number of tasks are executed behind the screen. This very simple and small transaction includes several steps. First, A's bank account balance is decreased by 500:
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
In simple words, the transaction involves many tasks, such as opening the account of A, reading the old balance, decreasing 500 from it, saving the new balance to the account of A, and finally closing it. To add the amount of 500 to B's account, the same sort of tasks need to be done:
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
A simple transaction of moving an amount of 500 from A to B involves many low-level tasks.
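The whole transfer can be run as one database transaction, sketched with Python's sqlite3 module (the table and column names are hypothetical). Used as a context manager, an sqlite3 connection commits when the block succeeds and rolls back when an exception escapes, so a failure between the debit and the credit leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()


def transfer(src, dst, amount, crash=False):
    """Debit and credit as one transaction: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if crash:  # hypothetical failure between the two tasks
                raise RuntimeError("simulated crash")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except (sqlite3.Error, RuntimeError):
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok1 = transfer("A", "B", 500)              # succeeds
ok2 = transfer("A", "B", 100, crash=True)  # the debit is rolled back
ok3 = transfer("A", "B", 900)              # CHECK fails: A would go negative
print(ok1, ok2, ok3, balances())  # True False False {'A': 500, 'B': 700}
```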
ACID Properties
A transaction may contain several low-level tasks, and a transaction is a very small unit of any program. A transaction in a database system must maintain some properties in order to ensure the accuracy of its completeness and data integrity. These properties are referred to as ACID properties and are described below:
Atomicity: Though a transaction involves several low-level operations, this property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in the database where the transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.
Consistency: This property states that after the transaction is finished, its database must remain in a consistent state. There must not be any possibility that some data is incorrectly affected by the execution of the transaction. If the database was in a consistent state before the execution of the transaction, it must remain in a consistent state after the execution of the transaction.
Durability: This property states that in any case, all updates made on the database will persist even if the system fails and restarts. If a transaction writes or updates some data in the database and commits, that data will always be there in the database. If the transaction commits but the data is not written on the disk and the system fails, that data will be updated once the system comes up.
Isolation: In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.
Serializability
When more than one transaction is executed by the operating system in a multiprogramming environment, there are possibilities that instructions of one transaction are interleaved with those of some other transaction.
Schedule: A chronological execution sequence of transactions is called a schedule. A schedule can have many transactions in it, each comprising a number of instructions/tasks.
Serial Schedule: A schedule in which transactions are aligned in such a way that one transaction is executed first. When the first transaction completes its cycle, the next transaction is executed. Transactions are ordered one after the other. This type of schedule is called a serial schedule, as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered as a benchmark. The execution sequence of instructions within a transaction cannot be changed, but two transactions can have their instructions executed in an interleaved fashion. This execution does no harm if the two transactions are mutually independent and working on different segments of data, but in case these two transactions are working on the same data, the results may vary. This ever-varying result may leave the database in an inconsistent state.
To resolve the problem, we allow parallel execution of a transaction schedule if its transactions are either serializable or have some equivalence relation between or among them.
Equivalence of schedules: Schedules can be equivalent in the following ways:
Result Equivalence:
If two schedules produce the same results after execution, they are said to be result equivalent. They may yield the same result for some values and different results for other values. That is why this equivalence is not generally considered significant.
View Equivalence:
Two schedules are view equivalent if the transactions in both schedules perform similar actions in a similar manner.
For example:
If T reads the initial data in S1, then T also reads the initial data in S2
If T reads the value written by J in S1, then T also reads the value written by J in S2
If T performs the final write on a data value in S1, then T also performs the final write on that data value in S2
Conflict Equivalence:
Two operations are said to be conflicting if they have the following properties:
Both belong to separate transactions.
Both access the same data item.
At least one of them is a "write" operation.
Two schedules containing more than one transaction with conflicting operations are said to be conflict equivalent if and only if:
Both schedules contain the same set of transactions.
The order of each conflicting pair of operations is the same in both schedules.
A schedule that is view equivalent to a serial schedule is view serializable, and a schedule that is conflict equivalent to a serial schedule is conflict serializable. All conflict serializable schedules are view serializable too.
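A similar Python sketch, again assuming schedules are lists of (transaction, op, item) tuples and that no transaction repeats an identical operation on the same item, tests conflict equivalence exactly as defined above and then checks conflict serializability by brute force against every serial order of the transactions. The function names are hypothetical, and the exhaustive search is for illustration only; textbooks usually test conflict serializability with a precedence graph, which is far cheaper.

```python
from collections import Counter
from itertools import permutations

def conflicts(a, b):
    """Different transactions, same data item, and at least one write."""
    return a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1])

def conflict_pairs(schedule):
    """Every conflicting pair, in the order it occurs in the schedule."""
    return {(schedule[i], schedule[j])
            for i in range(len(schedule))
            for j in range(i + 1, len(schedule))
            if conflicts(schedule[i], schedule[j])}

def conflict_equivalent(s1, s2):
    return Counter(s1) == Counter(s2) and conflict_pairs(s1) == conflict_pairs(s2)

def conflict_serializable(schedule):
    """True if the schedule is conflict equivalent to some serial schedule."""
    txns = list(dict.fromkeys(t for t, _, _ in schedule))  # transactions in first-seen order
    for order in permutations(txns):
        # Keep each transaction's own operations in their original order.
        serial = [op for t in order for op in schedule if op[0] == t]
        if conflict_equivalent(schedule, serial):
            return True
    return False

ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
      ("T1", "R", "B"), ("T2", "W", "A"), ("T1", "W", "B")]
bad = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(conflict_serializable(ok), conflict_serializable(bad))  # True False
```

In `bad`, T2 reads A before T1 writes it, but T1 writes A before T2 does, so neither serial order preserves every conflicting pair.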
States of Transactions:
A transaction in a database can be in one of the following states:
Active: In this state the transaction is being executed. This is the initial state of every transaction.
Partially Committed: When a transaction executes its final operation, it is said to be in this state. After execution of all operations, the database system performs some checks, e.g. the consistency of the database after applying the output of the transaction to the database.
Failed: If any check made by the database recovery system fails, the transaction is said to be in the failed state, from where it can no longer proceed.
Aborted: If any check fails and the transaction reaches the failed state, the recovery manager rolls back all its write operations on the database to bring the database back to the state it was in prior to the start of the transaction. Transactions in this state are called aborted. The database recovery module can select one of two operations after a transaction aborts:
Re-start the transaction
Kill the transaction
Committed: If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.
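The states above form a small state machine. This Python sketch (the class and constant names are hypothetical) encodes the usual textbook transitions: active to partially committed or failed, partially committed to committed or failed, and failed to aborted. Committed and aborted are terminal; treating a restarted transaction as a new transaction is an assumption of this sketch.

```python
from enum import Enum

class TxnState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Legal moves between states; terminal states have no outgoing transitions.
TRANSITIONS = {
    TxnState.ACTIVE: {TxnState.PARTIALLY_COMMITTED, TxnState.FAILED},
    TxnState.PARTIALLY_COMMITTED: {TxnState.COMMITTED, TxnState.FAILED},
    TxnState.FAILED: {TxnState.ABORTED},
    TxnState.ABORTED: set(),
    TxnState.COMMITTED: set(),
}

class Transaction:
    def __init__(self, name):
        self.name = name
        self.state = TxnState.ACTIVE  # every transaction starts out active

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state

t1 = Transaction("T1")
t1.move_to(TxnState.PARTIALLY_COMMITTED)  # final operation executed
t1.move_to(TxnState.COMMITTED)            # consistency checks passed
```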
DBMS - Concurrency Control
In a multiprogramming environment where more than one transaction can be executed concurrently, protocols are needed to control the concurrency of transactions and ensure their atomicity and isolation properties.
Concurrency control protocols that ensure serializability of transactions are the most desirable. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock based protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it first acquires an appropriate lock on it. Locks are of two kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive: This type of locking mechanism differentiates locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock, because allowing more than one transaction to write to the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.
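The shared/exclusive distinction boils down to a compatibility rule: a shared lock can coexist with other shared locks, while an exclusive lock coexists with nothing. The hypothetical LockTable class below is a minimal Python sketch of that rule only; it has no wait queue, so a refused request simply returns False and the caller decides whether to wait or abort.

```python
# Lock compatibility: only shared ("S") with shared is allowed on one item.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockTable:
    def __init__(self):
        self.holders = {}  # item -> {transaction: "S" or "X"}

    def request(self, txn, item, mode):
        """Grant the lock if it is compatible with every other holder's lock."""
        held = self.holders.setdefault(item, {})
        if all(COMPATIBLE[(m, mode)] for t, m in held.items() if t != txn):
            if held.get(txn) != "X":   # never downgrade an exclusive lock
                held[txn] = mode       # grant, or upgrade S -> X
            return True
        return False                   # conflict: caller must wait or abort

    def release(self, txn, item):
        self.holders.get(item, {}).pop(txn, None)

locks = LockTable()
print(locks.request("T1", "A", "S"), locks.request("T2", "A", "S"))  # True True
print(locks.request("T3", "A", "X"))                                  # False
```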
There are four types of lock protocols available:
Simplistic
Simplistic lock based protocols allow a transaction to obtain a lock on every object before a 'write' operation is performed. As soon as the 'write' has been done, the transaction may unlock the data item.
Pre-claiming
In this protocol, a transaction evaluates its operations and creates a list of data items on which it needs locks. Before starting execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If not all the locks are granted, the transaction rolls back and waits until all the locks are granted.
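Pre-claiming is an all-or-nothing request. In this hypothetical Python sketch, only exclusive locks are modelled to keep it short: preclaim either grants every requested lock or grants none, so a refused transaction holds nothing while it waits.

```python
class PreclaimingLockManager:
    """Grants all of a transaction's locks up front, or none of them."""
    def __init__(self):
        self.owner = {}  # item -> transaction holding its (exclusive) lock

    def preclaim(self, txn, items):
        if any(self.owner.get(i, txn) != txn for i in items):
            return False  # some lock unavailable: roll back and wait, holding nothing
        for i in items:
            self.owner[i] = txn
        return True

    def release_all(self, txn):
        """Called when all of the transaction's operations are over."""
        for i in [i for i, t in self.owner.items() if t == txn]:
            del self.owner[i]

mgr = PreclaimingLockManager()
print(mgr.preclaim("T1", ["A", "B"]))  # True
print(mgr.preclaim("T2", ["B", "C"]))  # False: B is held, so C is not taken either
```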
Two Phase Locking - 2PL
This locking protocol divides the transaction execution phase into three parts. In the first part, when the transaction starts executing, it seeks grants for the locks it needs as it executes. The second part is where the transaction has acquired all its locks and no other lock is required; the transaction keeps executing its operations. As soon as the transaction releases its first lock, the third phase starts. In this phase the transaction cannot demand any new lock but only releases the acquired locks.
Two phase locking thus has two phases: one is growing, where all locks are being acquired by the transaction, and the second is shrinking, where the locks held by the transaction are being released.
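A transaction may obtain an exclusive (write) lock by first acquiring a shared (read) lock and then upgrading it to an exclusive lock; under 2PL, such an upgrade counts as an acquisition and must happen in the growing phase.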
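The 2PL rule is enforced per transaction: once any lock has been released, no further lock may be acquired. The Python sketch below (the class name is hypothetical) tracks only that rule and S-to-X upgrades; checking compatibility with other transactions' locks is deliberately left out.

```python
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = {}          # item -> "S" or "X"
        self.shrinking = False   # set by the first release

    def lock(self, item, mode):
        """Growing phase: acquire a new lock or upgrade S -> X."""
        if self.shrinking:
            raise RuntimeError(f"{self.name}: no new locks after the first release")
        if mode == "X" or item not in self.locks:
            self.locks[item] = mode

    def unlock(self, item):
        """The shrinking phase starts with the first release."""
        self.shrinking = True
        del self.locks[item]

t = TwoPhaseTransaction("T1")
t.lock("A", "S")
t.lock("A", "X")   # upgrade during the growing phase
t.lock("B", "S")
t.unlock("A")      # T1 is now in its shrinking phase
```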
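Strict Two Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock as soon as it is no longer required; it holds all locks until the commit state arrives. Strict-2PL releases all locks at once at the commit point. (Many textbooks call this variant, in which every lock is held until commit, rigorous 2PL; strict 2PL in that narrower sense only requires exclusive locks to be held until commit.)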
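In Strict-2PL there is no early release at all. In this minimal Python sketch (names hypothetical, exclusive locks only), a transaction has no unlock operation; commit() drops every lock it holds in one step, so no other transaction can touch its data before it commits.

```python
class Strict2PLTransaction:
    def __init__(self, name, owners):
        self.name = name
        self.owners = owners  # shared lock table: item -> transaction holding it
        self.held = set()

    def lock(self, item):
        """Growing phase: acquire an exclusive lock if nobody else holds it."""
        if self.owners.get(item, self.name) != self.name:
            return False      # held by another transaction: must wait
        self.owners[item] = self.name
        self.held.add(item)
        return True

    # There is intentionally no unlock(): locks are released only at commit.
    def commit(self):
        for item in self.held:
            del self.owners[item]  # every lock released at once, at the commit point
        self.held.clear()

table = {}
t1 = Strict2PLTransaction("T1", table)
t2 = Strict2PLTransaction("T2", table)
t1.lock("A")
t1.lock("B")
print(t2.lock("A"))  # False: T1 keeps A until it commits
t1.commit()
print(t2.lock("A"))  # True
```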
Time stamp based protocols
Another widely used concurrency protocol is the timestamp-based protocol. This protocol uses either the system time or a logical counter as a timestamp.
Lock based protocols manage the order between conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all other transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority may be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the system know when the last read and write operations were made on the data item.
Time-stamp ordering protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol system that each conflicting pair of tasks is executed according to the timestamp values of the transactions.
The timestamp of transaction Ti is denoted as TS(Ti).
The read timestamp of data item X is denoted by R-timestamp(X).
The write timestamp of data item X is denoted by W-timestamp(X).
The timestamp ordering protocol works as follows:
If a transaction Ti issues a read(X) operation:
If TS(Ti) < W-timestamp(X): the operation is rejected and Ti is rolled back.
If TS(Ti) >= W-timestamp(X): the operation is executed, and R-timestamp(X) is set to max(R-timestamp(X), TS(Ti)).
If a transaction Ti issues a write(X) operation:
If TS(Ti) < R-timestamp(X): the operation is rejected and Ti is rolled back.
If TS(Ti) < W-timestamp(X): the operation is rejected and Ti is rolled back.
Otherwise, the operation is executed, and W-timestamp(X) is set to TS(Ti).
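The rules above translate almost line for line into code. This Python sketch (the class and method names are hypothetical) assumes integer timestamps greater than zero, with 0 standing for "never read/written", and returns "rollback" instead of actually restarting the transaction.

```python
class TimestampOrdering:
    def __init__(self):
        self.r_ts = {}  # item -> R-timestamp(X); missing means 0
        self.w_ts = {}  # item -> W-timestamp(X); missing means 0

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            return "rollback"  # X already overwritten by a younger transaction
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return "executed"

    def write(self, ts, item):
        if ts < self.r_ts.get(item, 0):
            return "rollback"  # a younger transaction has already read X
        if ts < self.w_ts.get(item, 0):
            return "rollback"  # a younger transaction has already written X
        self.w_ts[item] = ts
        return "executed"

to = TimestampOrdering()
print(to.read(2, "X"), to.write(1, "X"), to.write(3, "X"), to.read(2, "X"))
# executed rollback executed rollback
```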
Thomas' Write rule:
Under the basic protocol, if TS(Ti) < W-timestamp(X), a write(X) by Ti is rejected and Ti is rolled back. Thomas' write rule modifies the timestamp ordering rules so that the schedule is view serializable: instead of rolling Ti back, the obsolete 'write' operation itself is ignored.
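Only the write check changes under Thomas' write rule. In this hypothetical Python helper, the R-timestamp check is kept from the basic protocol, and an obsolete write (TS(Ti) < W-timestamp(X)) is skipped instead of causing a rollback. Timestamps are assumed to be positive integers, with a missing entry meaning 0.

```python
def thomas_write(ts, item, r_ts, w_ts):
    """Write check under Thomas' write rule; r_ts and w_ts map items to timestamps."""
    if ts < r_ts.get(item, 0):
        return "rollback"  # unchanged: a younger transaction already read X
    if ts < w_ts.get(item, 0):
        return "ignored"   # obsolete write is skipped; Ti keeps running
    w_ts[item] = ts
    return "executed"

r_ts, w_ts = {}, {"X": 5}
print(thomas_write(3, "X", r_ts, w_ts), w_ts)  # ignored {'X': 5}
```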
DBMS - Deadlock
In a multi-process system, deadlock is a situation that arises in a shared-resource environment where a process waits indefinitely for a resource held by some other process, which in turn is waiting for a resource held by yet another process.
For example, assume a set of transactions {T0, T1, T2, ..., Tn}. T0 needs a resource X to complete its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting for resource Z, which is held by T0. Thus, all the processes wait for each other to release resources. In this situation, none of the processes can finish its task. This situation is known as 'deadlock'.
Deadlock is not a healthy phenomenon for a system. To keep the system deadlock-free, a few methods can be used. In case the system is stuck because of a deadlock, the transactions involved in the deadlock are rolled back and restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations that transactions are about to execute and analyzes whether they can create a deadlock situation. If it finds that a deadlock situation might occur, that transaction is never allowed to be executed.
There are deadlock prevention schemes that use the timestamp ordering of transactions to pre-decide a deadlock situation.
Wait-Die Scheme:
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by some other transaction, one of two possibilities may occur:
If TS(Ti) < TS(Tj), that is Ti, which is requesting a conflicting lock, is older than Tj, Ti is allowed to wait until the data item is available.
If TS(Ti) > TS(Tj), that is Ti is younger than Tj, Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme:
In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by some other transaction, one of two possibilities may occur:
If TS(Ti) < TS(Tj), that is Ti, which is requesting a conflicting lock, is older than Tj, Ti forces Tj to be rolled back, that is Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), that is Ti is younger than Tj, Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait, but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
In both cases, the transaction that entered the system later is the one that is aborted.
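Both schemes reduce to a decision based on the two timestamps, where a smaller timestamp means an older transaction. The Python functions below are a hypothetical sketch of that decision for a requester Ti and a holder Tj; actually restarting a transaction with its original timestamp is not modelled.

```python
def wait_die(ts_i, ts_j):
    """Ti requests an item held by Tj under Wait-Die."""
    return "Ti waits" if ts_i < ts_j else "Ti dies"

def wound_wait(ts_i, ts_j):
    """Ti requests an item held by Tj under Wound-Wait."""
    return "Tj rolled back" if ts_i < ts_j else "Ti waits"

# Older T1 (TS=1) and younger T2 (TS=2) contend for one item.
print(wait_die(1, 2), "|", wait_die(2, 1))      # Ti waits | Ti dies
print(wound_wait(1, 2), "|", wound_wait(2, 1))  # Tj rolled back | Ti waits
```

In every aborting branch the victim is the transaction with the larger timestamp, which matches the observation that the later arrival is always the one aborted.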
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to detect any deadlock situation in advance. Methods like the "wait-for graph" are available, but they suit only systems where transactions are lightweight and hold fewer instances of resources. In a bulky system, deadlock prevention techniques may work well.
Wait-for Graph
This is a simple method available to track whether any deadlock situation may arise. For each transaction entering the system, a node is created. When transaction Ti requests a lock on an item, say X, which is held by some other transaction Tj, a directed edge is created from Ti to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data items held by others. The system keeps checking whether there is any cycle in the graph; a cycle means the transactions on it are deadlocked.
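Checking the wait-for graph is ordinary cycle detection. This Python sketch assumes the graph is kept as a dictionary mapping each waiting transaction to the set of transactions it waits for (edge Ti -> Tj) and uses a depth-first search; the function name is hypothetical. The example encodes the T0, T1, T2 deadlock described earlier.

```python
def has_deadlock(wait_for):
    """wait_for maps each transaction Ti to the set of Tj it waits for (edges Ti -> Tj)."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    color = {}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:                     # back edge: a cycle, hence a deadlock
                return True
            if state == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))

# T0 waits for T1 (holding X), T1 for T2 (holding Y), T2 for T0 (holding Z).
graph = {"T0": {"T1"}, "T1": {"T2"}, "T2": {"T0"}}
print(has_deadlock(graph))  # True
graph["T2"].discard("T0")   # T0 releases Z, so the edge T2 -> T0 is dropped
print(has_deadlock(graph))  # False
```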
DBMS - Data Backup
Failure with loss of Volatile storage
What would happen if volatile storage like RAM abruptly crashes? All transactions that are being executed are kept in main memory. All active logs, disk buffers and related data are also kept in volatile storage.
When storage like RAM fails, it takes away all the logs and the active copy of the database. This makes recovery almost impossible, as everything that could help recovery is also lost. The following techniques may be adopted in case of loss of volatile storage:
A mechanism like checkpoint can be adopted, which saves the entire content of the database periodically.
The state of the active database in volatile memory can be dumped onto stable storage periodically; the dump may also contain logs, active transactions and buffer blocks.
<dump> can be marked on the log file whenever the database contents are dumped from volatile memory to a stable one.
Recovery:
When the system recovers from a failure, it can restore the latest dump.
It can maintain a redo-list and an undo-list, as in checkpoints.
It can recover the system by consulting the undo-redo lists to restore the state of all transactions up to the last checkpoint.
Database backup & recovery from catastrophic failure
So far we have not discovered any other planet in our solar system that may have life on it, and our own earth is not that safe. In case of a catastrophic failure like an alien attack, the database administrator may still be forced to recover the database.
Remote backup, described next, is one of the solutions to save life. Alternatively, whole database backups can be taken on magnetic tapes and stored at a safer place. This backup can later be restored on a freshly installed database to bring it to the state it was in at least at the point of backup.
Fully grown databases are too large to be backed up frequently. Instead, we are aware of techniques where we can restore a database just by looking at logs. So backing up logs at a frequent rate is more feasible than backing up the entire database. The database can be backed up once a week, and logs, being very small, can be backed up every day or as frequently as every hour.
Remote Backup
Remote backup provides a sense of security and safety in case the primary location where the database is located gets destroyed. Remote backup can be offline or real-time and online. If it is offline, it is maintained manually.
Online backup systems are more real-time and are lifesavers for database administrators and investors. An online backup system is a mechanism where every bit of real-time data is backed up simultaneously at two distant places. One of them is directly connected to the system, and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses the failure and switches the user system to the remote storage. Sometimes this is so instant that the users do not even realize there was a failure.
DBMS - Data Recovery
Crash Recovery
Though we are living in a highly technologically advanced era, where hundreds of satellites monitor the earth and at every second billions of people are connected through information technology, failure is expected but not always acceptable.
A DBMS is a highly complex system with hundreds of transactions being executed every second. The availability of a DBMS depends on its complex architecture and on the underlying hardware or system software. If it fails or crashes while transactions are being executed, the system is expected to follow some sort of algorithm or technique to recover from crashes or failures.
Failure Classification
To see where the problem has occurred, we generalize failures into various categories, as follows:
Transaction failure
When a transaction fails to execute, or reaches a point after which it cannot be completed successfully, it has to abort. This is called transaction failure. In this case only a few transactions or processes are affected.
Reasons for transaction failure could be:
Logical errors: where a transaction cannot complete because it has some code error or an internal error condition.
System errors: where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
System crash
There are problems external to the system that may cause the system to stop abruptly and crash, for example an interruption in power supply, failure of the underlying hardware, or software failure.
Examples may include operating system errors.
Disk failure:
In the early days of technology evolution, it was a common problem that hard disk drives or storage drives failed frequently.
Disk failures include formation of bad sectors, unreachability of the disk, disk head crash, or any other failure that destroys all or part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into various categories:
Volatile storage: As the name suggests, this storage does not survive system crashes. It is mostly placed very close to the CPU, sometimes embedded onto the chipset itself; examples are main memory and cache memory. It is fast but can store only a small amount of information.
Nonvolatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but slower in accessibility. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify data items. As we know, transactions are made of various operations, which are atomic in nature. But according to the ACID properties of a DBMS, the atomicity of transactions as a whole must be maintained, that is, either all operations are executed or none.
When the DBMS recovers from a crash, it should maintain the following:
It should check the states of all transactions that were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or needs to be rolled back.
No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help the DBMS in recovering as well as in maintaining the atomicity of transactions:
Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database.
Maintaining shadow paging, where the changes are done on a volatile memory and the actual database is updated later.
Log-Based Recovery
A log is a sequence of records that maintains the records of actions performed by a transaction. It is important that the logs are written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log based recovery works as follows:
The log file is kept on a stable storage medium.
When a transaction enters the system and starts execution, it writes a log about it:
<Tn, Start>
When the transaction modifies an item X, it writes logs as follows:
<Tn, X, V1, V2>
This reads: Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs:
<Tn, commit>
The database can be modified using two approaches:
Deferred database modification: All logs are written onto the stable storage, and the database is updated when the transaction commits.
Immediate database modification: Each log record is followed by the actual database modification. That is, the database is modified immediately after every operation.
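The two approaches differ only in when the database itself changes. In this Python sketch (the class names are hypothetical), a Python list stands in for the log file on stable storage, and log records are tuples mirroring <Tn, Start>, <Tn, X, V1, V2> and <Tn, commit>. In both classes the log record is appended before any change reaches the data.

```python
class ImmediateDB:
    """Immediate modification: write the log record, then change the data at once."""
    def __init__(self, data):
        self.data = dict(data)
        self.log = []  # stands in for the log file on stable storage

    def start(self, t):
        self.log.append((t, "start"))

    def write(self, t, item, value):
        self.log.append((t, item, self.data.get(item), value))  # <Tn, X, V1, V2>
        self.data[item] = value

    def commit(self, t):
        self.log.append((t, "commit"))

class DeferredDB(ImmediateDB):
    """Deferred modification: buffer the writes and apply them only at commit."""
    def __init__(self, data):
        super().__init__(data)
        self.pending = {}  # transaction -> {item: new value}

    def write(self, t, item, value):
        buffered = self.pending.setdefault(t, {})
        self.log.append((t, item, buffered.get(item, self.data.get(item)), value))
        buffered[item] = value

    def commit(self, t):
        super().commit(t)  # the commit record reaches the log first
        self.data.update(self.pending.pop(t, {}))

imm, dfr = ImmediateDB({"X": 1}), DeferredDB({"X": 1})
for db in (imm, dfr):
    db.start("T1")
    db.write("T1", "X", 5)
print(imm.data["X"], dfr.data["X"])  # 5 1  (deferred change not applied yet)
dfr.commit("T1")
print(dfr.data["X"])                 # 5
```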
Recovery with concurrent transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in the system. As time passes, the log file may become too big to be handled at all. Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner:
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log is found, it puts the transaction in the undo-list.
All transactions in the undo-list are then undone and their logs are removed. For all transactions in the redo-list, their previous logs are removed, the transactions are redone, and their logs are saved again.
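The recovery steps above can be sketched in Python with log records as tuples such as (Tn, "start"), (Tn, X, V1, V2) and (Tn, "commit"), plus a ("checkpoint",) marker. The function name recover is hypothetical. As a simplifying assumption, any transaction that appears after the checkpoint without a commit record is treated as incomplete, and abort records are not modelled. Incomplete transactions are undone with the old values V1 (newest first), then committed ones are redone with the new values V2 (in log order).

```python
def recover(log, db):
    """Undo incomplete and redo committed transactions logged after the last checkpoint."""
    # Read backwards from the end of the log to the last checkpoint.
    start = 0
    for i in range(len(log) - 1, -1, -1):
        if log[i] == ("checkpoint",):
            start = i + 1
            break
    tail = log[start:]

    committed = {rec[0] for rec in tail if rec[1:] == ("commit",)}
    redo_list = sorted(committed)
    undo_list = sorted({rec[0] for rec in tail} - committed)

    # Undo: restore old values V1, newest update first.
    for rec in reversed(tail):
        if len(rec) == 4 and rec[0] in undo_list:
            db[rec[1]] = rec[2]
    # Redo: reapply new values V2 in log order.
    for rec in tail:
        if len(rec) == 4 and rec[0] in committed:
            db[rec[1]] = rec[3]
    return redo_list, undo_list

log = [("T1", "start"), ("T1", "A", 10, 20), ("T1", "commit"), ("checkpoint",),
       ("T2", "start"), ("T2", "B", 5, 6), ("T2", "commit"),
       ("T3", "start"), ("T3", "C", 1, 99)]
db = {"A": 20, "B": 5, "C": 99}   # on-disk state at the moment of the crash
print(recover(log, db), db)      # (['T2'], ['T3']) {'A': 20, 'B': 6, 'C': 1}
```

T1 committed before the checkpoint, so it is not examined at all; T2 is redone and T3 is undone.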