2. CONTENT
Introduction to DBMS and ACID characteristics
Entity-Relationship diagram
Normalization
3. INTRODUCTION
What is Data?
Data is a collection of distinct small units of information.
It can take a variety of forms such as text, numbers, media, or bytes, and can be stored on paper, in electronic memory, and so on.
In computing, data is information that has been translated into a form that is efficient to move and process. Data is interchangeable.
4. What is a Database?
A database is an organized collection of data, arranged so that it can be easily accessed and managed.
You can organize data into tables, rows, and columns, and index it to make relevant information easier to find.
The main purpose of a database is to handle a large amount of information by storing, retrieving, and managing data.
Modern databases are managed by a database management system (DBMS).
5. DISADVANTAGES OF FILE SYSTEM
1. Data redundancy
2. Data inconsistency
3. Difficulty in accessing data
4. Data isolation
5. Security problem
6. Atomicity problem
7. Concurrent access anomalies
8. Integrity problem
6. ACID PROPERTIES IN DBMS
A DBMS must keep data intact whenever changes are made to it.
If the integrity of the data is compromised, the data as a whole can become corrupted.
Therefore, to maintain the integrity of the data, a database management system defines four properties, known as the ACID properties.
The ACID properties apply to transactions. A transaction is a group of operations carried out as a single unit of work.
8. 1) ATOMICITY
Atomicity means that a transaction is treated as a single, indivisible unit.
If any operation is performed on the data, it should either be executed completely or not be executed at all.
The operation should not stop partway or execute partially.
For a transaction, this means that either all of its operations take effect or none of them do.
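The all-or-nothing behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the account table, the transfer function, and the fail_midway flag are hypothetical names chosen for the example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


failed = transfer(conn, "A", "B", 30, fail_midway=True)
after_failure = balances(conn)   # the debit from A was rolled back
succeeded = transfer(conn, "A", "B", 30)
after_success = balances(conn)   # both updates were applied together
```

When the failure is simulated, the debit from A is undone, so neither account changes. When the transfer completes, both updates are committed together.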
9. 2) CONSISTENCY
Consistency means that the data must always satisfy the rules defined for it.
In a DBMS, the integrity of the data must be maintained: any change made to the database must leave its integrity constraints satisfied.
For transactions, this means that the database is in a consistent state both before and after the transaction.
The data should always be correct.
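One way to picture consistency is an integrity rule that the database refuses to break. The sketch below assumes a hypothetical rule that a balance may never be negative, expressed as a CHECK constraint in SQLite. The table and rule are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule: an account balance may never go below zero.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.commit()

try:
    with conn:  # rolls back if the statement violates the constraint
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'A'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
```

The update that would break the rule is rejected, so the database stays in the same valid state it was in before the transaction.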
10. 3) ISOLATION
The term 'isolation' means separation.
In a DBMS, isolation is the property that operations running concurrently do not interfere with one another.
In short, the result should be the same as if one operation began only after the other had completed.
If two operations are performed on different data at the same time, neither should affect the values the other works with.
For transactions, when two or more transactions run simultaneously, consistency must be maintained.
Changes made by one transaction are not visible to other transactions until those changes are committed.
11. If two operations are running concurrently on two different accounts, the value of neither account should be affected by the other operation.
The value should remain persistent.
As shown in the diagram, account A makes transactions T1 and T2 to accounts B and C, but both execute independently without affecting each other. This is known as isolation.
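The rule that uncommitted changes stay invisible to other transactions can be sketched with two SQLite connections to the same file. This is a minimal illustration under SQLite's default settings. Other database systems offer several isolation levels with different behaviour. The file name, table, and helper function are assumptions made for the example.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "bank.db")


def read_balance(conn, name):
    """Read one balance; fetchall() finishes the query so no read lock lingers."""
    rows = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchall()
    return rows[0][0]


writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO account VALUES ('A', 100)")
writer.commit()

reader = sqlite3.connect(db_path)  # a second, independent connection

# The writer changes the balance but has not committed yet.
writer.execute("UPDATE account SET balance = 50 WHERE name = 'A'")
seen_before_commit = read_balance(reader, "A")

writer.commit()
seen_after_commit = read_balance(reader, "A")

writer.close()
reader.close()
```

The reader keeps seeing the old balance until the writer commits. Only then does the new value become visible.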
12. 4) DURABILITY
Durability means permanence.
In a DBMS, durability ensures that once an operation has executed successfully, its changes become permanent in the database.
Durability must hold even if the system fails or crashes: the committed data still survives.
If data is lost, it is the responsibility of the recovery manager to restore it and so ensure the durability of the database.
To make changes permanent, the COMMIT command must be issued after making them.
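The role of COMMIT can be sketched with a file-backed SQLite database: committed rows are still there after the connection is closed and reopened, while uncommitted rows are not. Closing the connection here merely stands in for a restart. It is not a real crash test. The file and table names are hypothetical.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.commit()                                         # A is now permanent
conn.execute("INSERT INTO account VALUES ('B', 50)")  # never committed
conn.close()                                          # pending change to B is discarded

# Reopen the file, as a restarted application would.
conn = sqlite3.connect(db_path)
names = [row[0] for row in conn.execute("SELECT name FROM account ORDER BY name")]
conn.close()
```

Only the committed row for account A is found after reopening the database.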
13. ER DIAGRAM IN DBMS
ER model stands for an Entity-Relationship model.
This model is used to define the data elements and relationships for a specified system.
It develops a conceptual design for the database and gives a simple, easy-to-design view of the data.
In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
14. ER DIAGRAM IN DBMS
A non-technical design method that works at the conceptual level, based on a perception of the real world.
Consists of a collection of basic objects called entities, relationships among those objects, and attributes that define their properties.
Free from ambiguities and provides a standard and logical way of visualizing data.
Thus, it is a diagrammatic representation that is easy for non-technical users to understand.
16. WHAT IS AN ENTITY IN ERDS?
An entity simply represents an object in our database. This could be an object for users, courses, products, and so on.
Note that the name of every entity should be singular (user), not plural (users).
Here is what an entity looks like:
The diagram above shows an entity called user.
This entity will have information about the various users registered on a platform.
17. WHAT ARE ATTRIBUTES IN ERDS?
The attributes are the information about an object.
In other words, the properties of an entity are its attributes.
The entity in the diagram above has three attributes: username, age, and email.
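As a rough analogy in code, the user entity and its three attributes can be written as a Python dataclass. The attribute types and the sample values are assumptions, since the slides only give the attribute names.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    """The 'user' entity; the attribute types are assumed for illustration."""
    username: str
    age: int
    email: str


# The attribute names of the entity, in declaration order.
attribute_names = [f.name for f in fields(User)]

# One instance of the entity (sample values are made up).
example = User(username="ada", age=36, email="ada@example.com")
```

The class plays the role of the entity, its fields play the role of attributes, and each instance corresponds to one registered user.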
18. RELATIONSHIP BETWEEN ENTITIES IN ERDS
In most cases, databases are made up of more than one entity.
To understand the relationship between one entity and another, we use lines to connect them.
But these lines have notations (indicators) on them to specify the type of relationship that exists between two
entities.
We'll use crow's foot notation to specify our entity relationships.
19. SYMBOLS IN CROW’S FOOT NOTATION AND THEIR MEANING
Cardinality acts as a parameter for the relationship between entities.
For one entity, there is a minimum and maximum number that helps define its relationship with another entity.
Zero
o The symbol/diagram above denotes zero in crow's foot notation.
o We know this because of the zero/circle indicator at the right side of the horizontal line.
20. One
o The diagram above shows a horizontal line with a short vertical line crossing it.
o The vertical line acts as the indicator – it denotes one.
Many
o The diagram above denotes many.
o You can easily remember this symbol because it looks like a crow's foot.
21. Zero or Many
o The zero or many symbol/indicator in crow's foot notation is a combination of the zero and many indicators.
One or Many
o The one or many indicator is a combination of two indicators – one and many.
One and only one
o The one and only one indicator has two "one" indicators.
22. CROW'S FOOT NOTATION EXAMPLE #1
Let's assume we have two entities in our database:
A teacher and course entity.
23. Relationship of the Teacher Entity and the Area Course Entity
Each teacher can only teach one area course.
The notation here will be one and only one.
The minimum number of courses a teacher can take up is one, and the maximum is also one.
24. Relationship of the Area Course Entity and the Teacher Entity
One area course can be taught by one or many teachers.
The minimum here is one, while the maximum is many.
The notation to be used is one or many.
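One possible way to carry these cardinalities into tables is sketched below with SQLite. The table and column names are hypothetical. A NOT NULL foreign key gives each teacher exactly one area course. The "at least one teacher per course" minimum is not enforced by this simple schema and would need an extra check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE area_course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE teacher ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    # One and only one course per teacher: the reference is mandatory.
    "area_course_id INTEGER NOT NULL REFERENCES area_course(id))"
)
conn.execute("INSERT INTO area_course VALUES (1, 'Science')")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)", [(1, "Ana", 1), (2, "Ben", 1)]
)

# A teacher with no course breaks the "one and only one" side.
try:
    conn.execute("INSERT INTO teacher VALUES (3, 'Cy', NULL)")
    teacher_without_course = True
except sqlite3.IntegrityError:
    teacher_without_course = False

# Many teachers may share one course (the "one or many" side).
teachers_on_course = conn.execute(
    "SELECT COUNT(*) FROM teacher WHERE area_course_id = 1"
).fetchone()[0]
```

Two teachers share the same area course, while a teacher without a course is rejected.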
25. CROW'S FOOT NOTATION EXAMPLE #2
The notations don't always have to be different.
What matters is the logic behind the relationship between entities.
This is entirely up to those creating or designing the database.
26. INTERPRETATION
The left side uses the zero or many notation.
This implies that a pizza can be ordered by no customers (optional) or by many customers.
Similarly, the notation on the right side implies that a customer can order zero or many pizzas.
The cardinality here is the same for both entities.
Zero is the minimum while many is the maximum.
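A zero-or-many relationship on both sides is commonly stored with a linking table. The sketch below uses SQLite. The table names, including the linking table pizza_order, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Linking table: one row per (customer, pizza) pair that was ordered.
    CREATE TABLE pizza_order (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        pizza_id INTEGER NOT NULL REFERENCES pizza(id),
        PRIMARY KEY (customer_id, pizza_id)
    );
    INSERT INTO customer VALUES (1, 'Dana'), (2, 'Eli');
    INSERT INTO pizza VALUES (1, 'Margherita'), (2, 'Pepperoni'), (3, 'Veggie');
    INSERT INTO pizza_order VALUES (1, 1), (1, 2);
    """
)

# LEFT JOIN keeps customers and pizzas that have no orders at all.
pizzas_per_customer = dict(conn.execute(
    "SELECT c.name, COUNT(o.pizza_id) FROM customer c "
    "LEFT JOIN pizza_order o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.name"
))
customers_per_pizza = dict(conn.execute(
    "SELECT p.name, COUNT(o.customer_id) FROM pizza p "
    "LEFT JOIN pizza_order o ON o.pizza_id = p.id "
    "GROUP BY p.id ORDER BY p.name"
))
```

One customer has ordered nothing and one pizza has never been ordered, which is exactly what a minimum of zero allows on each side.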
27. NOTATION OF ER DIAGRAM
A database can be represented using notations. In an ER diagram, several notations are used to express cardinality.
These notations are as follows:
28. FUNCTIONAL DEPENDENCY
A functional dependency is a relationship that exists between two attributes (or sets of attributes) in a table T:
X → Y
The left side of an FD is known as the determinant; the right side is known as the dependent.
For example: Assume we have an employee table with attributes Emp_Id, Emp_Name, and Emp_Address.
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.
The functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
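The definition can be checked mechanically against sample rows. In the sketch below, the helper holds and the employee rows are hypothetical. Note that sample data can only refute a dependency. Whether a dependency truly holds is a matter of the business rules, not of a particular set of rows.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first y seen for each x is remembered; a different y is a violation.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Made-up sample rows for the employee table.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Agra"},
]

id_determines_name = holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_id = holds(employees, ["Emp_Name"], ["Emp_Id"])
```

Here Emp_Id → Emp_Name is not violated, but Emp_Name → Emp_Id is, because two employees share the same name.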
29. FUNCTIONAL DEPENDENCY VS FULL FUNCTIONAL DEPENDENCY
Let X and Y be sets of attributes in a table T
🞐 Y is functionally dependent on X in T iff for each value x ∈ T.X there is precisely one corresponding value y ∈ T.Y
🞐 Y is fully functionally dependent on X in T if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X
30. EXAMPLE
Book table
The Author attribute is:
functionally dependent on the pair {BookNo, Title}
fully functionally dependent on BookNo
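The difference between the two notions can be sketched in code. The Book rows below are made up, since the slides do not list the table's contents, and the helper names are hypothetical. The check tries every non-empty proper subset of the determinant.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True


# Made-up rows: two different books happen to share a title.
books = [
    {"BookNo": 1, "Title": "Databases", "Author": "Lee"},
    {"BookNo": 2, "Title": "Networks", "Author": "Kim"},
    {"BookNo": 3, "Title": "Databases", "Author": "Roy"},
]

pair_determines_author = holds(books, ["BookNo", "Title"], ["Author"])
pair_is_full = fully_dependent(books, ["BookNo", "Title"], ["Author"])
bookno_is_full = fully_dependent(books, ["BookNo"], ["Author"])
```

Author depends on the pair, but not fully, because BookNo alone already determines it.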
32. PROBLEMS
🞐 Cannot insert new patrons in the system until they have borrowed books
■ Insertion anomaly
🞐 Must update all rows involving a given patron if he or she moves
■ Update anomaly
🞐 Will lose information about patrons that have returned all the books they have borrowed
■ Deletion anomaly
34. 1. TRIVIAL FUNCTIONAL DEPENDENCY
A → B is a trivial functional dependency if B is a subset of A.
Dependencies such as A → A and B → B are also trivial.
Example:
Consider a table with two columns, Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
35. 2. NON-TRIVIAL FUNCTIONAL DEPENDENCY
A → B is a non-trivial functional dependency if B is not a subset of A.
When the intersection of A and B is empty, A → B is called completely non-trivial.
Example:
ID → Name,
Name → DOB
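The three cases reduce to simple set tests on the two sides of the dependency. A small sketch follows. The function name classify and the third sample dependency are hypothetical.

```python
def classify(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:          # the dependent is contained in the determinant
        return "trivial"
    if lhs & rhs:           # the sides overlap, but rhs adds something new
        return "non-trivial"
    return "completely non-trivial"  # the sides share no attribute


first = classify({"Employee_Id", "Employee_Name"}, {"Employee_Id"})
second = classify({"ID"}, {"Name"})
third = classify({"ID", "Name"}, {"Name", "DOB"})
```

The first dependency is trivial, ID → Name is completely non-trivial, and the third is non-trivial because the two sides overlap only in part.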
36. KEYS
A key is used to uniquely identify any record or row of data in a table.
It is also used to establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON table, passport_number, license_number, and SSN are keys since they are unique for each person.
37. 1. SUPER KEY
A super key is an attribute set that can uniquely identify a tuple.
A super key is a superset of a candidate key: it may contain extra attributes that are not needed for uniqueness.
38. 2. CANDIDATE KEY
A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
For example: In the EMPLOYEE table, id is best suited for the primary key.
The remaining unique attributes, such as SSN, Passport_Number, and License_Number, are also candidate keys.
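The relationship between super keys and candidate keys can be sketched by brute force over sample rows. The rows and helper names below are hypothetical. In a real design, keys follow from the rules of the domain rather than from whatever data happens to be stored.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows):
    """Return the minimal super keys, trying smaller attribute sets first."""
    attributes = list(rows[0])
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys


# Made-up EMPLOYEE rows; two employees share a name.
employees = [
    {"ID": 1, "Name": "Asha", "SSN": "111", "Passport_Number": "P1"},
    {"ID": 2, "Name": "Ravi", "SSN": "222", "Passport_Number": "P2"},
    {"ID": 3, "Name": "Asha", "SSN": "333", "Passport_Number": "P3"},
]

id_and_name_is_superkey = is_superkey(employees, ("ID", "Name"))
name_is_superkey = is_superkey(employees, ("Name",))
keys = candidate_keys(employees)
```

{ID, Name} is a super key but not a candidate key, because ID alone is already enough. The candidate keys found are ID, SSN, and Passport_Number.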
39. 3. PRIMARY KEY
The primary key is the key chosen to uniquely identify one and only one instance of an entity.
An entity can contain multiple keys, as we saw in the PERSON table.
The key that is most suitable from that list becomes the primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee.
In the EMPLOYEE table, we could even select License_Number or Passport_Number as the primary key, since they are also unique.
For each entity, the choice of primary key depends on the requirements and on the developers.
40. NORMALIZATION
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations.
It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides a larger table into smaller tables and links them using relationships.
Normal forms are used to reduce redundancy in database tables.
41. WHY DO WE NEED NORMALIZATION?
The main reason for normalizing relations is to remove anomalies.
Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows.
Normalization consists of a series of guidelines that help you create a good database structure.
42. Data modification anomalies can be categorized into three types:
Insertion Anomaly: The insertion anomaly is when a new tuple cannot be inserted into a relation due to a lack of data.
Deletion Anomaly: The deletion anomaly is when the deletion of data results in the unintended loss of some other important data.
Update Anomaly: The update anomaly is when an update of a single data value requires multiple rows of data to be updated.
43. DESIGNING DATABASES
⮚ Schema – the logical structure of the database e.g., the database consists of information about a set of customers
and accounts and the relationship between them
⮚ Physical schema: database design at the physical level
⮚ Logical schema: database design at the logical level
⮚ State (or Instance) – the actual content of the database at a particular point in time
47. Normal Form   Description
1NF   A relation is in 1NF if every attribute contains only atomic values.
2NF   A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF   A relation is in 3NF if it is in 2NF and no transitive dependency exists.
48. FIRST NORMAL FORM (1NF)
A relation is in 1NF if every attribute contains only atomic values.
It states that an attribute of a table cannot hold multiple values; it must hold only a single value.
First normal form disallows multi-valued attributes, composite attributes, and their combinations.
49. EXAMPLE
Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      Rajat     7272826385, 9064738238  UP
20      Harish    8574783832              Bihar
12      Sameer    7390372389, 8589830302  Punjab
50. The EMPLOYEE table converted to 1NF is shown below:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      Rajat     7272826385  UP
14      Rajat     9064738238  UP
20      Harish    8574783832  Bihar
12      Sameer    7390372389  Punjab
12      Sameer    8589830302  Punjab
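The conversion to 1NF can be sketched as a small transformation that emits one row per phone number. The function name to_1nf and the comma-separated storage of the phone numbers are assumptions made for this illustration.

```python
employees = [
    {"EMP_ID": 14, "EMP_NAME": "Rajat", "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harish", "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sameer", "EMP_PHONE": "7390372389, 8589830302", "EMP_STATE": "Punjab"},
]


def to_1nf(rows, column, sep=","):
    """Emit one row per value found in the multi-valued column."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            # Copy the row, replacing the multi-valued cell with a single value.
            result.append({**row, column: value.strip()})
    return result


flat = to_1nf(employees, "EMP_PHONE")
phones_of_14 = [row["EMP_PHONE"] for row in flat if row["EMP_ID"] == 14]
```

Three input rows become five, and every EMP_PHONE cell now holds a single atomic value.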
54. SECOND NORMAL FORM
A table is in 2NF iff
■ It is in 1NF and
■ no non-prime attribute is dependent on any proper subset of any candidate
key of the table
A non-prime attribute of a table is an attribute that is not a part of any
candidate key of the table
A candidate key is a minimal superkey
55. SECOND NORMAL FORM (2NF)
In 2NF, the relation must be in 1NF.
In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38
56. In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38
TEACHER_SUBJECT table:
TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
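The decomposition can be sketched with plain Python tuples, together with a check that joining the two tables on TEACHER_ID gives back the original rows. The variable names are chosen for illustration.

```python
# Original TEACHER table as (TEACHER_ID, SUBJECT, TEACHER_AGE) tuples.
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_DETAIL: one row per teacher, so each age is stored only once.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# TEACHER_SUBJECT: which teacher teaches which subject.
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Joining the two tables on TEACHER_ID reconstructs the original rows.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]
```

No information is lost by the split, and a teacher's age now appears in exactly one row.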
60. THIRD NORMAL FORM (3NF)
🞐 A table is in 3NF iff
■ it is in 2NF and
■ all its attributes are determined only by its candidate keys and not by
any non-prime attributes
61. EXAMPLE
We have a table representing orders in an online store
Each row represents an item on a particular order
Primary key is {Order, Product}
Columns
Order
Product
Quantity
UnitPrice
Customer
Address
62. FUNCTIONAL DEPENDENCIES
🞐 Each order is for a single customer
■ Order → Customer
🞐 Each product has a single price
■ Product → UnitPrice
🞐 Each customer has a single address
■ Customer → Address
63. 2NF SOLUTION (II)
🞐 Decomposition
■ First table: Order, Product, Quantity
■ Second table: Order, Customer, Address
■ Third table: Product, UnitPrice
64. 3NF
🞐 In second table
■ Customer → Address
🞐 Split second table (Order, Customer, Address) into
■ Order, Customer
■ Customer, Address
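The transitive dependency behind this split can be made visible by computing attribute closures. In the sketch below, the closure function is a standard textbook procedure rather than something taken from the slides, and the dependency {Order, Product} → Quantity is an assumption derived from the stated primary key.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Order"}, {"Customer"}),
    ({"Product"}, {"UnitPrice"}),
    ({"Customer"}, {"Address"}),
    ({"Order", "Product"}, {"Quantity"}),  # assumed from the primary key
]

order_closure = closure({"Order"}, fds)
customer_closure = closure({"Customer"}, fds)
key_closure = closure({"Order", "Product"}, fds)
```

Order reaches Address only through Customer, which is the transitive dependency that the final split removes. The closure of {Order, Product} contains all six columns, which confirms that it is a key of the original table.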