Informal Design Guidelines for Relation Schemas.pptx

Informal Design Guidelines
for Relation Schemas
Dr. Gowthami V

Four informal guidelines that may be used as measures to determine the quality of
relation schema design:
• Making sure that the semantics of the attributes is clear in the schema
• Minimizing the redundant information in tuples
• Minimizing the NULL values in tuples
• Disallowing the possibility of generating spurious tuples

1. Imparting Clear Semantics to
Attributes in Relations
• Whenever we group attributes to form a relation schema, we assume
that attributes belonging to one relation have certain real-world
meaning and a proper interpretation associated with them. The
semantics of a relation refers to its meaning resulting from the
interpretation of attribute values in a tuple.

Guideline 1
• Design a relation schema so that it is easy to explain its meaning. Do
not combine attributes from multiple entity types and relationship
types into a single relation. Intuitively, if a relation schema
corresponds to one entity type or one relationship type, it is
straightforward to interpret and to explain its meaning. Otherwise, if
the relation corresponds to a mixture of multiple entities and
relationships, semantic ambiguities will result and the relation cannot
be easily explained.

• each tuple relates an employee to a project but also includes the
employee name (Ename), project name (Pname), and project location
(Plocation). Although there is nothing wrong logically with these two
relations, they violate Guideline 1 by mixing attributes from distinct
real-world entities: EMP_DEPT mixes attributes of employees and
departments, and EMP_PROJ mixes attributes of employees and
projects and the WORKS_ON relationship. Hence, they fare poorly
against the above measure of design quality.

2. Redundant Information in Tuples
and Update Anomalies
• Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations. If any anomalies
are present, note them clearly and make sure that the programs that
update the database will operate correctly.

• In database design, redundant information refers to the unnecessary
repetition of data within a database. Redundancy can lead to several
problems, often referred to as anomalies:
• Update Anomalies: If redundant data exists in multiple places,
updating the data in one place but not in others can lead to
inconsistencies. For example, if a customer's address is stored in
several tables and the customer moves, failing to update all instances
of the address can result in having multiple, conflicting addresses for
the same customer.

• Insertion Anomalies: These occur when you cannot add data to the
database due to the absence of other data. For instance, if a new
student must be added to a database but there is no course record
yet, and if the database design requires a course record for each
student, then the student cannot be added without also creating a
placeholder for the course.

• Deletion Anomalies: These occur when deleting some data
inadvertently results in losing other valuable data. For example, if a
student record is deleted, and this record also stores unique
information about a course (and if this information is only stored in
the student record), then deleting the student record would result in
losing the course information as well.

Why Avoid Redundant Information?
• Consistency is maintained: Changes to data only need to be made in
one place, preventing conflicting data entries.
• Data Integrity is preserved: The database accurately reflects real-
world entities and their relationships without inconsistencies.
• Efficiency is improved: Reducing the size of the database by
eliminating duplicate entries saves storage space and makes data
management more straightforward.

• There are three main types of anomalies:
• Update Anomalies:
• Definition: Occur when data is duplicated in multiple places, and a
change to the data in one place does not automatically propagate to
all instances of that data.
• Example: Suppose a customer's phone number is stored in multiple
rows of a table because the customer has multiple orders. If the
customer changes their phone number and the update is made in
only one row, the other rows will have outdated information. This
inconsistency can cause confusion and errors.

• Insertion Anomalies:
• Definition: Occur when certain data cannot be added to the database
without the presence of other data.
• Example: Consider a table that stores both student and course
information. If the database design requires that each student must
be associated with a course, then it becomes impossible to add a new
student who hasn't yet enrolled in any course. This restriction is due
to the poor structure of the table, where unrelated data is combined
in a single table.

• Deletion Anomalies:
• Definition: Occur when the deletion of some data inadvertently
causes the loss of other data.
• Example: Suppose a database table stores both employee information
and department details. If the last employee in a department is
deleted from the table, the department information might also be
lost, even though the department still exists independently of its
employees. This happens because the department information is tied
directly to the employee record, and there’s no separate record for
departments.

3. NULL Values in Tuples
• In relational databases, a null value represents the absence of a value
or that the value is unknown. While null values can be useful in
certain situations, overusing them or allowing them to be scattered
throughout the database can lead to various problems, such as
complications in query results, misunderstandings in data
interpretation, and difficulties in maintaining data integrity.

Guideline 3: Minimize the Use of Null Values
Clarity and Meaning:
• Null values can make it unclear what data is missing. For instance, if
an employee's "end_date" in a database is null, it could mean that the
employee is still employed, the end date is not yet determined, or
that it is simply unknown whether the employee has left.
• Minimize nulls to ensure that the meaning of each data entry is clear
and unambiguous.

• Data Integrity and Consistency:
• Having too many nulls in a database can lead to inconsistent data
states. For example, if a table recording sales transactions has several
null values in the "amount" column, it could lead to confusion about
whether the sale amounts were missed, not applicable, or if there
was a mistake in data entry.
• Consistent use of nulls ensures that the database integrity is
maintained, making it easier to understand and analyze the data.

• Complications in Queries:
• Null values can complicate SQL queries, particularly when performing
aggregations or comparisons. Functions like SUM() or AVG() might
ignore nulls, leading to results that do not accurately reflect the data's
true state.
• Queries must be carefully written to handle nulls appropriately, often
requiring additional conditions (IS NULL, IS NOT NULL), which can
complicate query logic and reduce performance.

• Potential for Anomalies:
• Null values can create anomalies, especially in operations like
updates, inserts, and deletes. For example, if a database enforces
certain constraints (like foreign keys), a null value might violate these
constraints, leading to data anomalies and integrity issues.

Strategies to Minimize Null Values
• Use Default Values: Instead of using nulls, consider using default
values that make sense within the context. For example, using
"Unknown" or "Not Applicable" as a default for textual fields can help
avoid nulls.
• Break Down Tables: Normalize the database to separate optional data
into different tables. For instance, optional fields like a second phone
number can be moved to a separate table that links back to the main
table.

• Use Proper Data Types and Constraints: Define columns with NOT
NULL constraints where possible to enforce that values must be
provided. This practice ensures that no nulls are entered accidentally.
• Evaluate the Necessity of Nulls: Sometimes nulls are necessary, such
as in cases where a value truly is unknown or inapplicable. However,
they should be used judiciously and only when there is a strong
justification.

4. Generation of Spurious Tuples
• In the context of relational database design, spurious tuples are
unintended and incorrect rows that can appear when joining tables
that are not properly normalized or when the joins are based on
incorrect or inadequate conditions. These tuples do not represent
real-world entities or relationships and can lead to inaccurate query
results and misleading data interpretations.

Guideline 4:
• Understanding Spurious Tuples:Spurious tuples are the result of a
poor database schema design, especially when the schema allows
incorrect join operations to produce meaningless or incorrect data.
• This often happens when tables are joined on attributes that do not
uniquely identify rows in each table, causing a Cartesian product-like
result rather than a meaningful combination of related data.

• Causes of Spurious Tuples:
• Incorrect Joins: Using non-key attributes (i.e., attributes that are not primary or
foreign keys) to join tables can lead to spurious tuples. For example, if two tables are
joined on a common attribute that does not uniquely identify records, extra rows
can appear in the result set.
• Redundant or Overlapping Data: When tables store overlapping or redundant data,
improper normalization can lead to situations where the join conditions accidentally
match unrelated rows, producing spurious tuples.
• Improper Decomposition: When a table is decomposed (split into two or more
tables) in such a way that it loses important information about how to correctly
reassemble (join) them, spurious tuples can be generated. This is often due to the
loss of functional dependencies that ensure the correct relationships between data.

• Impact of Spurious Tuples:
• Data Integrity Issues: Spurious tuples compromise data integrity, as
they do not accurately represent the underlying real-world data or
relationships. This can lead to incorrect conclusions, faulty reports,
and errors in data analysis.
• Misleading Information: Spurious tuples can produce misleading
results in queries, especially in aggregations or when attempting to
retrieve meaningful patterns from the data.

Strategies to Avoid Spurious Tuples
Normalize the Database:
• Proper normalization (up to at least the Third Normal Form or Boyce-
Codd Normal Form) helps ensure that each table contains only related
attributes and that each attribute depends on the primary key. This
process reduces redundancy and minimizes the chances of generating
spurious tuples during joins.

Use Proper Join Conditions:
• Always use primary keys and foreign keys as join conditions. These
keys uniquely identify records in their respective tables and ensure
that only related records are joined together, preventing spurious
tuples.
• Avoid using non-key attributes in join conditions unless there is a
specific, logical reason that these attributes are meant to join.

Maintain Referential Integrity:
• Ensure that foreign key constraints are properly defined and enforced.
Referential integrity guarantees that relationships between tables are
consistent and that every foreign key corresponds to a primary key in
another table.

Careful Decomposition:
• When decomposing tables, ensure that the decomposition does not
lead to loss of important information or functional dependencies. A
well-decomposed schema should allow tables to be joined without
generating spurious tuples, preserving the original data’s integrity and
relationships.

Check Join Operations:
• Regularly review and test join operations to ensure that they produce
the expected results without generating spurious tuples. Use sample
data and validate results to identify and correct any issues early.

Informal Design Guidelines for Relation Schemas.pptx

More Related Content

What's hot

Similar to Informal Design Guidelines for Relation Schemas.pptx

More from Gowthami476224

Recently uploaded

Informal Design Guidelines for Relation Schemas.pptx