This document discusses database normalization: the decomposition of relations to eliminate redundancy and data anomalies. It describes the three common anomalies (insertion, deletion, and modification), explains the normal forms 1NF, 2NF, 3NF, and BCNF, and works through examples of normalizing relations into each form. Throughout, it emphasizes that normalization improves data quality by reducing redundancy and inconsistency.
Chapter 4: Functional Dependency and Normalization
This chapter describes the theory developed for evaluating relational schemas for design quality, that is, for measuring formally why one grouping of attributes into relation schemas is better than another.
Normalization is the process of organizing the data in a database to avoid data redundancy and insertion, update, and deletion anomalies. The normal forms covered are:
--> 1NF
--> 2NF
--> 3NF
--> BCNF
3. Normalization
Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. It is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The main goal of database normalization is to restructure the logical data model of a database to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for data anomalies
4. Data Anomalies
Mixing attributes of multiple entities in one relation may cause problems, and information stored redundantly wastes storage.
Well-structured relations contain minimal redundancy of data and allow modification, insertion, and deletion of data without error.
Data anomalies are errors or inconsistencies that arise from redundantly stored data in a relation. The three most common anomalies in relational database design are:
Insertion anomalies
Deletion anomalies
Modification anomalies (update anomalies)
6. Data Anomalies: Insertion Anomalies
These anomalies occur when we try to insert new records into a relation. Insertion anomalies can be differentiated into two types, illustrated on the next slide.
7. Data Anomalies: Insertion Anomalies
1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls (if the employee does not work for a department as yet).
2. It is difficult to insert a new department that has no employees as yet into the EMP_DEPT relation. The only way to do this is to place null values in the attributes for employee. This causes a problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent an employee entity, not a department entity. Moreover, when the first employee is assigned to that department, we do not need this tuple with null values any more. (See the sketch below.)
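A minimal sketch of the second problem, using Python's built-in sqlite3 module (the column set is abbreviated and the sample values are hypothetical): because SSN is the primary key, a "department-only" row with a null SSN is rejected outright.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Abbreviated EMP_DEPT: employee and department facts mixed in one relation.
con.execute("""CREATE TABLE EMP_DEPT (
    SSN     TEXT PRIMARY KEY NOT NULL,  -- each tuple should be an employee
    ENAME   TEXT,
    DNUMBER INTEGER,
    DNAME   TEXT)""")

# Inserting a brand-new department with no employees yet forces nulls into
# the employee attributes, which the primary key forbids.
try:
    con.execute("INSERT INTO EMP_DEPT VALUES (NULL, NULL, 4, 'Finance')")
except sqlite3.IntegrityError as exc:
    print("insertion anomaly:", exc)  # NOT NULL constraint failed: EMP_DEPT.SSN
```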
8. Data Anomalies: Deletion Anomalies
These anomalies occur when critical data is (perhaps unintentionally) removed from the database as a side effect of deleting other data.
Example: if we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.
9. Data Anomalies: Modification/Update Anomalies
These anomalies arise when the database must make multiple changes to records to reflect a single attribute change.
Example: in EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong.
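The update anomaly can be sketched the same way (again sqlite3, with hypothetical sample data): one real-world change fans out to many rows, and missing any of them leaves the relation inconsistent.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE EMP_DEPT (
    SSN TEXT PRIMARY KEY, ENAME TEXT,
    DNUMBER INTEGER, DNAME TEXT, DMGRSSN TEXT)""")
con.executemany("INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?)", [
    ("111", "Alice", 5, "Research", "999"),
    ("222", "Bob",   5, "Research", "999"),
])

# One fact ("department 5 has a new manager") requires touching every
# employee tuple of that department; updating only some of the rows would
# leave two different DMGRSSN values for the same department.
con.execute("UPDATE EMP_DEPT SET DMGRSSN = '888' WHERE DNUMBER = 5")
```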
10. Practical Use of Normal Forms
Normal form: a condition, stated using the keys and FDs of a relation, that certifies whether a relation schema is in a particular normal form.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
11. Normalization and Normal Forms
The normalization process, as first proposed by Codd (1972a), takes a relation schema through a series of tests to "certify" whether it satisfies a certain normal form.
Normalization helps to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for anomalies during data operations
Improve data consistency
12. Normalization and Normal Forms
In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them.
Edgar F. Codd originally established three normal forms: 1NF, 2NF, and 3NF. Later, others like BCNF, 4NF, and 5NF were introduced and were generally accepted, but 3NF is widely considered to be sufficient for most applications. Most tables, on reaching 3NF, are also in BCNF (Boyce-Codd Normal Form).
13. Normal Forms: First Normal Form (1NF)
A relation (table) R is in 1NF if and only if all underlying domains of its attributes contain only atomic (simple, non-divisible) values:
• Each attribute must be atomic
• No repeating columns within a row
• No multi-valued columns
1NF simplifies non-atomic attributes, and queries become easier.
Normalization (decomposition): there are three options to normalize a relation into 1NF (as discussed in the next slide), but the best option is to form a new relation for each non-atomic attribute or nested relation.
Example: Employee relation (unnormalized)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java
14. Normal Forms: First Normal Form (1NF)
There are three techniques to achieve 1NF for such a relation:
1. Expand the key so that there is a separate tuple in the original Employee relation for each skill of an employee (shown in the table below). This option has the disadvantage of introducing redundancy into the relation.
2. Remove the attribute Skills that violates 1NF and place it in a separate relation EMP_SKILLS along with the primary key emp_no of Employee (see the sketch after the table below). This decomposes the non-1NF relation into two 1NF relations with the following schemas:
Employee (emp_no, name, dept_no, dept_name)
Emp_Skills (emp_no, skills)
3. If a maximum number of values is known for the non-atomic attribute (for example, if it is known that at most three skills can exist for an employee), replace the Skills attribute by three atomic attributes: Skill1, Skill2, and Skill3. This has the disadvantage of introducing null values for employees with fewer than three skills.

Result of the first technique (key expansion):

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
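A minimal sketch of the second technique in Python's sqlite3 (table and column names follow the schemas above; the singular column name skill in place of skills is my choice):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee (
    emp_no    INTEGER PRIMARY KEY,
    name      TEXT,
    dept_no   INTEGER,
    dept_name TEXT);
-- One row per (employee, skill) pair; the skill values are now atomic.
CREATE TABLE Emp_Skills (
    emp_no INTEGER REFERENCES Employee(emp_no),
    skill  TEXT,
    PRIMARY KEY (emp_no, skill));
""")
con.execute("INSERT INTO Employee VALUES (1, 'Kevin Jacobs', 201, 'R&D')")
con.executemany("INSERT INTO Emp_Skills VALUES (?, ?)",
                [(1, 'C'), (1, 'Perl'), (1, 'Java')])
```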
15. Normal Forms: Second Normal Form (2NF)
Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
The test for 2NF involves testing for functional dependencies whose left-hand-side attributes are part of the primary key. If the primary key contains a single attribute, the test need not be applied at all.
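To make the full-FD concept concrete, here is a small illustrative helper (hypothetical, not from the chapter) that tests whether an FD holds on a given relation instance. Note that data can only refute an FD, never prove it, since FDs are semantic properties of the schema:

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if the FD lhs -> rhs holds in rows (a list of dicts): tuples
    that agree on the lhs attributes must agree on the rhs attributes."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs is a *full* FD if no proper subset of lhs still
    determines rhs (checked against this sample of data only)."""
    return fd_holds(rows, lhs, rhs) and all(
        not fd_holds(rows, list(sub), rhs)
        for r in range(1, len(lhs))
        for sub in combinations(lhs, r))
```

For the EMP_PROJ relation on the next slide, fd_holds(rows, ["SSN", "PNUMBER"], ["ENAME"]) would return True while is_full_fd would return False, exposing the partial dependency.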
16. Normal Forms: Second Normal Form (2NF)
Example: consider the Employee-Project relation schema EMP_PROJ below.
The relation is in 1NF, but the functional dependencies FD2 and FD3 make ENAME, PNAME, and PLOCATION partially dependent on the primary key {SSN, PNUMBER} of EMP_PROJ, thus violating the 2NF test.
Normalizing the relation into 2NF hence leads to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 as shown below.
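As a concrete sketch of this decomposition (assuming the usual textbook attributes of EMP_PROJ, including an HOURS attribute that is fully dependent on the whole key), the three 2NF schemas could be declared like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- EP1: the full key {SSN, PNUMBER} plus its fully dependent attribute
CREATE TABLE EP1 (
    SSN     TEXT,
    PNUMBER INTEGER,
    HOURS   REAL,                       -- assumed attribute (FD1)
    PRIMARY KEY (SSN, PNUMBER));
-- EP2: attributes dependent on SSN alone (FD2)
CREATE TABLE EP2 (SSN TEXT PRIMARY KEY, ENAME TEXT);
-- EP3: attributes dependent on PNUMBER alone (FD3)
CREATE TABLE EP3 (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT, PLOCATION TEXT);
""")
```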
17. Normal Forms: Third Normal Form (3NF)
Third normal form (3NF) is based on the concept of transitive dependency. A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.
Example: the dependency SSN -> DMGRSSN is transitive through DNUMBER in EMP_DEPT.
18. Normal Forms: Third Normal Form (3NF)
Example: the relation EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive dependency of DMGRSSN (and also DNAME) on SSN via DNUMBER.
We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2, as shown below.
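A corresponding sketch of the 3NF decomposition (employee attributes other than ENAME, such as BDATE and ADDRESS, are assumed from the textbook version of EMP_DEPT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- ED2: department facts; DMGRSSN now depends only on the key DNUMBER
CREATE TABLE ED2 (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT, DMGRSSN TEXT);
-- ED1: employee facts; the transitive path SSN -> DNUMBER -> DMGRSSN is gone
CREATE TABLE ED1 (
    SSN     TEXT PRIMARY KEY,
    ENAME   TEXT,
    BDATE   TEXT,                       -- assumed attributes
    ADDRESS TEXT,
    DNUMBER INTEGER REFERENCES ED2(DNUMBER));
""")
```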
20. Denormalization
Normalization is performed to reduce or eliminate insertion, deletion, and update anomalies. However, a completely normalized database may not be the most efficient or effective implementation, so "denormalization" is sometimes used to improve efficiency.
Denormalization is the process of selectively taking normalized tables and re-combining the data in them. It is usually driven by the need to improve query speed.
21. Normalization
Improves maintenance for database changes
Tends to slow down retrieval
Is better at finding problems than solving them
Standard normalization procedures are subtle and may introduce BCNF or 4NF problems into tables
22. Intuitive (Accepted) Meaning of the Normal Forms
1NF: Tables represent entities
2NF: Each table represents only one entity
3NF: Tables do not contain attributes from embedded entities
4NF: Triple relationships should not represent a pair of dual relationships
23. Exercise
1. Given the Gradereport relation below and its functional dependencies, normalize the relation.
Gradereport (StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)
Functional dependencies:
• StudNo -> StudName
• CourseNo -> Ctitle, InstrucName
• InstrucName -> InstrucLocn
• StudNo, CourseNo, Major -> Grade
• StudNo, Major -> Advisor
• Advisor -> Major
25. Example: Company Database
The COMPANY database keeps track of a company's employees, departments, and projects. Suppose that after the requirements collection and analysis phase, the database designers provided the following description of the part of the company to be represented in the database:
The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
We store each employee's name, social security number, address, salary, sex, and birth date. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the number of hours per week that an employee works on each project. We also keep track of the direct supervisor of each employee.
We want to keep track of the dependents of each employee for insurance purposes. We keep each dependent's first name, sex, birth date, and relationship to the employee.
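One plausible relational mapping of this description, as a sketch (names follow the common textbook COMPANY schema; the exact layout in the original slides may differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DEPARTMENT (
    Dnumber        INTEGER PRIMARY KEY,
    Dname          TEXT UNIQUE,
    Mgr_ssn        TEXT,                -- the managing employee
    Mgr_start_date TEXT);
CREATE TABLE DEPT_LOCATIONS (           -- a department may have several locations
    Dnumber   INTEGER REFERENCES DEPARTMENT(Dnumber),
    Dlocation TEXT,
    PRIMARY KEY (Dnumber, Dlocation));
CREATE TABLE PROJECT (
    Pnumber   INTEGER PRIMARY KEY,
    Pname     TEXT UNIQUE,
    Plocation TEXT,
    Dnum      INTEGER REFERENCES DEPARTMENT(Dnumber));
CREATE TABLE EMPLOYEE (
    Ssn       TEXT PRIMARY KEY,
    Name      TEXT, Address TEXT, Salary REAL, Sex TEXT, Bdate TEXT,
    Super_ssn TEXT REFERENCES EMPLOYEE(Ssn),   -- direct supervisor
    Dno       INTEGER REFERENCES DEPARTMENT(Dnumber));
CREATE TABLE WORKS_ON (                 -- hours per week per employee per project
    Essn  TEXT REFERENCES EMPLOYEE(Ssn),
    Pno   INTEGER REFERENCES PROJECT(Pnumber),
    Hours REAL,
    PRIMARY KEY (Essn, Pno));
CREATE TABLE DEPENDENT (
    Essn           TEXT REFERENCES EMPLOYEE(Ssn),
    Dependent_name TEXT, Sex TEXT, Bdate TEXT, Relationship TEXT,
    PRIMARY KEY (Essn, Dependent_name));
""")
```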
28. Reading Assignments
1. Discuss the correspondences between the ER model constructs and the relational model constructs. Show how each ER model construct can be mapped to the relational model, and discuss any alternative mappings.
2. Discuss the options for mapping EER model constructs to relations.
3. Why should nulls in a relation be avoided as far as possible?
4. What does the term spurious tuples refer to? Discuss the problem of spurious tuples and how we may prevent it.
5. Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with examples.
6. What does the term unnormalized relation refer to?
7. What undesirable dependencies are avoided when a relation is in 2NF?
8. What undesirable dependencies are avoided when a relation is in 3NF?
9. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger form of 3NF?
Normalization splits database information across multiple tables. To retrieve complete information from a normalized database, the JOIN operation must be used. JOIN tends to be expensive in terms of processing time, and very large joins are very expensive, as the sketch below illustrates.
For example, if a relation contains a transitive dependency, it is really describing more than one entity; normalization separates those entities into their own tables, which must then be joined back together at query time.
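A minimal sketch of that cost, reusing the 1NF Employee/Emp_Skills example from earlier (sqlite3, hypothetical data): the complete picture of an employee now always requires a JOIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee   (emp_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Emp_Skills (emp_no INTEGER, skill TEXT);
INSERT INTO Employee VALUES (1, 'Kevin Jacobs');
INSERT INTO Emp_Skills VALUES (1, 'C'), (1, 'Perl'), (1, 'Java');
""")
# Every query for "an employee and their skills" now pays for a JOIN;
# on large tables this is where normalized designs spend their time.
for emp_no, name, skill in con.execute("""
        SELECT e.emp_no, e.name, s.skill
        FROM Employee AS e
        JOIN Emp_Skills AS s ON e.emp_no = s.emp_no"""):
    print(emp_no, name, skill)
```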