SlideShare a Scribd company logo
1 of 39
+
Review: Normalization and data anomalies
CSCI 2141 W2013
Slide set modified from courses.ischool.berkeley.edu/i257/f06/.../Lecture06_257.ppt
Housekeeping
Assignment 3 on normalization due by
9:35 am on Friday morning
MUST be emailed to the TA – the scanner
is located on the 1st floor by the help desk
Quiz 3 on normalization on Friday
IS 257 – Fall 2008
Normalization
Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets of
relations containing the same data
Normalization is a multi-step process
beginning with an “unnormalized”relation
 Hospital example from Atre, S. Data Base:
Structured Techniques for Design, Performance,
and Management.
IS 257 – Fall 2008
Normal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
IS 257 – Fall 2008
Normalization
IS 257 – Fall 2008
Boyce-
Codd and
Higher
Functional
dependency
of nonkey
attributes on
the primary
key - Atomic
values only
Full
Functional
dependency
of nonkey
attributes on
the primary
key
No transitive
dependency
between
nonkey
attributes
All
determinants
are candidate
keys - Single
multivalued
dependency
Functional Dependencies
 Functional dependencies (FDs) are
used to specify formal measures of the
"goodness" of relational designs
 FDs and keys are used to define
normal forms for relations
 FDs are constraints that are derived
from the meaning and
interrelationships of the data attributes
Functional Dependency
definition
 A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique value for Y
 X Y holds if whenever two tuples have the same value for
X, they must have the same value for Y
If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance r(R)
 X  Y in R specifies a constraint on all relation instances
r(R)
 FDs are derived from the real-world constraints on the
attributes
Examples of FD constraints
Social Security Number determines
employee name
SSN  ENAME
Project Number determines project name
and location
PNUMBER  {PNAME, PLOCATION}
Employee SSN and project number
determines the hours per week that the
employee works on the project
{SSN, PNUMBER}  HOURS
Functional Dependencies
and Keys
An FD is a property of the attributes in
the schema R
The constraint must hold on every
relation instance r(R)
 If K is a key of R, then K functionally
determines all attributes in R (since we
never have two distinct tuples with
t1[K]=t2[K])
Inference Rules for FDs
Given a set of FDs F, we can infer additional
FDs that hold whenever the FDs in F hold
Armstrong's inference rules
A1. (Reflexive) If Y subset-of X, then X  Y
A2. (Augmentation) If X  Y, then XZ  YZ
(Notation: XZ stands for X U Z)
A3. (Transitive) If X  Y and Y  Z, then X  Z
A1, A2, A3 form a sound and complete set
of inference rules
Additional Useful Inference
Rules
Decomposition
 If X  YZ, then X  Y and X  Z
Union
 If X  Y and X  Z, then X  YZ
Psuedotransitivity
 If X  Y and WY  Z, then WX  Z
Closure of a set F of FDs is the set F+ of all
FDs that can be inferred from F
Introduction to
Normalization
Normalization: Process of decomposing
unsatisfactory "bad" relations by breaking up
their attributes into smaller relations
Normal form: Condition using keys and FDs
of a relation to certify whether a relation
schema is in a particular normal form
 2NF, 3NF, BCNF based on keys and FDs of a
relation schema
 4NF based on keys, multi-valued dependencies
Unnormalized Relations
 First step in normalization is to convert
the data into a two-dimensional table
 In unnormalized relations data can
repeat within a column
IS 257 – Fall 2008
Unnormalized Relation
IS 257 – Fall 2008
Patient # Surgeon # Surg. date Patient Name Patient Addr Surgeon Surgery Postop drug
Drug side effects
1111
145
311
Jan 1,
1995; June
12, 1995 John White
15 New St.
New York,
NY
Beth Little
Michael
Diamond
Gallstone
s removal;
Kidney
stones
removal
Penicillin,
none-
rash
none
1234
243
467
Apr 5,
1994 May
10, 1995 Mary Jones
10 Main St.
Rye, NY
Charles
Field
Patricia
Gold
Eye
Cataract
removal
Thrombos
is removal
Tetracyclin
e none
Fever
none
2345 189
Jan 8,
1996 Charles Brown
Dogwood
Lane
Harrison,
NY
David
Rosen
Open
Heart
Surgery
Cephalosp
orin none
4876 145
Nov 5,
1995 Hal Kane
55 Boston
Post Road,
Chester,
CN Beth Little
Cholecyst
ectomy Demicillin none
5123 145
May 10,
1995 Paul Kosher
Blind Brook
Mamaronec
k, NY Beth Little
Gallstone
s
Removal none none
6845 243
Apr 5,
1994 Dec
15, 1984 Ann Hood
Hilton Road
Larchmont,
NY
Charles
Field
Eye
Cornea
Replacem
ent Eye
cataract
removal
Tetracyclin
e Fever
First Normal Form
To move to First Normal Form a
relation must contain only atomic
values at each row and column.
No repeating groups
A column or set of columns is called a
Candidate Key when its values can
uniquely identify the row in the relation.
IS 257 – Fall 2008
First Normal Form
IS 257 – Fall 2008
Patient # Surgeon # Surgery DatePatient NamePatient Addr Surgeon Name Surgery Drug adminSide Effects
1111 145 01-Jan-95 John White
15 New St.
New York,
NY Beth Little
Gallstone
s removal Penicillin rash
1111 311 12-Jun-95 John White
15 New St.
New York,
NY
Michael
Diamond
Kidney
stones
removal none none
1234 243 05-Apr-94 Mary Jones
10 Main St.
Rye, NY Charles Field
Eye
Cataract
removal
Tetracyclin
e Fever
1234 467 10-May-95 Mary Jones
10 Main St.
Rye, NY Patricia Gold
Thrombos
is removal none none
2345 189 08-Jan-96
Charles
Brown
Dogwood
Lane
Harrison,
NY David Rosen
Open
Heart
Surgery
Cephalosp
orin none
4876 145 05-Nov-95 Hal Kane
55 Boston
Post Road,
Chester,
CN Beth Little
Cholecyst
ectomy Demicillin none
5123 145 10-May-95 Paul Kosher
Blind Brook
Mamaronec
k, NY Beth Little
Gallstone
s
Removal none none
6845 243 05-Apr-94 Ann Hood
Hilton Road
Larchmont,
NY Charles Field
Eye
Cornea
Replacem
ent
Tetracyclin
e Fever
6845 243 15-Dec-84 Ann Hood
Hilton Road
Larchmont,
NY Charles Field
Eye
cataract
removal none none
1NF Storage Anomalies
 Insertion: A new patient has not yet undergone surgery --
hence no surgeon # -- Since surgeon # is part of the key, we
cannot insert.
 Insertion: If a surgeon is newly hired and has not operated
yet -- there will be no way to include that person in the
database.
 Update: If a patient comes in for a new procedure, and has
moved, we need to change multiple address entries.
 Deletion (type 1): Deleting a patient record may also delete
all info about a surgeon.
 Deletion (type 2): When there are functional dependencies
(like side effects and drug) changing one item eliminates
other information.
IS 257 – Fall 2008
Second Normal Form
 A relation is said to be in Second
Normal Form when every non-key
attribute is fully functionally
dependent on the primary key.
 That is, every non-key attribute needs the
full primary key for unique identification
IS 257 – Fall 2008
Why is this not in 2NF?
IS 257 – Fall 2008
Patient # Surgeon # Surgery DatePatient NamePatient Addr Surgeon Name Surgery Drug adminSide Effects
1111 145 01-Jan-95 John White
15 New St.
New York,
NY Beth Little
Gallstone
s removal Penicillin rash
1111 311 12-Jun-95 John White
15 New St.
New York,
NY
Michael
Diamond
Kidney
stones
removal none none
1234 243 05-Apr-94 Mary Jones
10 Main St.
Rye, NY Charles Field
Eye
Cataract
removal
Tetracyclin
e Fever
1234 467 10-May-95 Mary Jones
10 Main St.
Rye, NY Patricia Gold
Thrombos
is removal none none
2345 189 08-Jan-96
Charles
Brown
Dogwood
Lane
Harrison,
NY David Rosen
Open
Heart
Surgery
Cephalosp
orin none
4876 145 05-Nov-95 Hal Kane
55 Boston
Post Road,
Chester,
CN Beth Little
Cholecyst
ectomy Demicillin none
5123 145 10-May-95 Paul Kosher
Blind Brook
Mamaronec
k, NY Beth Little
Gallstone
s
Removal none none
6845 243 05-Apr-94 Ann Hood
Hilton Road
Larchmont,
NY Charles Field
Eye
Cornea
Replacem
ent
Tetracyclin
e Fever
6845 243 15-Dec-84 Ann Hood
Hilton Road
Larchmont,
NY Charles Field
Eye
cataract
removal none none
Second Normal Form
IS 257 – Fall 2008
Patient # Patient Name Patient Address
1111 John White
15 New St. New
York, NY
1234 Mary Jones
10 Main St. Rye,
NY
2345
Charles
Brown
Dogwood Lane
Harrison, NY
4876 Hal Kane
55 Boston Post
Road, Chester,
5123 Paul Kosher
Blind Brook
Mamaroneck, NY
6845 Ann Hood
Hilton Road
Larchmont, NY
Second Normal Form
IS 257 – Fall 2008
Surgeon # Surgeon Name
145 Beth Little
189 David Rosen
243 Charles Field
311 Michael Diamond
467 Patricia Gold
Second Normal Form
IS 257 – Fall 2008
Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects
1111 145 01-Jan-95
Gallstones
removal Penicillin rash
1111 311 12-Jun-95
Kidney
stones
removal none none
1234 243 05-Apr-94
Eye Cataract
removal Tetracycline Fever
1234 467 10-May-95
Thrombosis
removal none none
2345 189 08-Jan-96
Open Heart
Surgery
Cephalospori
n none
4876 145 05-Nov-95
Cholecystect
omy Demicillin none
5123 145 10-May-95
Gallstones
Removal none none
6845 243 15-Dec-84
Eye cataract
removal none none
6845 243 05-Apr-94
Eye Cornea
Replacement Tetracycline Fever
1NF Storage Anomalies Removed
 Insertion: Can now enter new patients without
surgery.
 Insertion: Can now enter Surgeons who have
not operated.
 Deletion (type 1): If Charles Brown dies, the
corresponding tuples from Patient and Surgery
tables can be deleted without losing information
on David Rosen.
 Update: If John White comes in for third time,
and has moved, we only need to change the
Patient table
IS 257 – Fall 2008
2NF Storage Anomalies
 Insertion: Cannot enter the fact that a particular
drug has a particular side effect unless it is given
to a patient.
 Deletion: If John White receives some other
drug because of the penicillin rash, and a new
drug and side effect are entered, we lose the
information that penicillin can cause a rash
 Update: If drug side effects change (a new
formula) we have to update multiple occurrences
of side effects.
IS 257 – Fall 2008
Third Normal Form
 A relation is said to be in Third Normal Form
if there is no transitive functional dependency
between non-key attributes
 When one non-key attribute can be determined
with one or more non-key attributes there is said to
be a transitive functional dependency.
 The side effect column in the Surgery table
is determined by the drug administered
 Side effect is transitively functionally dependent on
drug so Surgery is not 3NF
IS 257 – Fall 2008
Why is this not in 3NF?
IS 257 – Fall 2008
Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects
1111 145 01-Jan-95
Gallstones
removal Penicillin rash
1111 311 12-Jun-95
Kidney
stones
removal none none
1234 243 05-Apr-94
Eye Cataract
removal Tetracycline Fever
1234 467 10-May-95
Thrombosis
removal none none
2345 189 08-Jan-96
Open Heart
Surgery
Cephalospori
n none
4876 145 05-Nov-95
Cholecystect
omy Demicillin none
5123 145 10-May-95
Gallstones
Removal none none
6845 243 15-Dec-84
Eye cataract
removal none none
6845 243 05-Apr-94
Eye Cornea
Replacement Tetracycline Fever
Third Normal Form
IS 257 – Fall 2008
Patient # Surgeon # Surgery Date Surgery Drug Admin
1111 145 01-Jan-95 Gallstones removal Penicillin
1111 311 12-Jun-95
Kidney stones
removal none
1234 243 05-Apr-94 Eye Cataract removal Tetracycline
1234 467 10-May-95 Thrombosis removal none
2345 189 08-Jan-96 Open Heart Surgery Cephalosporin
4876 145 05-Nov-95 Cholecystectomy Demicillin
5123 145 10-May-95 Gallstones Removal none
6845 243 15-Dec-84 Eye cataract removal none
6845 243 05-Apr-94
Eye Cornea
Replacement Tetracycline
Third Normal Form
IS 257 – Fall 2008
Drug Admin Side Effects
Cephalosporin none
Demicillin none
none none
Penicillin rash
Tetracycline Fever
2NF Storage Anomalies Removed
Insertion: We can now enter the fact
that a particular drug has a particular
side effect in the Drug relation.
Deletion: If John White receives some
other drug as a result of the rash from
penicillin, the information on penicillin
and rash is maintained.
Update: The side effects for each drug
appear only once.
IS 257 – Fall 2008
Boyce-Codd Normal Form
 Most 3NF relations are also BCNF
relations.
A 3NF relation is NOT in BCNF if:
Candidate keys in the relation are
composite keys (they are not single
attributes)
There is more than one candidate key in
the relation, and
The keys are not disjoint, that is, some
attributes in the keys are common
IS 257 – Fall 2008
Fourth Normal Form
 Any relation is in Fourth Normal Form
if it is BCNF and any multivalued
dependencies are trivial
 Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables
IS 257 – Fall 2008
Fifth Normal Form
A relation is in 5NF if every join
dependency in the relation is implied by
the keys of the relation
 Implies that relations that have been
decomposed in previous normal forms
can be recombined via natural joins to
recreate the original relation.
IS 257 – Fall 2008
Effectiveness and Efficiency Issues for
DBMS
 Focus on the relational model
 Any column in a relational database can
be searched for values.
 To improve efficiency indexes using
storage structures such as BTrees and
Hashing are used
 But many useful functions are not
indexable and require complete scans of
the the database
IS 257 – Fall 2008
Example: Text Fields
 In conventional RDBMS, when a text
field is indexed, only exact matching of
the text field contents (or Greater-than
and Less-than).
 Can search for individual words using
pattern matching, but a full scan is required.
 Text searching is still done best (and
fastest) by specialized text search
programs (Search Engines)
IS 257 – Fall 2008
Normalization
 Normalization is performed to reduce
or eliminate Insertion, Deletion or
Update anomalies.
 However, a completely normalized
database may not be the most efficient
or effective implementation.
“Denormalization” is sometimes
used to improve efficiency.
IS 257 – Fall 2008
Normalizing to death
Normalization splits database
information across multiple tables.
To retrieve complete information from a
normalized database, the JOIN
operation must be used.
JOIN tends to be expensive in terms of
processing time, and very large joins
are very expensive.
IS 257 – Fall 2008
Downward Denormalization
IS 257 – Fall 2008
Customer
ID
Address
Name
Telephone
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Before:
Customer
ID
Address
Name
Telephone
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
After:
Upward Denormalization
IS 257 – Fall 2008
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Price
Order Item
Order No
Item No
Item Price
Num Ordered
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Item
Order No
Item No
Item Price
Num Ordered
Denormalization
 Usually driven by the need to improve
query speed
 Query speed is improved at the
expense of more complex or
problematic DML (Data manipulation
language) for updates, deletions and
insertions.
IS 257 – Fall 2008

More Related Content

Recently uploaded

Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdfKamal Acharya
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfJNTUA
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2T.D. Shashikala
 
Geometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdfGeometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdfJNTUA
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualBalamuruganV28
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1T.D. Shashikala
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxMANASINANDKISHORDEOR
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxMustafa Ahmed
 
AI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfAI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfmahaffeycheryld
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashidFaiyazSheikh
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxRashidFaridChishti
 
21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological universityMohd Saifudeen
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.MdManikurRahman
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptamrabdallah9
 
Basics of Relay for Engineering Students
Basics of Relay for Engineering StudentsBasics of Relay for Engineering Students
Basics of Relay for Engineering Studentskannan348865
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxMustafa Ahmed
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfEr.Sonali Nasikkar
 
Introduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AIIntroduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AISheetal Jain
 

Recently uploaded (20)

Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdf
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 
Geometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdfGeometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdf
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1Research Methodolgy & Intellectual Property Rights Series 1
Research Methodolgy & Intellectual Property Rights Series 1
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptx
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
AI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfAI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdf
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded Systems
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docx
 
21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.ppt
 
Basics of Relay for Engineering Students
Basics of Relay for Engineering StudentsBasics of Relay for Engineering Students
Basics of Relay for Engineering Students
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
 
Introduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AIIntroduction to Artificial Intelligence and History of AI
Introduction to Artificial Intelligence and History of AI
 

Featured

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 

Featured (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Normalization review

  • 1. + Review: Normalization and data anomalies CSCI 2141 W2013 Slide set modified from courses.ischool.berkeley.edu/i257/f06/.../Lecture06_257.ppt
  • 2. Housekeeping Assignment 3 on normalization due by 9:35 am on Friday morning MUST be emailed to the TA – the scanner is located on the 1st floor by the help desk Quiz 3 on normalization on Friday IS 257 – Fall 2008
  • 3. Normalization Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data Normalization is a multi-step process beginning with an “unnormalized”relation  Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management. IS 257 – Fall 2008
  • 4. Normal Forms First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF) IS 257 – Fall 2008
  • 5. Normalization IS 257 – Fall 2008 Boyce- Codd and Higher Functional dependency of nonkey attributes on the primary key - Atomic values only Full Functional dependency of nonkey attributes on the primary key No transitive dependency between nonkey attributes All determinants are candidate keys - Single multivalued dependency
  • 6. Functional Dependencies  Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs  FDs and keys are used to define normal forms for relations  FDs are constraints that are derived from the meaning and interrelationships of the data attributes
  • 7. Functional Dependency definition  A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y  X Y holds if whenever two tuples have the same value for X, they must have the same value for Y If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance r(R)  X  Y in R specifies a constraint on all relation instances r(R)  FDs are derived from the real-world constraints on the attributes
  • 8. Examples of FD constraints Social Security Number determines employee name SSN  ENAME Project Number determines project name and location PNUMBER  {PNAME, PLOCATION} Employee SSN and project number determines the hours per week that the employee works on the project {SSN, PNUMBER}  HOURS
  • 9. Functional Dependencies and Keys An FD is a property of the attributes in the schema R The constraint must hold on every relation instance r(R)  If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])
  • 10. Inference Rules for FDs Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold Armstrong's inference rules A1. (Reflexive) If Y subset-of X, then X  Y A2. (Augmentation) If X  Y, then XZ  YZ (Notation: XZ stands for X U Z) A3. (Transitive) If X  Y and Y  Z, then X  Z A1, A2, A3 form a sound and complete set of inference rules
  • 11. Additional Useful Inference Rules Decomposition  If X  YZ, then X  Y and X  Z Union  If X  Y and X  Z, then X  YZ Psuedotransitivity  If X  Y and WY  Z, then WX  Z Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
  • 12. Introduction to Normalization Normalization: Process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form  2NF, 3NF, BCNF based on keys and FDs of a relation schema  4NF based on keys, multi-valued dependencies
  • 13. Unnormalized Relations  First step in normalization is to convert the data into a two-dimensional table  In unnormalized relations data can repeat within a column IS 257 – Fall 2008
  • 14. Unnormalized Relation IS 257 – Fall 2008 Patient # Surgeon # Surg. date Patient Name Patient Addr Surgeon Surgery Postop drug Drug side effects 1111 145 311 Jan 1, 1995; June 12, 1995 John White 15 New St. New York, NY Beth Little Michael Diamond Gallstone s removal; Kidney stones removal Penicillin, none- rash none 1234 243 467 Apr 5, 1994 May 10, 1995 Mary Jones 10 Main St. Rye, NY Charles Field Patricia Gold Eye Cataract removal Thrombos is removal Tetracyclin e none Fever none 2345 189 Jan 8, 1996 Charles Brown Dogwood Lane Harrison, NY David Rosen Open Heart Surgery Cephalosp orin none 4876 145 Nov 5, 1995 Hal Kane 55 Boston Post Road, Chester, CN Beth Little Cholecyst ectomy Demicillin none 5123 145 May 10, 1995 Paul Kosher Blind Brook Mamaronec k, NY Beth Little Gallstone s Removal none none 6845 243 Apr 5, 1994 Dec 15, 1984 Ann Hood Hilton Road Larchmont, NY Charles Field Eye Cornea Replacem ent Eye cataract removal Tetracyclin e Fever
  • 15. First Normal Form To move to First Normal Form a relation must contain only atomic values at each row and column. No repeating groups A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation. IS 257 – Fall 2008
  • 16. First Normal Form IS 257 – Fall 2008 Patient # Surgeon # Surgery DatePatient NamePatient Addr Surgeon Name Surgery Drug adminSide Effects 1111 145 01-Jan-95 John White 15 New St. New York, NY Beth Little Gallstone s removal Penicillin rash 1111 311 12-Jun-95 John White 15 New St. New York, NY Michael Diamond Kidney stones removal none none 1234 243 05-Apr-94 Mary Jones 10 Main St. Rye, NY Charles Field Eye Cataract removal Tetracyclin e Fever 1234 467 10-May-95 Mary Jones 10 Main St. Rye, NY Patricia Gold Thrombos is removal none none 2345 189 08-Jan-96 Charles Brown Dogwood Lane Harrison, NY David Rosen Open Heart Surgery Cephalosp orin none 4876 145 05-Nov-95 Hal Kane 55 Boston Post Road, Chester, CN Beth Little Cholecyst ectomy Demicillin none 5123 145 10-May-95 Paul Kosher Blind Brook Mamaronec k, NY Beth Little Gallstone s Removal none none 6845 243 05-Apr-94 Ann Hood Hilton Road Larchmont, NY Charles Field Eye Cornea Replacem ent Tetracyclin e Fever 6845 243 15-Dec-84 Ann Hood Hilton Road Larchmont, NY Charles Field Eye cataract removal none none
  • 17. 1NF Storage Anomalies  Insertion: A new patient has not yet undergone surgery -- hence no surgeon # -- Since surgeon # is part of the key, we cannot insert.  Insertion: If a surgeon is newly hired and has not operated yet -- there will be no way to include that person in the database.  Update: If a patient comes in for a new procedure, and has moved, we need to change multiple address entries.  Deletion (type 1): Deleting a patient record may also delete all info about a surgeon.  Deletion (type 2): When there are functional dependencies (like side effects and drug) changing one item eliminates other information. IS 257 – Fall 2008
  • 18. Second Normal Form  A relation is said to be in Second Normal Form when every non-key attribute is fully functionally dependent on the primary key.  That is, every non-key attribute needs the full primary key for unique identification IS 257 – Fall 2008
  • 19. Why is this not in 2NF? IS 257 – Fall 2008 Patient # Surgeon # Surgery DatePatient NamePatient Addr Surgeon Name Surgery Drug adminSide Effects 1111 145 01-Jan-95 John White 15 New St. New York, NY Beth Little Gallstone s removal Penicillin rash 1111 311 12-Jun-95 John White 15 New St. New York, NY Michael Diamond Kidney stones removal none none 1234 243 05-Apr-94 Mary Jones 10 Main St. Rye, NY Charles Field Eye Cataract removal Tetracyclin e Fever 1234 467 10-May-95 Mary Jones 10 Main St. Rye, NY Patricia Gold Thrombos is removal none none 2345 189 08-Jan-96 Charles Brown Dogwood Lane Harrison, NY David Rosen Open Heart Surgery Cephalosp orin none 4876 145 05-Nov-95 Hal Kane 55 Boston Post Road, Chester, CN Beth Little Cholecyst ectomy Demicillin none 5123 145 10-May-95 Paul Kosher Blind Brook Mamaronec k, NY Beth Little Gallstone s Removal none none 6845 243 05-Apr-94 Ann Hood Hilton Road Larchmont, NY Charles Field Eye Cornea Replacem ent Tetracyclin e Fever 6845 243 15-Dec-84 Ann Hood Hilton Road Larchmont, NY Charles Field Eye cataract removal none none
  • 20. Second Normal Form IS 257 – Fall 2008 Patient # Patient Name Patient Address 1111 John White 15 New St. New York, NY 1234 Mary Jones 10 Main St. Rye, NY 2345 Charles Brown Dogwood Lane Harrison, NY 4876 Hal Kane 55 Boston Post Road, Chester, 5123 Paul Kosher Blind Brook Mamaroneck, NY 6845 Ann Hood Hilton Road Larchmont, NY
  • 21. Second Normal Form IS 257 – Fall 2008 Surgeon # Surgeon Name 145 Beth Little 189 David Rosen 243 Charles Field 311 Michael Diamond 467 Patricia Gold
  • 22. Second Normal Form IS 257 – Fall 2008 Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects 1111 145 01-Jan-95 Gallstones removal Penicillin rash 1111 311 12-Jun-95 Kidney stones removal none none 1234 243 05-Apr-94 Eye Cataract removal Tetracycline Fever 1234 467 10-May-95 Thrombosis removal none none 2345 189 08-Jan-96 Open Heart Surgery Cephalospori n none 4876 145 05-Nov-95 Cholecystect omy Demicillin none 5123 145 10-May-95 Gallstones Removal none none 6845 243 15-Dec-84 Eye cataract removal none none 6845 243 05-Apr-94 Eye Cornea Replacement Tetracycline Fever
  • 23. 1NF Storage Anomalies Removed  Insertion: Can now enter new patients without surgery.  Insertion: Can now enter Surgeons who have not operated.  Deletion (type 1): If Charles Brown dies, the corresponding tuples from Patient and Surgery tables can be deleted without losing information on David Rosen.  Update: If John White comes in for third time, and has moved, we only need to change the Patient table IS 257 – Fall 2008
  • 24. 2NF Storage Anomalies  Insertion: Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient.  Deletion: If John White receives some other drug because of the penicillin rash, and a new drug and side effect are entered, we lose the information that penicillin can cause a rash  Update: If drug side effects change (a new formula) we have to update multiple occurrences of side effects. IS 257 – Fall 2008
  • 25. Third Normal Form  A relation is said to be in Third Normal Form if there is no transitive functional dependency between non-key attributes  When one non-key attribute can be determined with one or more non-key attributes there is said to be a transitive functional dependency.  The side effect column in the Surgery table is determined by the drug administered  Side effect is transitively functionally dependent on drug so Surgery is not 3NF IS 257 – Fall 2008
  • 26. Why is this not in 3NF? IS 257 – Fall 2008 Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects 1111 145 01-Jan-95 Gallstones removal Penicillin rash 1111 311 12-Jun-95 Kidney stones removal none none 1234 243 05-Apr-94 Eye Cataract removal Tetracycline Fever 1234 467 10-May-95 Thrombosis removal none none 2345 189 08-Jan-96 Open Heart Surgery Cephalospori n none 4876 145 05-Nov-95 Cholecystect omy Demicillin none 5123 145 10-May-95 Gallstones Removal none none 6845 243 15-Dec-84 Eye cataract removal none none 6845 243 05-Apr-94 Eye Cornea Replacement Tetracycline Fever
  • 27. Third Normal Form IS 257 – Fall 2008 Patient # Surgeon # Surgery Date Surgery Drug Admin 1111 145 01-Jan-95 Gallstones removal Penicillin 1111 311 12-Jun-95 Kidney stones removal none 1234 243 05-Apr-94 Eye Cataract removal Tetracycline 1234 467 10-May-95 Thrombosis removal none 2345 189 08-Jan-96 Open Heart Surgery Cephalosporin 4876 145 05-Nov-95 Cholecystectomy Demicillin 5123 145 10-May-95 Gallstones Removal none 6845 243 15-Dec-84 Eye cataract removal none 6845 243 05-Apr-94 Eye Cornea Replacement Tetracycline
  • 28. Third Normal Form IS 257 – Fall 2008 Drug Admin Side Effects Cephalosporin none Demicillin none none none Penicillin rash Tetracycline Fever
  • 29. 2NF Storage Anomalies Removed Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation. Deletion: If John White receives some other drug as a result of the rash from penicillin, the information on penicillin and rash is maintained. Update: The side effects for each drug appear only once. IS 257 – Fall 2008
  • 30. Boyce-Codd Normal Form  Most 3NF relations are also BCNF relations. A 3NF relation is NOT in BCNF if: Candidate keys in the relation are composite keys (they are not single attributes) There is more than one candidate key in the relation, and The keys are not disjoint, that is, some attributes in the keys are common IS 257 – Fall 2008
  • 31. Fourth Normal Form  Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial  Eliminate non-trivial multivalued dependencies by projecting into simpler tables IS 257 – Fall 2008
  • 32. Fifth Normal Form A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation  Implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation. IS 257 – Fall 2008
  • 33. Effectiveness and Efficiency Issues for DBMS  Focus on the relational model  Any column in a relational database can be searched for values.  To improve efficiency indexes using storage structures such as BTrees and Hashing are used  But many useful functions are not indexable and require complete scans of the the database IS 257 – Fall 2008
  • 34. Example: Text Fields  In conventional RDBMS, when a text field is indexed, only exact matching of the text field contents (or Greater-than and Less-than).  Can search for individual words using pattern matching, but a full scan is required.  Text searching is still done best (and fastest) by specialized text search programs (Search Engines) IS 257 – Fall 2008
  • 35. Normalization  Normalization is performed to reduce or eliminate Insertion, Deletion or Update anomalies.  However, a completely normalized database may not be the most efficient or effective implementation. “Denormalization” is sometimes used to improve efficiency. IS 257 – Fall 2008
  • 36. Normalizing to death Normalization splits database information across multiple tables. To retrieve complete information from a normalized database, the JOIN operation must be used. JOIN tends to be expensive in terms of processing time, and very large joins are very expensive. IS 257 – Fall 2008
  • 37. Downward Denormalization IS 257 – Fall 2008 Customer ID Address Name Telephone Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Before: Customer ID Address Name Telephone Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Cust Name After:
  • 38. Upward Denormalization IS 257 – Fall 2008 Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Cust Name Order Price Order Item Order No Item No Item Price Num Ordered Order Order No Date Taken Date Dispatched Date Invoiced Cust ID Cust Name Order Item Order No Item No Item Price Num Ordered
  • 39. Denormalization  Usually driven by the need to improve query speed  Query speed is improved at the expense of more complex or problematic DML (Data manipulation language) for updates, deletions and insertions. IS 257 – Fall 2008