The document discusses the process of normalization to organize a relational database and reduce data redundancy. It uses a sample dataset on shipments to demonstrate the three normal forms - first normal form ensures each attribute contains a single value, second normal form splits attributes that do not depend on the full primary key, and third normal form makes all non-key attributes independent of each other. The normalization process transforms the sample shipment data through each normal form to arrive at a less redundant and more logically organized database structure.
Normalization of shipment data into 1NF and 2NF tables
1. Career Point University,
Kota (Raj.)
Minor 2 Assignment
Topic : Normalization
Presented To: Presented By:
Rohit Singh Wakil Kumar
Assistant Professor UID- K12467
CSE Dept. B.Tech (CSE)
4th Sem / 2nd Year
2. Normalization
In relational database design,normalizationisthe processof organizingdatatominimize redundancy.
Normalizationusuallyinvolvesdividinga database intotwoor more tables and definingrelationships
betweenthe tables.The objective istoisolate datasothat additions,deletions,andmodificationsof a
fieldcanbe made in justone table andthenpropagatedthroughthe restof the database viathe defined
relationships.
There are three main normal forms, each with increasing levels of normalization:
First Normal Form(1NF): Each field in a table contains different information. For
example, in an employee list, each table would contain only one birthdate field.
SecondNormal Form(2NF): No field values can be derived from another field. For
example, if a table already included a birthdate field, it could not also include a birth year field,
since this information would be redundant.
Third Normal Form (3NF): No duplicate information is permitted. So, for example, if two
tables both require a birthdate field, the birthdate information would be separated into a
separate table, and the two other tables would then access the birthdate information via an
index field in the birthdate table. Any change to a birthdate would automatically be reflect in all
tables that link to the birthdate table.
The underlying ideas in normalization are simple enough. Through normalization we want to
design for our relational database a set of files that (1) contain all the data necessary for the
purposes that the database is to serve, (2) have as little redundancy as possible, (3)
accommodate multiple values for types of data that require them, (4) permit efficient updates
of the data in the database, and (5) avoid the danger of losing data unknowingly.
Normalization can be viewed as a series of steps (i.e., levels) designed, one after another, to
deal with ways in which tables can be "too complicated for their own good". The purpose of
normalization is to reduce the chances for anomalies to occur in a database. The definitions of
the various levels of normalization illustrate complications to be eliminated in order to reduce
the chances of anomalies. At all levels and in every case of a table with a complication, the
resolution of the problem turns out to be the establishment of two or more simpler tables
which, as a group, contain the same information as the original table but which, because of
their simpler individual structures, lack the complication.
To explain this concept we will use a typical business set of data – that commonly found
in the shipment of goods.
3. The shipment of goods between a consignee (who gets the goods) and a consignor (who sends
them) is also known as consignment, delivery, movement or transport of goods. It typically
involves a third party organization who takes on the role of the freight forwarder. They manage
the scheduling of the required transport (ship, plane, train, truck, etc.) and may supply the
equipment necessary for efficient carriage of the goods. For example, customized containers
for the holds of aircraft or refrigerated sea containers for carrying perishable goods. The
contract of a shipment between the carrier and either the consignee or the consignor is often
established by a document known as a shipping or forwarding instruction or a bill, waybill or bill
of lading.
The following table (Table 1) gives three sample instances of data that may typically be
used for shipping. The first two rows refer to a consignment of goods being shipped by
sea and then road in a shipping container using two carriers. The third row relates to a
separate air freight shipment.
ContractID1 Transport
Mode2
CarrierID3 EquipmentID4 SizeTypeCode5 SealNumber6 SealIssuer7
PONL40078678 Sea P&ONL PONL12345
PONL34567
4040
4020
ABX123456
ABX123457
ABC
ABC
TNT9439287-5 Road TNT PONL34567 4020 ABX123457
GGDFG99
ABC
Customs
1 identifies a shipping contract that allows the supplier to ship goods under specific
freight conditions or the carrier to bill against a specific contract.
2 specifies the method or type of transportation of the shipment. Typically this may be
sea, air or road.
3 the identifier assigned by the agency to the carrier. This identifies the carrier being
used for this stage of the shipment.
4 identifies the information about one set of transport equipment related to the shipment.
The most common example of transport equipment is a shipping container.
5 the size and type of the transport equipment.
6 identifies the seal number of the equipment.
7 which party issues and is responsible for the seal.
4. 180-1234567 Air KE-
Korean
Air Cargo
KAL12345 747-K10 XXX664
Table 1 Shipment Table
We shall use this set of data to demonstrate the principles of normalization.
Of course, we can present this data in any number of formats. Here is an XML instantiation of the same
data.
<Shipment>
<ContractId>PONL40078678</ContractId>
<TransportModeId>Sea</TransportModeId>
<CarrierId>P&ONL</CarrierId>
<EquipmentId>PONL12345</EquipmentId>
<EquipmentId>PONL34567</EquipmentId>
<SizeTypeCode>4040</SizeTypeCode>
<SizeTypeCode>4020</SizeTypeCode>
<SealNumber>ABX123456</ SealNumber>
<SealIssuer>ABC</SealIssuer>
<SealNumber>ABX123457</ SealNumber>
<SealIssuer>ABC</SealIssuer>
</Shipment>
<Shipment>
<ContractId>TNT9439287-5</ContractId>
<TransportModeId>Road</TransportModeId>
<CarrierId>TNT</CarrierId>
<EquipmentId>PONL34567</EquipmentId>
<SizeTypeCode>4020</SizeTypeCode>
5. <SealNumber>ABX123457</ SealNumber>
<SealIssuer>ABC</SealIssuer>
<SealNumber>GGDFG99</SealNumber>
<SealIssuer>Customs</SealIssuer>
</Shipment>
<Shipment>
<ContractId>180-1234567</ContractId>
<TransportModeId>Air</TransportModeId>
<CarrierId>KE-KoreanAir Cargo</CarrierId>
<EquipmentId>KAL12345</EquipmentId>
<SizeTypeCode>747-K10</SizeTypeCode>
<SealNumber>XXX664</ SealNumber>
</Shipment>
The first thing to note is that the data present is ‘flat’ – we have one table/container called ‘shipment’ and
all attributes or nested elements sit within this one structure. When data is a single flat structure like this, it
is known as being in zero normal form (or de-normalized). The purpose of normalization is to put
structure or ‘depth’ into the data.
The secondthingto be aware of is the primarykey(orkeys) of our data. Withinanysetof data,one or
more valuesmaybe usedto uniquelyidentify aspecificinstance of anentry. For example,aContractID
may be usedto identifypreciselyone row inthe shipmenttable. Soif we have a ContractIDof
“PONL40078678” thenwe should findone,andonlyone,entrywiththisvalue.
However,sometimesasingle value maynotbe sufficientlyindividual todothis.Forexample,itis
possible fordifferentcarrierstouse the same identificationnumbersfortheircontracts. Technically,we
couldhave two“PONL40078678” ContractIDs,one for P&ONLand another for OOCL shipping. There is
no businessconventiontoguardagainstthis. So we may needboththe CarrierIDand the ContractIDto
be sure of uniqueness. Atthisstage,thisparticularissue wouldaddtothe complexityof ourexample,
so we will assume thatContractIDisgood enoughonitsownas a unique key.However,asalways,real
businesspractice shouldbe the guide forthese decisions. Itshouldsuffice tosaythatwhenwe talkof
keyswe meanthe ‘entire’keyorsetof valuesthatcan uniquelyidentifyasingle entryinourdata.
Withthisin mind,the firststepisto progressourdata intoFirst Normal Form.
6. First NormalForm
The aim of firstnormal form data isto ensure thatall of the attributesare discrete i.e.canonlytake a
single value.Thisisachievedbythe removal of repeatinggroupsintotheirownentities. Forexample,a
large Shipmentmayrequire several ‘equipments’orcontainers. Thismeanswe can have repeating
EquipmentID,SealNumberandSizeTypeCodevaluesineachcell of ourtable. FirstNormal Form says
that these shouldbe separatedintoaseparate table asshownintable 2.
ContractID Transport Mode CarrierID
PONL40078678 Sea P&ONL
TNT9439287-5 Road TNT
180-1234567 Air KE-Korean Air Cargo
Table 2 Shipment Table - 1NF
ContractID EquipmentID SizeTypeCode SealNumber SealIssuer
PONL40078678 PONL12345 4040 ABX123456 ABC
PONL40078678 PONL34567
4020 ABX123457
ABC
TNT9439287-5 PONL34567
4020 ABX123457
GGDFG99
ABC
Customs
180-1234567 KAL12345 747-K10 XXX664
Table 3 ShipmentEquipment Table - 0NF
A quickglance at the secondtable will revealthatwe have includedthe ContractIDinthe secondtable
as well asthe first. Thisis because wheneverwe move elementsintoanew table of theirownwe
include the keyvalue of the original,parenttable. We mustdothisto ensure we retainthe association
betweenthe twopiecesof data. Inrelational modelingthisiscalledthe ‘foreign’key –it’sforeign
because itshome isinthe parenttable.
Anotherlongerglance atthe secondtable will show we still have repeatingvaluesinelements
SealNumberandSealIssuer. Thisisbecause acontainermayhave several sealsattached,eachwithits
ownnumber. Therefore we needtoseparate SealNumberandSealIssuerfromthisnew table,intoa
table of theirown. But before we cando thiswe needto establishthe keyfieldsforthe new
ShipmentEquipmenttable.Onthe face of it,EquipmentIDwouldappearsufficientlyprecise tobe
unique. Infact,international shippingconventionsensure thatcontainernumbersare unique globally.
However,whilstatany particularmomentintime anEquipmentIDwouldbe unique,containersare re-
usedinothershipments. Thisisthe case here,where container“PONL34567” istakenoff a shipand
7. carriedby road. So ourkeyfor ShipmentEquipmentisboththe ContractIDandthe EquipmentID. We
thenendup withthe following….
ContractID EquipmentID SizeTypeCode
PONL40078678 PONL12345 4040
PONL40078678 PONL34567 4020
TNT9439287-5 PONL34567
4020
180-1234567 KAL12345 747-K10
Table 4 ShipmentEquipment Table - 1NF
ContractID EquipmentID SealNumber SealIssuer
PONL40078678 PONL12345 ABX123456 ABC
PONL40078678 PONL34567 ABX123457 ABC
TNT9439287-5 PONL34567 ABX123457 ABC
TNT9439287-5 PONL34567 GGDFG99 Customs
180-1234567 KAL12345 XXX664
Table 5 ShipmentSeal Table - 1NF
The newtable for ShipmentSeal hasinheritedthe foreignkeyof bothContractIDandEquipmentID.That
isto say, thispiece of equipmentwhenusedinthisshipmenthasthisseal.
Second NormalForm
The aim of secondnormal form data isto splitoff intoseparate tablesanyattributesthatdonotwholly
dependonthe entire key.
For example,whenwe lookcloselyatthe ShipmentEquipmenttable we cansee thatSizeTypeCode does
not dependentirelyonContractIDandEquipmentID(ourtwokeys).
8. We can say that the size andtype of a containerdependsonthe EquipmentID. Everycontainerhasone
EquipmentIDandone size andtype. “PONL34567” is a 40 footcontainerof standardfeatures. If the
EquipmentIDvalue changed(ie adifferentcontainerwasused),thenwe couldnotbe sure the
SizeTypeCode wouldremainthe same. SizeTypeCode isdependantonthe EquipmentID. The same
cannot be saidfor ContractID. The value of ContractIDcan change withoutaffectingthe SizeTypeCode.
For example,whenthe containeristransferredtothe truckfor road haulage – itssize andtype do not
change.
SecondNormal Formtellsusto separate these attributesthatdon’tdependonthe entire key. Inthis
case itis the SizeTypeCode anditsdependantforeignkey,EquipmentID,thatformanew Equipment
table.
ContractID EquipmentID
PONL40078678 PONL12345
PONL40078678 PONL34567
TNT9439287-5 PONL34567
180-1234567 KAL12345
Table 6 - ShipmentEquipment table - 2 NF
EquipmentID SizeTypeCode
PONL12345 4040
PONL34567 4020
KAL12345 747-K10
Table 7 Equipment table - 2 NF
Third NormalForm
To achieve adata model inThirdNormal Form we mustensure thatall Non-Keyattributesare
independentof one another. ThisissimilartoSecondNormal Form, butnow we focuson the non-key
dependencies.
9. For example,if we lookatthe ShipmentSealtable,we seethatSealNumberandSealIssuerare not
independentof eachother. Neitherare keysvalues,butthere isadependantrelationshipbetween
them,forexample if the SealIssuerwhere tochange thenthe SealNumberwouldpresumablychange as
well. Sowe mustmove SealIssueranditsdependantforeignkey,SealNumberintoanew table. Inthis
case we shall call itthe Seal table.
ContractID EquipmentID SealNumber
PONL40078678 PONL12345 ABX123456
PONL40078678 PONL34567 ABX123457
TNT9439287-5 PONL34567 ABX123457
TNT9439287-5 PONL34567 GGDFG99
180-1234567 KAL12345 XXX664
Table 8 ShipmentSeal table - 3NF
SealNumber SealIssuer
ABX123456 ABC
ABX123457 ABC
GGDFG99 Customs
XXX664
Table 9 Seal table - 3 NF
Summary
Normalizationisaformal andwell establishedmethodof analyzingdatastructures. If applied
consistently,thenthistechnique willidentifythe logical containers necessaryforbuildingre-usable XML
schemas. It can be usedinconjunctionwithotheranalysistechniquestocompare andrefine data
models.
Whilstitsoriginal purpose wastoorganize datato minimize redundancyandavoiddataduplication,itis
a powerful technique forimprovingthe understandingof the datamodelsnecessaryforre-usable
librariesof componentssuchasUBL.