Data Modeling and Databases
2ID50 – Lecture 1 – 2021-2022
Relational model and functional
dependencies
Responsible lecturer:
prof. dr. George Fletcher
Organization
Instruction groups
group 1, George Fletcher (inst) and
Stefan Popa (SA)
group 2, Stanley Clark (inst) and
Youssef Selim (SA)
group 3, Alexander Serebrenik (inst)
and Eduardo Costa Martins (SA)
group 4, Thomas Mulder (inst) and
Ibrahim Ahmed Ibrahim Elsayed Nasr (SA)
group 5, Wilco van Leeuwen (inst) and
Sam Nijsten (SA)
group 6, Daphne Miedema (inst) and
Andrei Roncea (SA)
Organization
Assessments
Mandatory homeworks, most weeks (6 in total)
• 50% of final grade, individual work
• homework is made available most weeks on
Wednesday and must then be submitted on Canvas by
the following Wednesday, 13:30
• Hard deadline, no late submissions
• only the best 5 (out of 6) homeworks count
Mandatory final exam
• 50% of final grade
An exam score of at least 5.0 and a total score of at least 6
are needed to pass the course
Organization
Site
• http://canvas.tue.nl
• Carefully read the syllabus in full
Organization
On Canvas there are several important parts:
• Announcements: for urgent and/or important stuff.
• Modules: copies of course slides and other relevant
files will be posted here.
• A Module only becomes available when you have
completed all previous modules.
• Quizzes: the homework is posted here.
• Grades: your grades
On MS Teams there is one channel per instruction group;
discussions are monitored during the week by SAs and
instructors.
• You and your colleagues are expected to take the lead
here, working together on the weekly exercises
Organization
Schedule
Lectures, exercises
• Posted online each week on Wednesday, to be
watched offline for the instructions of the following
week
Instruction exercises
• All week: on MS Teams, in six groups
• Tuesdays, 7th and 8th hours, live plenary discussion
on MS Teams
• Thursdays, 3rd and 4th hours, live plenary
discussion on MS Teams
Instruction meetings
Planning for instruction meetings
• I will live stream here. Goal is to work on problems
together
• Please join your instruction group channel on Teams.
• All discussion and Q&A during the meeting will be
done in your channel, moderated by your instructor
• Instructors will post summarized questions to the live
stream chat, which I will monitor
Organization
Weekly schedule
• http://canvas.tue.nl
Why study information systems?
Why do we study information systems?
Example: think of the information needs of a car
dealership
• sale of new cars
• sale of used cars
• purchase of new cars
• purchase of used cars
• performing car maintenance
• controlling inventory of parts
• payment of salary of staff
• maintenance (cleaning) of company property
• etc.
Why do we study information systems?
• Information needs to be stored, used and manipulated in many
types of applications:
• scientific applications: bio- and chemical-informatics, social
network analysis, digital humanities, …
• administrative applications: banking, airline reservations
and schedules, student administration, retail (customers,
product recommendations, purchases, order tracking,
bookkeeping), manufacturing (inventory, production,
orders, supply chain), human resources (personnel
database, salaries)
• document-oriented applications: newspapers, news sites,
(digital) libraries, websites, search engines, social media
• technical applications: air traffic control, airplane control,
motor management, automotive controls, software in TV,
cameras, phones, climate control, power stations and grids,
etc.
Object System vs. Information System
• object system:
the “real world” of a company or organization or scientific
experiment or …, with people, machines, products,
warehouses, proteins, posts, likes, followers, ….
• information system:
a representation of the real world in a computer system, using
data (e.g., strings, numbers) to represent objects such as
people, machines, products, proteins, ...
• example: students are people in the real world, but in the
student administration they are represented by an identifying
student number, name, address, list of enrolled courses,
grades, etc.
• the representation is always an approximation with a purpose
• e.g., your knowledge of a course is represented by an
integer number between 0 and 10
• “The map is not the territory, the menu is not the meal.”
Modeling of Information Systems
Two major aspects:
• which information? (DATA MODELING)
• what is the structure of the data?
• what are relationships between data items?
• which constraints (restrictions) apply to the data?
• how is the information used? (PROCESS MODELING)
• when and how is information created?
• how is the information manipulated (changed)?
• how is information shared/communicated between parts of
the organization (or between the organization and external
parties)?
• We focus in this course on data
Content of this course
• Data is everywhere ...
– business data
– scientific data
– personal data
– the web, social media
– ...
... and often outlasts code
• Database management systems (DBMS) as microcosm of
computer and data science
– languages
– hardware, distributed and parallel systems
– Graphics, HCI
– Artificial intelligence, machine learning
– Systems software (e.g., file systems, memory mgmt, …)
– ...
Why do we use database systems?
• In the early days, database applications were built directly on
top of file systems.
• This leads to the following problems (and more):
• Data redundancy and inconsistency
− Multiple file formats, duplication of information across
different files
• Difficulty in accessing data
− Need to write a new program to carry out each new task
• Data isolation — from application logic
• Managing transactions and security is ad hoc
• Integrity problems
− Integrity constraints (e.g. “account balances must be
greater than zero”) become buried in program code
rather than being stated and managed explicitly
DBMSs offer systematic principled solutions to all of these
problems
Content of this course
• data manipulation
• entering, updating, deleting and retrieving information from
a database (we concentrate on retrieval: understanding and
translating queries)
• database design
• given requirements for an information system, how do we
design a database that satisfies the user’s needs?
• how do we transform database design into an actual
database implementation (in our case, using a relational
database system)?
• in this course we run two parallel strands: database design on
Thursdays and data manipulation on Tuesdays
Levels of abstraction in data modeling
Key idea: data independence
Insulation from changes in the way data is structured and
stored
• Physical level: describes how a record (e.g., customer) is
stored (e.g., on disk)
• Logical level: describes data stored in database, and the
relationships among the data.
type customer = record
    customer_id : string;
    customer_name : string;
    customer_street : string;
    customer_city : string;
end;
• View level: application programs hide details of data types.
Views can also hide information (such as an employee’s salary)
for security purposes.
Instances and Schemas
• Similar to variables and types in programming languages
• Schema (data model) – logical structure of the database
• Analogous to name and type of a variable in a program
• Physical schema: database design at the physical level
• Logical schema: database design at the logical level
• Instance – the actual content of the database at a particular
point in time
• Analogous to the value of a variable
• Physical Data Independence – the ability to modify the physical
schema without changing the logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and
components should be well defined so that changes in
some parts do not seriously influence others.
Database Models
A database model is a collection of tools for describing:
• Data
• Data relationships
• Data semantics
• Data constraints
Examples:
• Relational model
• Entity-Relationship model (mainly for database design)
• Object-based database models (Object-oriented and Object-
relational)
• Semistructured data model (XML)
• RDF (graph) data model
• Other older models:
• Network model
• Hierarchical model
The relational data model
The Relational Database Model
A database consists of a finite set of relations (which we often call
tables)
• Each relation has a name and a set of attributes
• Each attribute has a name and a domain (or data type)
• A relation instance is a set of tuples (or rows)
Basic definitions of the relational model
Unnamed perspective: formally, given domains (i.e., sets) D1, D2,
…, Dn, a relation instance r is a finite subset of
D1 × D2 × … × Dn
i.e., a finite subset of the cartesian product of the domains
Thus, an instance is a finite set of n-tuples (d1, d2, …, dn) where, for
each 1 ≤ i ≤ n, it holds that di ∈ Di
Note: in the unnamed perspective we need to observe a fixed order in
the tuples, whereas in a named perspective the order of attributes is
irrelevant (as long as we use the names explicitly).
Basic definitions of the relational model
Example: If
customer_name = {Jones, Smith, Curry, Lindsay, …}
customer_street = {Main, North, Park, …}
customer_city = {Harrison, Rye, Pittsfield, …}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) }
is a relation instance over
customer_name × customer_street × customer_city
Relation Schema
Named Perspective: A1, A2, …, An are attributes
R = { A1, A2, …, An } is a relation schema
•Example:
Customer_schema =
{customer_name, customer_street, customer_city}
•r(R) denotes a relation instance r on the relation schema R
Example: customer (Customer_schema)
We sometimes also write the attributes instead of the schema:
customer (customer_name, customer_street, customer_city)
Relation Schema
Named Perspective: Tuple in an instance is then viewed as a
total function from the set of attributes to the (respective) set of
values
• Hence emphasizing that there really is no ordering on attributes
...
• The value of tuple t on attribute A will be denoted by
• t.A,
• t(A), or
• t[A]
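A minimal Python sketch of the two perspectives, using the customer tuples from the earlier example. The helper names `named` and `value` are illustrative assumptions and not notation from the course.

```python
# Unnamed perspective: a relation instance is a set of ordered tuples,
# so the position of each value matters.
r_unnamed = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

def named(**values):
    """Named perspective: a tuple is a total function from attributes to values.

    A frozenset of (attribute, value) pairs is hashable and has no ordering.
    """
    return frozenset(values.items())

def value(t, attribute):
    """Return t[A], the value of tuple t on attribute A."""
    return dict(t)[attribute]

# The same tuple written with two different attribute orders.
t1 = named(customer_name="Jones", customer_street="Main", customer_city="Harrison")
t2 = named(customer_city="Harrison", customer_name="Jones", customer_street="Main")
```

Here `t1 == t2` holds because the attribute names, not the positions, carry the meaning, whereas reordering the values of an unnamed tuple yields a different tuple.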
Alternative presentation of an instance
We often draw a relation instance as a table, with the
attribute names above each column.
customer

customer_name   customer_street   customer_city
Jones           Main              Harrison
Smith           North             Rye
Curry           North             Rye
Lindsay         Park              Pittsfield

The columns are the attributes; the rows are the tuples.
Keys
Let K ⊆ R
• K is a superkey of R if values for K are sufficient to
uniquely identify each tuple of each possible instance r(R)
− by “possible r” we mean a relation r that could exist in
the object system (i.e., slice of the “real world”) we are
modeling.
− Example: {customer_name, customer_street} and
{customer_name} are both superkeys of Customer, if
no two customers can possibly have the same name
• K is a candidate key of R if K is minimal
• i.e., no strict subset of K is a superkey
• e.g., {customer_name} is a candidate key of Customer
• Among the candidate keys of R we choose one to be the
principal way of identifying tuples: the primary key.
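As a hedged illustration, the sketch below checks uniqueness on one given instance. This can only refute a key claim: a superkey must identify tuples in every possible instance, which is a modeling decision that no single instance can confirm. The function names are assumptions made for this example.

```python
from itertools import combinations

def is_unique_on(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def minimal_unique_sets(rows, schema):
    """Attribute sets that are unique on this instance and have no unique strict subset."""
    found = []
    for size in range(len(schema) + 1):
        for attrs in combinations(schema, size):
            if is_unique_on(rows, attrs) and not any(set(f) < set(attrs) for f in found):
                found.append(attrs)
    return found

schema = ("customer_name", "customer_street", "customer_city")
customer = [
    {"customer_name": "Jones", "customer_street": "Main", "customer_city": "Harrison"},
    {"customer_name": "Smith", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Curry", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Lindsay", "customer_street": "Park", "customer_city": "Pittsfield"},
]
candidates = minimal_unique_sets(customer, schema)
```

On this instance only {customer_name} is minimal and unique, which is consistent with (but does not prove) the claim that it is a candidate key.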
Banking example
branch (bName, bCity, assets)
customer (custName, custStreet, custCity)
account (acctNumber, bName, balance)
loan (loanNumber, bName, amount)
depositor (custName, acctNumber)
borrower (custName, loanNumber)
Foreign Keys
A relation schema may have one or more attributes that correspond to
the primary key of another relation: i.e., a foreign key
• E.g. the custName and acctNumber attributes of
depositor are foreign keys to customer and account
respectively.
• Only values occurring in the primary key attribute of the
referenced relation may occur in the foreign key attribute of
the referencing relation.
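The constraint can be sketched in Python as a check for dangling values. The function name `dangling_values` and the sample rows are hypothetical; only the relation and attribute names come from the banking example.

```python
def dangling_values(referencing, fk_attr, referenced, key_attr):
    """Foreign key values in `referencing` that do not occur in `referenced`."""
    allowed = {row[key_attr] for row in referenced}
    return {row[fk_attr] for row in referencing if row[fk_attr] not in allowed}

# Hypothetical sample data, reduced to the key attributes.
customer = [{"custName": "Jones"}, {"custName": "Smith"}]
account = [{"acctNumber": "A-101"}, {"acctNumber": "A-215"}]
depositor = [
    {"custName": "Jones", "acctNumber": "A-101"},
    {"custName": "Curry", "acctNumber": "A-215"},  # Curry does not occur in customer
]

bad_customers = dangling_values(depositor, "custName", customer, "custName")
bad_accounts = dangling_values(depositor, "acctNumber", account, "acctNumber")
```

A database system that enforces the foreign key would reject the second depositor row; here the violation is only reported.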
Example: the following bank database “schema diagram”
Good logical design
Basic requirement: First Normal Form
A domain is atomic if its elements are considered to
be indivisible units
• Examples of non-atomic domains:
• Set of names
• Composite attributes
• Identification numbers that can be broken up into
parts
Basic requirement: First Normal Form
• A relational schema R is in first normal form if the
domains of all attributes of R are atomic
• Non-atomic values encourage redundant (repeated)
storage of data
• Example: Set of accounts stored with each
customer, and set of owners stored with each
account
• We assume all relations are in first normal form
Combining Schemas
• Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number, amount)
• Result is possible repetition of information (L-100 in
example below)
A Combined Schema Without Repetition
• Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount, branch_name)
• No repetition (as suggested by example below)
What About Smaller Schemas?
• Suppose we had started with bor_loan. How would we know
to split it up (decompose it) into borrower and loan?
• Write a rule “if there were a schema (loan_number, amount),
then loan_number would be a candidate key for it”
• Denote as a functional dependency:
loan_number → amount
• In bor_loan, because loan_number is not a candidate key, the
amount of a loan may have to be repeated. This indicates the
need to decompose bor_loan.
What About Smaller Schemas?
• Not all decompositions are good. Suppose we decompose
employee into
employee1 = (employee_id, employee_name)
employee2 = (employee_name, telephone_number, start_date)
• The next slide shows how we lose information: we cannot
reconstruct the original employee relation, and so this is a lossy
decomposition.
A Lossy Decomposition
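A small Python sketch of this effect, assuming two hypothetical employees who share a name (the sample values are invented for illustration). Projecting onto employee1 and employee2 and joining the results again produces tuples that were never in employee.

```python
def project(rows, attrs):
    """Projection: keep only the given attributes and drop duplicate tuples."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}

def natural_join(r, s):
    """Natural join of relations whose tuples are frozensets of (attribute, value) pairs."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            # Tuples join when they agree on every attribute they share.
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return result

def tup(**values):
    return frozenset(values.items())

employee = {
    tup(employee_id=1, employee_name="Kim", telephone_number="555-0101", start_date="2020-01-01"),
    tup(employee_id=2, employee_name="Kim", telephone_number="555-0202", start_date="2021-06-01"),
}
employee1 = project(employee, ["employee_id", "employee_name"])
employee2 = project(employee, ["employee_name", "telephone_number", "start_date"])
rejoined = natural_join(employee1, employee2)
```

The join contains four tuples instead of two: each employee_id is paired with both telephone numbers, so we can no longer tell which number belongs to whom. Getting more tuples back means losing information.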
Good logical design
• So, we typically decompose to remove redundancy
• Problems caused by redundancy:
• Wasted storage
• Update anomalies
• Insertion anomalies
• Deletion anomalies
• Good logical design accurately reflects the conceptual
design (see week 3) and avoids redundancy as
far as possible
Good logical design
Redundancy follows from mixing information about
more than one entity in a relation
• Reasoning about keys
• Need tools to formally reason about keys
Goal — Devise a Theory for “goodness”
• Decide whether a particular relation R (in first
normal form) is in “good” form.
• In the case that a relation R is not in “good” form,
decompose it into a set of relations {R1, R2, ..., Rn}
such that
• each relation is in good form
• the decomposition is a lossless-join decomposition
• Our theory is based on:
• functional dependencies (equality-generating
dependencies)
• multivalued dependencies (tuple-generating
dependencies)
Functional Dependencies
• Constraints on the set of legal relations.
• Require that the value for a certain set of attributes
determines uniquely the value for another set of
attributes.
• A functional dependency is a generalization of the
notion of a key.
Functional Dependencies
• Let R be a relation schema
α ⊆ R and β ⊆ R
• The functional dependency
α → β
holds on R if and only if for any legal relation r(R),
whenever any two tuples t1 and t2 of r agree on the
attributes of α, they also agree on the attributes of β.
That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
• Example: Consider R(A, B) with instance r:

A  B
1  4
1  5
3  7

• On this instance, A → B does not hold, but B → A does
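The definition translates directly into a check on a single instance. This is a minimal sketch; the function name `satisfies` is an assumption, and it tests one instance only, not whether the dependency holds on the schema.

```python
def satisfies(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

# The instance r of R(A, B) from the example.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

a_determines_b = satisfies(r, ["A"], ["B"])
b_determines_a = satisfies(r, ["B"], ["A"])
```

The first two tuples agree on A but differ on B, so `a_determines_b` is false; all B values are distinct, so `b_determines_a` is true.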
Functional Dependencies
• K is a superkey for schema R if and only if K → R
• K is a candidate key for R if and only if
• K → R, and
• for no α ⊂ K (i.e., strict subset of K), α → R holds
Functional Dependencies
• Functional dependencies allow us to express
constraints that cannot be expressed using
superkeys.
• Consider the schema:
bor_loan = (customer_id, loan_number, amount).
We expect the following to hold:
loan_number → amount
but not:
amount → customer_id, or
loan_number → customer_id
Use of Functional Dependencies
We use functional dependencies to:
• test relations to see if they are legal under a given
set of functional dependencies.
• If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
• specify constraints on the set of legal relations
• We say that F holds on R if all legal relations on R
satisfy the set of functional dependencies F.
Use of Functional Dependencies
Note: A specific instance of a relation schema may
satisfy a functional dependency even if the functional
dependency does not hold on all legal instances.
• For example, a specific instance of loan may, by
chance, satisfy
amount → customer_name.
Functional Dependencies
• We study the theory of fds because we need it in our
study of good logical design
• A functional dependency is trivial if it is satisfied by
all instances of a relation
• Example:
customer_name, loan_number → customer_name
customer_name → customer_name
• In general, α → β is trivial if β ⊆ α (informal proof)
Closure of a Set of Functional Dependencies
• Given a set F of functional dependencies, there are
certain other functional dependencies that are
logically implied by F.
• For example: If A → B and B → C, then we can
infer that A → C
• The set of all functional dependencies logically
implied by F is the closure of F.
• We denote the closure of F by F+.
• F+ is a superset of F.
Closure of a Set of Functional Dependencies
We can find all of F+ by applying Armstrong’s rules:
(a1) if β ⊆ α, then α → β (reflexivity)
(a2) if α → β, then γα → γβ (augmentation)
(a3) if α → β and β → γ, then α → γ (transitivity)
These rules are
• sound (they generate only functional dependencies
that actually hold),
• complete (they generate all functional dependencies
that hold), and
• non-redundant (if we take out one of the rules we
can no longer generate all functional dependencies
that hold).
Example correctness proofs: reflexivity
We prove:
(a1) if β ⊆ α, then α → β (reflexivity)
by using the definition of functional dependency
Let α = { A1, ..., An, B1, ..., Bm } and β = { B1, ..., Bm }
( so we know that β ⊆ α )
∀ r(R) legal relation (instance) for the scheme R
∀ t1, t2 ∈ r : t1[α] = t2[α]
⇒ ( t1[A1] = t2[A1] ∧ ... ∧ t1[An] = t2[An]
∧ t1[B1] = t2[B1] ∧ ... ∧ t1[Bm] = t2[Bm] )
⇒ ( t1[B1] = t2[B1] ∧ ... ∧ t1[Bm] = t2[Bm] )
⇒ t1[β] = t2[β]
so ∀ t1, t2 ∈ r: t1[α] = t2[α] ⇒ t1[β] = t2[β], which is α → β
Example correctness proofs: augmentation
We prove:
(a2) if α → β, then γα → γβ (augmentation)
by using the definition of functional dependency
Because of α → β we know:
∀ r(R) legal relation (instance) for the scheme R
∀ t1, t2 ∈ r: t1[α] = t2[α] ⇒ t1[β] = t2[β]
From logic (tautology) we know that for these t1, t2
(and in fact for any t1, t2): t1[γ] = t2[γ] ⇒ t1[γ] = t2[γ]
Hence (logic: if A ⇒ B and C ⇒ D then (A ∧ C) ⇒ (B ∧ D))
∀ t1, t2 ∈ r: t1[γ] = t2[γ] ∧ t1[α] = t2[α] ⇒ t1[γ] = t2[γ] ∧ t1[β] = t2[β]
⇒ ∀ t1, t2 ∈ r: t1[γα] = t2[γα] ⇒ t1[γβ] = t2[γβ]
which is (the definition of) γα → γβ
Closure of a Set of Functional Dependencies
We study the theory of fds because we need it for our
study of logical design
We can find all of F+ by applying Armstrong’s rules:
(a1) if β ⊆ α, then α → β (reflexivity)
(a2) if α → β, then γα → γβ (augmentation)
(a3) if α → β and β → γ, then α → γ (transitivity)
For home: prove soundness of (a3)
We establish completeness of Armstrong’s rules in a
later lecture
Non-Redundancy of Armstrong Rules
• We have to show that if we remove a rule we
cannot always compute F+ anymore.
(One counter-example is enough!)
• a1 is needed: consider R=Ø and F=Ø. Then:
F+ = {Ø→Ø} but F+{a2,a3} = Ø,
where F+{a2,a3} is the set derivable from F using only a2 and a3.
• a2 is needed: consider R={A,B} and F={Ø→A}.
Then:
F+{a1,a3} = {Ø→Ø, A→Ø, A→A, B→Ø, B→B, AB→Ø,
AB→A, AB→B, AB→AB, Ø→A, B→A}
(B→A follows by transitivity from B→Ø and Ø→A)
but
B→AB is in F+ (how?)
• Prove at home that a3 is needed.
Example
R = (A, B, C, G, H, I)
F = { A → B, A → C, CG → H, CG → I, B → H }
some members of F+
A → H
by transitivity from A → B and B → H
AG → I
by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
CG → HI
by augmenting CG → I with CG to infer CG → CGI,
and augmenting CG → H with I to infer CGI → HI,
and then transitivity
Example
R = (A, B, C, D)
F = { A → C, B → D }
Claim: ACD → C
Prove this using Armstrong’s rules
Claim: AB → ABCD
Prove this using Armstrong’s rules
Algorithm for Computing F+
F+ := F
repeat
    for each fd f in F+
        apply reflexivity and augmentation rules on f
        add the resulting fds to F+
    for each pair of fds f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting fd to F+
until F+ does not change any further
NOTE: We shall see an alternative procedure for this task later
Additional inference rules
We can further simplify manual computation of F+ by
using the following additional rules.
(a4) If α → β holds and α → γ holds, then α → βγ holds
(union)
(a5) If α → βγ holds, then α → β holds and α → γ holds
(decomposition)
(a6) If α → β holds and γβ → δ holds, then αγ → δ holds
(pseudotransitivity)
The above rules can be inferred from Armstrong’s rules.
Closure of Attribute Sets
Given a set of attributes α, define the closure of α
under F (denoted by α+) as the set of attributes that
are functionally determined by α under F
Algorithm to compute α+, the closure of α under F
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
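The algorithm above can be sketched in Python as follows. The representation is an assumption chosen for brevity: each attribute is a single character, an attribute set is a string or set of characters, and F is a list of (left side, right side) pairs.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:  # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the result, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = { A -> B, A -> C, CG -> H, CG -> I, B -> H }
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_closure = closure("AG", F)
a_closure = closure("A", F)
```

With this F, the closure of AG is the whole schema ABCGHI, while the closure of A alone is ABCH.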
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
1. Is AG a superkey?
2. Is any strict subset of AG a superkey?
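Both questions can be answered mechanically with attribute closure. The sketch below is one possible formulation, with single-character attributes and the helper names `is_superkey` and `is_candidate_key` as assumptions of this example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """A superkey that stops being one when any single attribute is removed.

    Removing one attribute at a time is enough: every superset of a superkey
    is a superkey, so a smaller superkey would show up in one of these tests.
    """
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_is_candidate_key = is_candidate_key("AG", R, F)
```

AG is a superkey, and neither A (closure ABCH) nor G (closure G) is, so AG is a candidate key; ABG is a superkey but not a candidate key.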
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
– To test if α is a superkey, we compute α+ and check
if α+ contains all attributes of R
• Testing functional dependencies
– To check if a functional dependency α → β holds (or,
in other words, is in F+), just check if β ⊆ α+.
– That is, we compute α+ by using attribute closure, and
then check if it contains β.
– This is a simple and cheap test, and very useful
• Computing closure of F
– For each γ ⊆ R, we find the closure γ+, and for each
S ⊆ γ+, we output a functional dependency γ → S.
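A hedged sketch of the second and third use, again with single-character attributes. The names `implies` and `closure_of_fds` are illustrative assumptions; enumerating all of F+ is exponential in the number of attributes, so the full closure is computed only for a tiny schema.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, tested as: rhs is a subset of lhs+."""
    return set(rhs) <= closure(lhs, fds)

def subsets(attrs):
    """All subsets of an attribute set, as frozensets."""
    attrs = sorted(attrs)
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield frozenset(combo)

def closure_of_fds(schema, fds):
    """All of F+: for each gamma in R and each S in gamma+, output gamma -> S."""
    return {(g, s) for g in subsets(schema) for s in subsets(closure(g, fds))}

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
small_closure = closure_of_fds("ABC", [("A", "B"), ("B", "C")])
```

For example, `implies(F, "AG", "I")` confirms a dependency that was derived by hand earlier with augmentation and transitivity, and the small schema (A, B, C) with A → B and B → C already has 43 dependencies in its closure, most of them trivial.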
Canonical Cover
• Sets of functional dependencies may have
redundant dependencies that can be inferred from
the others
For example: A → C is redundant in:
{A → B, B → C, A → C}
• Parts of a functional dependency may be redundant
E.g.: on RHS: {A → B, B → C, A → CD} can be
simplified to {A → B, B → C, A → D}
E.g.: on LHS: {A → B, B → C, AC → D} can be
simplified to {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a “minimal” set
of functional dependencies equivalent to F, having
no redundant dependencies or redundant parts of
dependencies
Extraneous Attributes
Consider a set F of functional dependencies and the
functional dependency α → β in F.
• Attribute A is extraneous in α if A ∈ α
and F logically implies (F – {α → β}) ∪ {(α – A) → β},
i.e., if (α – A) closed under F contains β.
• Attribute B is extraneous in β if B ∈ β
and the set of functional dependencies
(F – {α → β}) ∪ {α → (β – B)} logically implies F,
i.e., if α → B follows from this set.
Extraneous Attributes
Example: Given F = {A → C, AB → C}
B is extraneous in AB → C because {A → C, AB → C}
logically implies A → C.
Example: Given F = {A → C, AB → CD}
C is extraneous in AB → CD since AB → C can be
inferred even after deleting C from AB → CD
Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the
functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α
– compute (α – A)+ using the dependencies in F
– check that (α – A)+ contains β; if it does, A is
extraneous in α
• To test if attribute B ∈ β is extraneous in β
– compute α+ using only the dependencies in
F’ = (F – {α → β}) ∪ {α → (β – B)},
– check that α+ contains B; if it does, B is extraneous in
β
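The two tests can be sketched in Python as shown below. The function names and the representation (single-character attributes, F as a list of (left side, right side) string pairs) are assumptions of this example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(a, lhs, rhs, fds):
    """A in lhs is extraneous if (lhs - A)+ under F contains rhs."""
    return set(rhs) <= closure(set(lhs) - {a}, fds)

def extraneous_in_rhs(b, lhs, rhs, fds):
    """B in rhs is extraneous if lhs+ under F' contains B.

    F' is F with lhs -> rhs replaced by lhs -> (rhs - B).
    """
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, set(rhs) - {b}))
    return b in closure(lhs, reduced)

F1 = [("A", "C"), ("AB", "C")]   # first example: B is extraneous in AB -> C
F2 = [("A", "C"), ("AB", "CD")]  # second example: C is extraneous in AB -> CD

b_extraneous = extraneous_in_lhs("B", "AB", "C", F1)
c_extraneous = extraneous_in_rhs("C", "AB", "CD", F2)
d_extraneous = extraneous_in_rhs("D", "AB", "CD", F2)
```

Note the asymmetry: the left-side test uses F unchanged, while the right-side test must use the modified set F’, otherwise every right-side attribute would trivially pass.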
Canonical Cover
A canonical cover for F is a set of dependencies Fc such
that
• F logically implies all dependencies in Fc,
• Fc logically implies all dependencies in F,
• No functional dependency in Fc contains an extraneous
attribute, and
• Each left side of a functional dependency in Fc is unique.
To compute a canonical cover for F:
repeat
    Use the union rule to replace any dependencies in F
        α1 → β1 and α1 → β2 with α1 → β1β2
    Find a functional dependency α → β with an
        extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
Computing a Canonical Cover
R = (A, B, C)
F = { A → BC, B → C, A → B, AB → C }
• Combine A → BC and A → B into A → BC
Set is now { A → BC, B → C, AB → C }
• A is extraneous in AB → C
Check if the result of deleting A from AB → C is implied by the
other dependencies
Yes: in fact, B → C is already present!
Set is now { A → BC, B → C }
• C is extraneous in A → BC
Check if A → C is logically implied by A → B and the other
dependencies
Yes: using transitivity on A → B and B → C.
Can use attribute closure of A in more complex cases
• The canonical cover is: { A → B, B → C }
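The computation above can be reproduced with a short Python sketch of the repeat-until procedure. This is one possible formulation under stated assumptions: attributes are single characters, the helper names are invented for this example, and attributes are tried in alphabetical order. A canonical cover is not unique in general, so a different order may yield a different, equally valid cover.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def merge(fds):
    """Union rule: combine dependencies with equal left sides; drop empty right sides."""
    merged = {}
    for lhs, rhs in fds:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    return [(lhs, frozenset(rhs)) for lhs, rhs in merged.items() if rhs]

def remove_one_extraneous(fds):
    """Return fds with one extraneous attribute deleted, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        for a in sorted(lhs):
            # A is extraneous in lhs if (lhs - A)+ under F contains rhs.
            if rhs <= closure(lhs - {a}, fds):
                return rest + [(lhs - {a}, rhs)]
        for b in sorted(rhs):
            # B is extraneous in rhs if lhs+ under F' still contains B.
            reduced = rest + [(lhs, rhs - {b})]
            if b in closure(lhs, reduced):
                return reduced
    return None

def canonical_cover(fds):
    """Repeat union and extraneous-attribute removal until nothing changes."""
    cover = merge(fds)
    while True:
        reduced = remove_one_extraneous(cover)
        if reduced is None:
            return cover
        cover = merge(reduced)

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
Fc = canonical_cover(F)
```

For this F the sketch removes C from A → BC first and A from AB → C second, the reverse of the order on the slide, and arrives at the same cover { A → B, B → C }.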
Recap
Today:
Relational data model
Keys
Functional dependencies
Armstrong’s rules
Closure of a set of fds
Closure of a set of attributes
Canonical covers
Next lecture:
Data manipulation with the relational algebra (part 1)
Exercise
Consider the following proposed rule for functional
dependencies:
Rule: If α → β and γ → β then α → γ.
Prove that this rule is not sound by showing a relation
r that satisfies α → β and γ → β but does not satisfy
α→ γ.
Exercise
Consider:
R = (A, B, C, D)
F = { B → ABD, A → D, D → A, B → AD }.
Compute Fc.
Exercise
Given the database schema R(a,b,c), and a relation r
on the schema R, write a RA query to test whether
the functional dependency b → c holds on relation r.
• First, let’s agree to let an empty relation denote FALSE
and a non-empty relation denote TRUE.
• Second, recall that we can project on an empty list of
attributes Ø. In this case, there are only two possible
values for π_Ø(R) for any instance of R, namely, {} and
{()}.

Lecture 1
 
Chapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxChapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptx
 

Recently uploaded

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 

Recently uploaded (20)

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 

week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt

  • 1. Data Modeling and Databases 2ID50 – Lecture 1 – 2021-2022 Relational model and functional dependencies Responsible lecturer: prof. dr. George Fletcher
  • 2. Organization Instruction groups group 1, George Fletcher (inst) and Stefan Popa (SA) group 2, Stanley Clark (inst) and Youssef Selim (SA) group 3, Alexander Serebrenik (inst) and Eduardo Costa Martins (SA) group 4, Thomas Mulder (inst) and Ibrahim Ahmed Ibrahim Elsayed Nasr (SA) group 5, Wilco van Leeuwen (inst) and Sam Nijsten (SA) group 6, Daphne Miedema (inst) and Andrei Roncea (SA)
  • 3. Organization Assessments Mandatory homeworks, most weeks (6 in total) • 50% of final grade, individual work • homework is made available most weeks on Wednesday and then must be submitted on Canvas by the following Wednesday 13:30 • Hard deadline, no late submissions • only the best 5 (out of 6) homeworks count Mandatory final exam • 50% of final grade An exam score of 5.0 and a total score of 6 are needed to pass the course
  • 5. Organization On Canvas there are several important parts: • Announcements: for urgent and/or important stuff. • Modules: copies of course slides and other relevant files will be posted here. • A Module only becomes available when you have completed all previous modules. • Quizzes: the homework is posted here. • Grades: your grades On MS Teams, one channel per instruction group, discussions monitored during the week by SA’s and Instructors. • You and your colleagues are expected to take the lead here, working together on the weekly exercises
  • 6. Organization Schedule Lectures, exercises • Posted online each week on Wednesday, to be watched offline for the instructions of the following week Instruction exercises • All week: on MS Teams, in six groups • Tuesdays, 7th and 8th hours, live plenary discussion on MS Teams • Thursdays, 3rd and 4th hours, live plenary discussion on MS Teams
  • 7. Instruction meetings Planning for instruction meetings • I will live stream here. The goal is to work on problems together • Please join your instruction group channel on Teams. • All discussion and Q&A during the meeting will be done in your channel, moderated by your instructor • Instructors will post summarized questions to the live stream chat, which I will monitor
  • 10. Why do we study information systems? Example: think of the information needs of a car dealership • sale of new cars • sale of used cars • purchase of new cars • purchase of used cars • performing car maintenance • controlling inventory of parts • payment of salary of staff • maintenance (cleaning) of company property • etc.
  • 11. Why do we study information systems? • Information needs to be stored, used and manipulated in many types of applications: • scientific applications: bio- and chemical-informatics, social network analysis, digital humanities, … • administrative applications: banking, airline reservations and schedules, student administration, retail (customers, product recommendations, purchases, order tracking, bookkeeping), manufacturing (inventory, production, orders, supply chain), human resources (personnel database, salaries) • document-oriented applications: newspapers, news sites, (digital) libraries, websites, search engines, social media • technical applications: air traffic control, airplane control, motor management, automotive controls, software in TV, cameras, phones, climate control, power stations and grids, etc.
  • 12. Object System vs. Information System • object system: the “real world” of a company or organization or scientific experiment or …, with people, machines, products, warehouses, proteins, posts, likes, followers, …. • information system: a representation of the real world in a computer system, using data (e.g., strings, numbers) to represent objects such as people, machines, products, proteins, ... • example: students are people in the real world, but in the student administration they are represented by an identifying student number, name, address, list of enrolled courses, grades, etc. • the representation is always an approximation with a purpose • e.g., your knowledge of a course is represented by an integer number between 0 and 10 • “The map is not the territory, the menu is not the meal.”
  • 13. Modeling of Information Systems Two major aspects: • which information? (DATA MODELING) • what is the structure of the data? • what are relationships between data items? • which constraints (restrictions) apply to the data? • how is the information used? (PROCESS MODELING) • when and how is information created? • how is the information manipulated (changed)? • how is information shared/communicated between parts of the organization (or between the organization and external parties)? • We focus in this course on data
  • 14. Content of this course • Data is everywhere ... – business data – scientific data – personal data – the web, social media – ... ... and often outlasts code • Database management systems (DBMS) as microcosm of computer and data science – languages – hardware, distributed and parallel systems – Graphics, HCI – Artificial intelligence, machine learning – Systems software (e.g., file systems, memory mgmt, …) – ...
  • 15. Why do we use database systems? • In the early days, database applications were built directly on top of file systems. • This leads to the following problems (and more): • Data redundancy and inconsistency − Multiple file formats, duplication of information across different files • Difficulty in accessing data − Need to write a new program to carry out each new task • Data isolation: from application logic • Managing transactions and security is ad hoc • Integrity problems − Integrity constraints (e.g. “account balances must be greater than zero”) become buried in program code rather than being stated and managed explicitly • DBMSs offer systematic, principled solutions to all of these problems
  • 17. Content of this course • data manipulation • entering, updating, deleting and retrieving information from a database (we concentrate on retrieval: understanding and translating queries) • database design • given requirements for an information system, how do we design a database that satisfies the user’s needs? • how do we transform database design into an actual database implementation (in our case, using a relational database system)? • in this course we run two parallel strands: database design on Thursdays and data manipulation on Tuesdays
  • 18. Levels of abstraction in data modeling Key idea: data independence Insulation from changes in the way data is structured and stored • Physical level: describes how a record (e.g., customer) is stored (e.g., on disk) • Logical level: describes data stored in database, and the relationships among the data. type customer = record customer_id : string; customer_name : string; customer_street : string; customer_city : integer; • View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.
  • 19. Instances and Schemas • Similar to variables and types in programming languages • Schema (data model) – logical structure of the database • Analogous to name and type of a variable in a program • Physical schema: database design at the physical level • Logical schema: database design at the logical level • Instance – the actual content of the database at a particular point in time • Analogous to the value of a variable • Physical Data Independence – the ability to modify the physical schema without changing the logical schema • Applications depend on the logical schema • In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
  • 20. Database Models A database model is a collection of tools for describing: • Data • Data relationships • Data semantics • Data constraints Examples: • Relational model • Entity-Relationship model (mainly for database design) • Object-based database models (Object-oriented and Object- relational) • Semistructured data model (XML) • RDF (graph) data model • Other older models: • Network model • Hierarchical model
  • 22. The Relational Database Model A database consists of a finite set of relations (which we often call tables) • Each relation has a name and a set of attributes • Each attribute has a name and a domain (or data type) • A relation instance is a set of tuples (or rows)
  • 23. Basic definitions of the relational model Unnamed perspective: formally, given domains (i.e., sets) D1, D2, …, Dn, a relation instance r is a finite subset of D1 × D2 × … × Dn, i.e., a finite subset of the cartesian product of the domains. Thus, an instance is a finite set of n-tuples (d1, d2, …, dn) where, for each 1 ≤ i ≤ n, it holds that di ∈ Di. Note: in the unnamed perspective we need to observe a fixed order in the tuples, whereas in a named perspective the order of attributes is irrelevant (as long as we use the names explicitly).
  • 24. Basic definitions of the relational model Example: If customer_name = {Jones, Smith, Curry, Lindsay, …} customer_street = {Main, North, Park, …} customer_city = {Harrison, Rye, Pittsfield, …} Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) } is a relation instance over customer_name × customer_street × customer_city
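    The unnamed perspective can be sketched in a few lines of Python. This is only an illustration: the slide's domains are open-ended (they end in "…"), so the finite sets below are an assumed truncation, and the variable names `cartesian` and `is_relation_instance` are ours, not part of the course material.

    ```python
    from itertools import product

    # Assumed finite truncations of the domains from the slide.
    customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
    customer_street = {"Main", "North", "Park"}
    customer_city = {"Harrison", "Rye", "Pittsfield"}

    # A relation instance is a finite set of 3-tuples in a fixed attribute order.
    r = {
        ("Jones", "Main", "Harrison"),
        ("Smith", "North", "Rye"),
        ("Curry", "North", "Rye"),
        ("Lindsay", "Park", "Pittsfield"),
    }

    # The cartesian product customer_name x customer_street x customer_city.
    cartesian = set(product(customer_name, customer_street, customer_city))

    # r is a relation instance over these domains exactly when it is a subset.
    is_relation_instance = r <= cartesian
    print(is_relation_instance)
    ```

    Using a Python set mirrors the definition: duplicates cannot occur and the tuples carry no order among themselves, while the position inside each tuple is what identifies the attribute.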
  • 25. Relation Schema Named Perspective: A1, A2, …, An are attributes R = { A1, A2, …, An } is a relation schema •Example: Customer_schema = {customer_name, customer_street, customer_city} •r(R) denotes a relation instance r on the relation schema R Example: customer (Customer_schema) We sometimes also write the attributes instead of the schema: customer (customer_name, customer_street, customer_city)
  • 26. Relation Schema Named Perspective: Tuple in an instance is then viewed as a total function from the set of attributes to the (respective) set of values • Hence emphasizing that there really is no ordering on attributes ... • The value of tuple t on attribute A will be denoted by • t.A, • t(A), or • t[A]
  • 27. Alternative presentation of an instance We often draw a relation instance as a table, with the attribute names above each column. Relation customer, with attributes (or columns) as headers and tuples (or rows) below:
      customer_name | customer_street | customer_city
      Jones         | Main            | Harrison
      Smith         | North           | Rye
      Curry         | North           | Rye
      Lindsay       | Park            | Pittsfield
  • 28. Keys Let K ⊆ R • K is a superkey of R if values for K are sufficient to uniquely identify each tuple of each possible instance r(R) − by “possible r” we mean a relation r that could exist in the object system (i.e., slice of the “real world”) we are modeling. − Example: {customer_name, customer_street} and {customer_name} are both superkeys of Customer, if no two customers can possibly have the same name • K is a candidate key of R if K is minimal • i.e., no strict subset of K is a superkey • e.g., {customer_name} is a candidate key of Customer • Among the candidate keys of R we choose one to be the principal way of identifying tuples: the primary key.
  • 29. Banking example branch (bName, bCity, assets) customer (custName, custStreet, custCity) account (acctNumber, bName, balance) loan (loanNumber, bName, amount) depositor (custName, acctNumber) borrower (custName, loanNumber)
  • 30. Foreign Keys A relation schema may have an attribute(s) that corresponds to the primary key of another relation: i.e., a foreign key • E.g. customer_name and account_number attributes of depositor are foreign keys to customer and account respectively. • Only values occurring in the primary key attribute of the referenced relation may occur in the foreign key attribute of the referencing relation. Example: the following bank database “schema diagram”
  • 32. Basic requirement: First Normal Form A domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names • Composite attributes • Identification numbers that can be broken up into parts
  • 33. Basic requirement: First Normal Form • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values encourage redundant (repeated) storage of data • Example: Set of accounts stored with each customer, and set of owners stored with each account • We assume all relations are in first normal form
  • 34. Combining Schemas • Suppose we combine borrower and loan to get bor_loan = (customer_id, loan_number, amount) • Result is possible repetition of information (L-100 in example below)
  • 35. A Combined Schema Without Repetition • Consider combining loan_branch and loan loan_amt_br = (loan_number, amount, branch_name) • No repetition (as suggested by example below)
  • 36. What About Smaller Schemas? • Suppose we had started with bor_loan. How would we know to split up (decompose) it into borrower and loan? • Write a rule “if there were a schema (loan_number, amount), then loan_number would be a candidate key for it” • Denote as a functional dependency: loan_number → amount • In bor_loan, because loan_number is not a candidate key, the amount of a loan may have to be repeated. This indicates the need to decompose bor_loan.
  • 37. What About Smaller Schemas? • Not all decompositions are good. Suppose we decompose employee into employee1 = (employee_id, employee_name) employee2 = (employee_name, telephone_number, start_date) • The next slide shows how we lose information (we cannot reconstruct the original employee relation), so this is a lossy decomposition.
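    A small sketch of why this decomposition is lossy. The two employee tuples are invented for illustration (they are not from the slides), and `project` and `natural_join` are hypothetical helper names; the point is only that joining the two projections on employee_name produces tuples that were never in employee.

    ```python
    EMPLOYEE = ("employee_id", "employee_name", "telephone_number", "start_date")
    EMPLOYEE1 = ("employee_id", "employee_name")
    EMPLOYEE2 = ("employee_name", "telephone_number", "start_date")

    # Hypothetical instance: two different employees who share a name.
    employee = {
        (1, "Kim", "555-0101", "2020-01-01"),
        (2, "Kim", "555-0202", "2021-06-15"),
    }

    def project(rows, attrs, onto):
        """Projection: keep only the attributes in `onto`, as a set of tuples."""
        return {tuple(dict(zip(attrs, row))[a] for a in onto) for row in rows}

    def natural_join(left, left_attrs, right, right_attrs):
        """Natural join: combine tuples that agree on all common attributes."""
        common = [a for a in left_attrs if a in right_attrs]
        out_attrs = list(left_attrs) + [a for a in right_attrs if a not in common]
        result = set()
        for lt in left:
            lrow = dict(zip(left_attrs, lt))
            for rt in right:
                rrow = dict(zip(right_attrs, rt))
                if all(lrow[a] == rrow[a] for a in common):
                    merged = {**lrow, **rrow}
                    result.add(tuple(merged[a] for a in out_attrs))
        return result

    employee1 = project(employee, EMPLOYEE, EMPLOYEE1)
    employee2 = project(employee, EMPLOYEE, EMPLOYEE2)
    rejoined = natural_join(employee1, EMPLOYEE1, employee2, EMPLOYEE2)

    # The join contains the original tuples plus spurious ones.
    spurious = rejoined - employee
    print(len(employee), len(rejoined), sorted(spurious))
    ```

    The spurious tuples pair each employee_id with the other employee's phone number and start date, so we can no longer tell which combination was the real one.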
  • 39. Good logical design • So, we typically decompose to remove redundancy • Problems caused by redundancy: • Wasted storage • Update anomalies • Insertion anomalies • Deletion anomalies • Good logical design accurately reflects conceptual design (see week 3) and disallows redundancy as best possible
  • 40. Good logical design Redundancy follows from mixing information about more than one entity in a relation • Reasoning about keys • Need tools to formally reason about keys
  • 41. Goal — Devise a Theory for “goodness” • Decide whether a particular relation R (in first normal form) is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies (equality-generating dependencies) • multivalued dependencies (tuple-generating dependencies)
  • 42. Functional Dependencies • Constraints on the set of legal relations. • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. • A functional dependency is a generalization of the notion of a key.
  • 44. Functional Dependencies • Let R be a relation schema, α ⊆ R and β ⊆ R • The functional dependency α → β holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⇒ t1[β] = t2[β] • Example: Consider R(A, B) with instance r:
      A | B
      1 | 4
      1 | 5
      3 | 7
    • On this instance, A → B does not hold, but B → A does
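    The definition translates almost literally into a check over pairs of tuples. The sketch below uses a hypothetical helper `satisfies` and the instance r from the example; note that it only tests whether one given instance satisfies a dependency, which is weaker than the dependency holding on R (that is a statement about all legal instances).

    ```python
    from itertools import combinations

    def satisfies(rows, lhs, rhs):
        """Return True if every pair of tuples that agrees on lhs also agrees on rhs."""
        for t1, t2 in combinations(rows, 2):
            agree_lhs = all(t1[a] == t2[a] for a in lhs)
            agree_rhs = all(t1[a] == t2[a] for a in rhs)
            if agree_lhs and not agree_rhs:
                return False
        return True

    # The instance r of R(A, B) from the slide, in the named perspective.
    r = [
        {"A": 1, "B": 4},
        {"A": 1, "B": 5},
        {"A": 3, "B": 7},
    ]

    a_determines_b = satisfies(r, ["A"], ["B"])  # first two tuples agree on A but not on B
    b_determines_a = satisfies(r, ["B"], ["A"])  # no two tuples agree on B
    print(a_determines_b, b_determines_a)
    ```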
  • 45. Functional Dependencies • K is a superkey for schema R if and only if K → R • K is a candidate key for R if and only if • K → R, and • for no α ⊂ K (i.e., strict subset of K), α → R holds
  • 46. Functional Dependencies • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. • Consider the schema: bor_loan = (customer_id, loan_number, amount). We expect the following to hold: loan_number → amount but not: amount → customer_name, or loan_number → customer_name
  • 47. Use of Functional Dependencies We use functional dependencies to: • test relations to see if they are legal under a given set of functional dependencies. • If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. • specify constraints on the set of legal relations • We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
  • 48. Use of Functional Dependencies Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. • For example, a specific instance of loan may, by chance, satisfy amount → customer_name.
  • 49. Functional Dependencies • We study the theory of fds because we need it in our study of good logical design • A functional dependency is trivial if it is satisfied by all instances of a relation • Example: customer_name, loan_number → customer_name and customer_name → customer_name • In general, α → β is trivial if β ⊆ α (informal proof)
  • 50. Closure of a Set of Functional Dependencies • Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. • For example: If A → B and B → C, then we can infer that A → C • The set of all functional dependencies logically implied by F is the closure of F. • We denote the closure of F by F+. • F+ is a superset of F.
  • 52. Closure of a Set of Functional Dependencies We can find all of F+ by applying Armstrong’s rules: (a1) if β ⊆ α, then α → β (reflexivity) (a2) if α → β, then γα → γβ (augmentation) (a3) if α → β and β → γ, then α → γ (transitivity) These rules are • sound (they generate only functional dependencies that actually hold), • complete (they generate all functional dependencies that hold), and • non-redundant (if we take out one of the rules we can no longer generate all functional dependencies that hold).
  • 54. Example correctness proofs: reflexivity We prove: (a1) if β ⊆ α, then α → β (reflexivity) by using the definition of functional dependency. Let α = { A1, ..., An, B1, ..., Bm } and β = { B1, ..., Bm } (so we know that β ⊆ α).
      ∀ r(R) legal relation (instance) for the scheme R, ∀ t1, t2 ∈ r:
        t1[α] = t2[α]
        ⇒ ( t1[A1] = t2[A1] ∧ ... ∧ t1[An] = t2[An] ∧ t1[B1] = t2[B1] ∧ ... ∧ t1[Bm] = t2[Bm] )
        ⇒ ( t1[B1] = t2[B1] ∧ ... ∧ t1[Bm] = t2[Bm] )
        ⇒ t1[β] = t2[β]
      so ∀ t1, t2 ∈ r: t1[α] = t2[α] ⇒ t1[β] = t2[β], which is α → β
  • 56. Example correctness proofs: augmentation We prove: (a2) if α → β, then γα → γβ (augmentation) by using the definition of functional dependency.
      Because of α → β we know: ∀ r(R) legal relation (instance) for the scheme R, ∀ t1, t2 ∈ r: t1[α] = t2[α] ⇒ t1[β] = t2[β]
      From logic (tautology) we know that for these t1, t2 (and in fact for any t1, t2): t1[γ] = t2[γ] ⇒ t1[γ] = t2[γ]
      Hence (logic: if A ⇒ B and C ⇒ D then (A ∧ C) ⇒ (B ∧ D)):
        ∀ t1, t2 ∈ r: ( t1[γ] = t2[γ] ∧ t1[α] = t2[α] ) ⇒ ( t1[γ] = t2[γ] ∧ t1[β] = t2[β] )
        ≡ ∀ t1, t2 ∈ r: t1[γα] = t2[γα] ⇒ t1[γβ] = t2[γβ]
      which is (the definition of) γα → γβ
  • 57. Closure of a Set of Functional Dependencies We study the theory of fds because we need it for our study of logical design. We can find all of F+ by applying Armstrong’s rules: (a1) if β ⊆ α, then α → β (reflexivity) (a2) if α → β, then γα → γβ (augmentation) (a3) if α → β and β → γ, then α → γ (transitivity) For home: prove soundness of (a3). We establish completeness of Armstrong’s Rules in a later lecture
  • 61. Non-Redundancy of Armstrong Rules
    • We have to show that if we remove a rule we cannot always compute F+ anymore. (One counter-example is enough!)
    • a1 is needed: consider R = Ø and F = Ø. Then F+ = {Ø → Ø}, but F+_{a2,a3} = Ø, where F+_{a2,a3} denotes the set of fds derivable using only a2 and a3.
    • a2 is needed: consider R = {A, B} and F = {Ø → A}. Then F+_{a1,a3} = {Ø → Ø, A → Ø, A → A, B → Ø, B → B, AB → Ø, AB → A, AB → B, AB → AB, Ø → A, B → A} (B → A follows by transitivity from B → Ø and Ø → A), but B → AB is in F+ (how?).
    • Prove at home that a3 is needed.
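The a2 counter-example is small enough to check by brute force. The sketch below closes F = {Ø → A} over R = {A, B} under reflexivity and transitivity only, then tests whether B → AB was produced. The helper names `subsets` and `closure_a1_a3` and the encoding of an fd as a pair of frozensets are assumptions made for this illustration.

```python
from itertools import chain, combinations

def subsets(attrs):
    """All subsets of attrs, each as a frozenset."""
    items = sorted(attrs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def closure_a1_a3(attrs, fds):
    """Close fds under reflexivity (a1) and transitivity (a3), without a2."""
    result = set(fds)
    # a1: alpha -> beta for every beta that is a subset of alpha
    for alpha in subsets(attrs):
        for beta in subsets(alpha):
            result.add((alpha, beta))
    # a3: combine alpha -> beta and beta -> gamma until nothing new appears
    changed = True
    while changed:
        changed = False
        for (a, b) in list(result):
            for (b2, c) in list(result):
                if b == b2 and (a, c) not in result:
                    result.add((a, c))
                    changed = True
    return result

R = {"A", "B"}
F = {(frozenset(), frozenset({"A"}))}  # the single fd: empty set -> A
derived = closure_a1_a3(R, F)

b_to_ab = (frozenset({"B"}), frozenset({"A", "B"}))
print(len(derived))        # 11 fds are derivable without augmentation
print(b_to_ab in derived)  # False: B -> AB needs augmentation
```

With augmentation available, B → AB is obtained in one step from Ø → A, which is why dropping a2 loses a member of F+.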
  • 62. Example
    R = (A, B, C, G, H, I)
    F = { A → B, A → C, CG → H, CG → I, B → H }
    Some members of F+:
    • A → H, by transitivity from A → B and B → H
    • AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
    • CG → HI, by augmenting CG → I with CG to infer CG → CGI, augmenting CG → H with I to infer CGI → HI, and then transitivity
  • 63. Example
    R = (A, B, C, D)
    F = { A → C, B → D }
    Claim: ACD → C. Prove this using Armstrong’s rules.
    Claim: AB → ABCD. Prove this using Armstrong’s rules.
  • 64. Algorithm for Computing F+
    F+ := F
    repeat
      for each fd f in F+
        apply reflexivity and augmentation rules on f
        add the resulting fds to F+
      for each pair of fds f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
          then add the resulting fd to F+
    until F+ does not change any further
    NOTE: We shall see an alternative procedure for this task later.
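The procedure above can be sketched in Python as a fixed-point loop. This is a naive illustration, exponential in the number of attributes and only usable on tiny schemas. The names `subsets` and `fd_closure`, the frozenset-pair encoding of fds, and the three-attribute example are assumptions. One deviation from the slide: since reflexivity does not depend on any particular fd, the sketch adds all trivial fds once, before the loop.

```python
from itertools import chain, combinations

def subsets(attrs):
    """All subsets of attrs, each as a frozenset."""
    items = sorted(attrs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def fd_closure(attrs, fds):
    """Naive computation of F+ using Armstrong's rules a1, a2, a3."""
    all_subsets = subsets(attrs)
    closure = {(frozenset(a), frozenset(b)) for a, b in fds}
    # a1 (reflexivity): add every trivial fd alpha -> beta with beta subset of alpha
    for alpha in all_subsets:
        for beta in subsets(alpha):
            closure.add((alpha, beta))
    while True:
        new = set()
        # a2 (augmentation): from alpha -> beta infer gamma alpha -> gamma beta
        for (a, b) in closure:
            for g in all_subsets:
                new.add((a | g, b | g))
        # a3 (transitivity): from alpha -> beta and beta -> gamma infer alpha -> gamma
        by_lhs = {}
        for (a, b) in closure:
            by_lhs.setdefault(a, set()).add(b)
        for (a, b) in closure:
            for c in by_lhs.get(b, ()):
                new.add((a, c))
        if new <= closure:  # nothing new was derived: fixed point reached
            return closure
        closure |= new

# Hypothetical example: R = (A, B, C), F = { A -> B, B -> C }
f_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print((frozenset("A"), frozenset("C")) in f_plus)   # True, by transitivity
print((frozenset("A"), frozenset("BC")) in f_plus)  # True
print((frozenset("C"), frozenset("A")) in f_plus)   # False
```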
  • 65. Additional inference rules
    We can further simplify manual computation of F+ by using the following additional rules.
    (a4) If α → β holds and α → γ holds, then α → βγ holds (union)
    (a5) If α → βγ holds, then α → β holds and α → γ holds (decomposition)
    (a6) If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
    The above rules can be inferred from Armstrong’s axioms.
  • 66. Closure of Attribute Sets
    Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F.
    Algorithm to compute α+, the closure of α under F:
    result := α;
    while (changes to result) do
      for each β → γ in F do
      begin
        if β ⊆ result then result := result ∪ γ
      end
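The attribute closure algorithm translates almost line by line into Python. In this sketch, the function name `attribute_closure` and the encoding of attribute sets as strings of single-letter attribute names (so "CG" stands for {C, G}) are assumptions made for brevity. The set F is the one used in the lecture’s running example.

```python
def attribute_closure(alpha, fds):
    """Return alpha+, the closure of the attribute set alpha under fds.

    alpha : iterable of attribute names
    fds   : iterable of (beta, gamma) pairs, each an iterable of attribute names
    """
    result = set(alpha)
    changed = True
    while changed:  # "while (changes to result)"
        changed = False
        for beta, gamma in fds:
            # if beta is contained in result, add gamma to result
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(attribute_closure("AG", F))))  # ABCGHI
```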
  • 69. Example of Attribute Set Closure
    R = (A, B, C, G, H, I)
    F = { A → B, A → C, CG → H, CG → I, B → H }
    (AG)+
    1. result = AG
    2. result = ABCG (A → C and A → B)
    3. result = ABCGH (CG → H and CG ⊆ AGBC)
    4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
    Is AG a candidate key?
    1. Is AG a superkey?
    2. Is any proper subset of AG a superkey?
  • 70. Uses of Attribute Closure
    There are several uses of the attribute closure algorithm:
    • Testing for superkey:
      – To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
    • Testing functional dependencies:
      – To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
      – That is, we compute α+ by using attribute closure, and then check if it contains β.
      – This is a simple and cheap test, and very useful.
    • Computing closure of F:
      – For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
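The first two uses listed above can be sketched on top of a closure routine. The helper names `is_superkey`, `implies` and `is_candidate_key` are hypothetical, and attribute sets are again written as strings of single-letter names. The candidate-key test only tries removing one attribute at a time; this is enough because every superset of a superkey is itself a superkey.

```python
def attribute_closure(alpha, fds):
    """Return alpha+ under fds (pairs of attribute collections)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

def is_superkey(alpha, schema, fds):
    """alpha is a superkey if alpha+ contains all attributes of the schema."""
    return attribute_closure(alpha, fds) >= set(schema)

def implies(fds, alpha, beta):
    """alpha -> beta is in F+ exactly when beta is a subset of alpha+."""
    return set(beta) <= attribute_closure(alpha, fds)

def is_candidate_key(alpha, schema, fds):
    """A candidate key is a superkey with no proper subset that is a superkey."""
    if not is_superkey(alpha, schema, fds):
        return False
    return not any(is_superkey(set(alpha) - {a}, schema, fds) for a in alpha)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(is_superkey("AG", R, F))       # True: (AG)+ = ABCGHI
print(is_candidate_key("AG", R, F))  # True: neither A nor G alone is a superkey
print(implies(F, "CG", "HI"))        # True
print(implies(F, "A", "I"))          # False: A+ = ABCH
```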
  • 71. Canonical Cover
    • Sets of functional dependencies may have redundant dependencies that can be inferred from the others.
      For example: A → C is redundant in { A → B, B → C, A → C }.
    • Parts of a functional dependency may be redundant.
      E.g., on the RHS: { A → B, B → C, A → CD } can be simplified to { A → B, B → C, A → D }.
      E.g., on the LHS: { A → B, B → C, AC → D } can be simplified to { A → B, B → C, A → D }.
    • Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.
  • 73. Extraneous Attributes
    Consider a set F of functional dependencies and the functional dependency α → β in F.
    • Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}, i.e., if (α – A) closed under F contains β.
    • Attribute B is extraneous in β if B ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – B)} logically implies F, i.e., if α → B follows from this set.
  • 75. Extraneous Attributes
    Example: Given F = { A → C, AB → C }, B is extraneous in AB → C because { A → C, AB → C } logically implies A → C.
    Example: Given F = { A → C, AB → CD }, C is extraneous in AB → CD since AB → C can be inferred even after deleting C from AB → CD.
  • 76. Testing if an Attribute is Extraneous
    • Consider a set F of functional dependencies and the functional dependency α → β in F.
    • To test if attribute A ∈ α is extraneous in α:
      – compute (α – A)+ using the dependencies in F
      – check that (α – A)+ contains β; if it does, A is extraneous in α
    • To test if attribute B ∈ β is extraneous in β:
      – compute α+ using only the dependencies in F’ = (F – {α → β}) ∪ {α → (β – B)}
      – check that α+ contains B; if it does, B is extraneous in β
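Both tests reduce to one attribute-closure computation each. The sketch below follows the two procedures on the examples from the previous slide. The function names `extraneous_in_lhs` and `extraneous_in_rhs` and the string encoding of attribute sets are illustrative assumptions.

```python
def attribute_closure(alpha, fds):
    """Return alpha+ under fds (pairs of attribute collections)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

def extraneous_in_lhs(attr, fd, fds):
    """Is attr extraneous in the left side of fd? Uses all of F."""
    alpha, beta = fd
    reduced = set(alpha) - {attr}
    return set(beta) <= attribute_closure(reduced, fds)

def extraneous_in_rhs(attr, fd, fds):
    """Is attr extraneous in the right side of fd? Uses F' instead of F."""
    alpha, beta = fd
    # F' = (F - {alpha -> beta}) union {alpha -> (beta - attr)}
    f_prime = [g for g in fds if g != fd] + [(alpha, set(beta) - {attr})]
    return attr in attribute_closure(alpha, f_prime)

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", ("AB", "C"), F1))   # True: A+ already contains C
print(extraneous_in_lhs("A", ("AB", "C"), F1))   # False: B+ = B

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", ("AB", "CD"), F2))  # True: A -> C still yields C
print(extraneous_in_rhs("D", ("AB", "CD"), F2))  # False: D is no longer derivable
```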
  • 77. Canonical Cover
    A canonical cover for F is a set of dependencies Fc such that
    • F logically implies all dependencies in Fc,
    • Fc logically implies all dependencies in F,
    • no functional dependency in Fc contains an extraneous attribute, and
    • each left side of a functional dependency in Fc is unique.
    To compute a canonical cover for F:
    repeat
      Use the union rule to replace any dependencies in F of the form α1 → β1 and α1 → β2 with α1 → β1 β2
      Find a functional dependency α → β with an extraneous attribute either in α or in β
      If an extraneous attribute is found, delete it from α → β
    until F does not change
  • 82. Computing a Canonical Cover
    R = (A, B, C)
    F = { A → BC, B → C, A → B, AB → C }
    • Combine A → BC and A → B into A → BC.
      Set is now { A → BC, B → C, AB → C }
    • A is extraneous in AB → C.
      Check if the result of deleting A from AB → C is implied by the other dependencies.
      Yes: in fact, B → C is already present!
      Set is now { A → BC, B → C }
    • C is extraneous in A → BC.
      Check if A → C is logically implied by A → B and the other dependencies.
      Yes: using transitivity on A → B and B → C. (Attribute closure of A can be used in more complex cases.)
    The canonical cover is: { A → B, B → C }
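The walkthrough above can be reproduced with a short Python sketch of the repeat/until procedure. The function name `canonical_cover` and the data encoding are assumptions. The sketch may remove extraneous attributes in a different order than the slides do; a set of fds can have more than one canonical cover, but for this F it arrives at the same result.

```python
def attribute_closure(alpha, fds):
    """Return alpha+ under fds (pairs of attribute collections)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

def canonical_cover(fds):
    """Sketch of the repeat/until procedure for computing a canonical cover."""
    fc = [(frozenset(a), set(b)) for a, b in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged.setdefault(lhs, set()).update(rhs)
        fc = [(lhs, rhs) for lhs, rhs in merged.items() if rhs]
        # Look for one extraneous attribute and delete it.
        for i, (lhs, rhs) in enumerate(fc):
            for a in sorted(lhs):  # extraneous in the left side?
                if rhs <= attribute_closure(lhs - {a}, fc):
                    fc[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
            for b in sorted(rhs):  # extraneous in the right side?
                rest = fc[:i] + [(lhs, rhs - {b})] + fc[i + 1:]
                if b in attribute_closure(lhs, rest):
                    fc[i] = (lhs, rhs - {b})
                    changed = True
                    break
            if changed:
                break
    return {(lhs, frozenset(rhs)) for lhs, rhs in fc}

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in sorted(canonical_cover(F), key=lambda fd: sorted(fd[0])):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```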
  • 83. Recap Today: Relational data model Keys Functional dependencies Armstrong’s rules Closure of a set of fds Closure of a set of attributes Canonical covers Next lecture: Data manipulation with the relational algebra (part 1)
  • 84. Exercise Consider the following proposed rule for functional dependencies: Rule: If α → β and γ → β then α → γ. Prove that this rule is not sound by showing a relation r that satisfies α → β and γ → β but does not satisfy α→ γ.
  • 85. Exercise Consider: R = (A, B, C, D), F = { B → ABD, A → D, D → A, B → AD }. Compute Fc.
  • 86. Exercise Given the database schema R(a,b,c), and a relation r on the schema R, write an RA query to test whether the functional dependency b → c holds on relation r. • First, let’s agree to let an empty relation denote FALSE and a non-empty relation denote TRUE. • Second, recall that we can project on an empty list of attributes Ø. In this case, there are only two possible values for π_Ø(R) for any instance of R, namely, {} and {()}.