Introduction to Relational Databases
Introduction
• Database – collection of persistent data
• Database Management System (DBMS) –
software system that supports creation,
population, and querying of a database
Relational Database
• Relational Database Management System
(RDBMS)
– Consists of a number of tables and a single
schema (definition of tables and attributes)
– Students (sid, name, login, age, gpa)
Students identifies the table
sid, name, login, age, gpa identify
the attributes
sid is the primary key
An Example Table
• Students (sid: string, name: string, login: string, age:
integer, gpa: real)
sid name login age gpa
50000 Dave dave@cs 19 3.3
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
53650 Smith smith@math 19 3.8
53831 Madayan madayan@music 11 1.8
53832 Guldu guldu@music 12 2.0
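As a minimal sketch, the Students table above could be created and populated like this. The choice of Python's built-in sqlite3 module and an in-memory database is an assumption made for illustration; the slides do not name a particular DBMS.

```python
import sqlite3

# In-memory database, used here only for illustration.
conn = sqlite3.connect(":memory:")

# Schema from the slide: sid is the primary key.
conn.execute("""
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL
    )
""")

student_rows = [
    ("50000", "Dave", "dave@cs", 19, 3.3),
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", student_rows)

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# A second row with an existing sid violates the primary key.
try:
    conn.execute(
        "INSERT INTO Students VALUES ('50000', 'Other', 'other@cs', 20, 3.0)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Two students may share a name (Smith), but not a sid, which is what makes sid usable as the key.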
Another example: Courses
• Courses (cid, instructor, quarter, dept)
cid instructor quarter dept
Carnatic101 Jane Fall 06 Music
Reggae203 Bob Summer 06 Music
Topology101 Mary Spring 06 Math
History105 Alice Fall 06 History
Keys
• Primary key – minimal subset of fields that
is a unique identifier for a tuple
– sid is primary key for Students
– cid is primary key for Courses
• Foreign key – field(s) in one table that reference the primary key of another table, expressing connections between tables
– Courses (cid, instructor, quarter, dept)
– Students (sid, name, login, age, gpa)
– How do we express which students take each
course?
Many to many relationships
• In general, need a new table
Enrolled(cid, grade, studid)
studid is a foreign key that references sid in the
Students table
Enrolled
cid grade studid
Carnatic101 C 53831
Reggae203 B 53832
Topology112 A 53650
History105 B 53666
Students
sid name login
50000 Dave dave@cs
53666 Jones jones@cs
53688 Smith smith@ee
53650 Smith smith@math
53831 Madayan madayan@music
53832 Guldu guldu@music
Foreign key: Enrolled.studid → Students.sid
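A small sketch of how a foreign key constrains the Enrolled table, again assuming Python's sqlite3 module (an illustrative choice, not something the slides prescribe). Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT)")
conn.execute("""
    CREATE TABLE Enrolled (
        cid    TEXT,
        grade  TEXT,
        studid TEXT,
        FOREIGN KEY (studid) REFERENCES Students (sid)
    )
""")

conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("53831", "Madayan", "madayan@music"), ("53832", "Guldu", "guldu@music")],
)

# studid 53831 exists in Students, so this row is accepted.
conn.execute("INSERT INTO Enrolled VALUES ('Carnatic101', 'C', '53831')")

# studid 99999 (a made-up value) has no matching sid, so the row is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('Reggae203', 'B', '99999')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrolled_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
```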
Relational Algebra
• Collection of operators for specifying
queries
• Query describes step-by-step procedure
for computing answer (i.e., operational)
• Each operator accepts one or two
relations as input and returns a relation as
output
• Relational algebra expression composed
of multiple operators
Basic operators
• Selection – return rows that meet some
condition
• Projection – return column values
• Union
• Cross product
• Difference
• Other operators can be defined in terms of
basic operators
Example Schema (simplified)
• Courses (cid, instructor, quarter, dept)
• Students (sid, name, gpa)
• Enrolled (cid, grade, studid)
Selection
Select students with gpa higher than 3.3 from S1:
σgpa>3.3(S1)
S1
sid name gpa
50000 Dave 3.3
53666 Jones 3.4
53688 Smith 3.2
53650 Smith 3.8
53831 Madayan 1.8
53832 Guldu 2.0
sid name gpa
53666 Jones 3.4
53650 Smith 3.8
Projection
Project name and gpa of all students in S1:
πname, gpa(S1)
S1
sid name gpa
50000 Dave 3.3
53666 Jones 3.4
53688 Smith 3.2
53650 Smith 3.8
53831 Madayan 1.8
53832 Guldu 2.0
name gpa
Dave 3.3
Jones 3.4
Smith 3.2
Smith 3.8
Madayan 1.8
Guldu 2.0
Combine Selection and Projection
• Project name and gpa of students in S1
with gpa higher than 3.3:
πname,gpa(σgpa>3.3(S1))
sid name gpa
50000 Dave 3.3
53666 Jones 3.4
53688 Smith 3.2
53650 Smith 3.8
53831 Madayan 1.8
53832 Guldu 2.0
name gpa
Jones 3.4
Smith 3.8
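Selection and projection can be sketched in a few lines of plain Python, representing a relation as a list of dictionaries. The function names `select` and `project` are hypothetical helpers introduced for this illustration only.

```python
# S1 from the slides, one dictionary per tuple.
S1 = [
    {"sid": "50000", "name": "Dave", "gpa": 3.3},
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
    {"sid": "53831", "name": "Madayan", "gpa": 1.8},
    {"sid": "53832", "name": "Guldu", "gpa": 2.0},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:  # a relation is a set of tuples
            result.append(projected)
    return result


# Equivalent of projecting name, gpa over the selection gpa > 3.3 on S1.
high_gpa = select(S1, lambda row: row["gpa"] > 3.3)
names_and_gpas = project(high_gpa, ["name", "gpa"])

# Projecting on name alone collapses the two Smith rows into one.
names_only = project(S1, ["name"])
```

Because each operator takes a relation and returns a relation, the output of `select` can be fed straight into `project`, which is the composition shown on the slide.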
Set Operations
• Union (R ∪ S)
– All tuples in R or S (or both)
– R and S must have same number of fields
– Corresponding fields must have same
domains
• Intersection (R ∩ S)
– All tuples in both R and S
• Set difference (R – S)
– Tuples in R and not S
Set Operations (continued)
• Cross product or Cartesian product (R x S)
– All fields in R followed by all fields in S
– One tuple (r,s) for each pair of tuples r ∈ R, s ∈ S
Example: Intersection
S1
sid name gpa
50000 Dave 3.3
53666 Jones 3.4
53688 Smith 3.2
53650 Smith 3.8
53831 Madayan 1.8
53832 Guldu 2.0
S2
sid name gpa
53666 Jones 3.4
53688 Smith 3.2
53700 Tom 3.5
53777 Jerry 2.8
53832 Guldu 2.0
S1 ∩ S2 =
sid name gpa
53666 Jones 3.4
53688 Smith 3.2
53832 Guldu 2.0
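The set operations map directly onto Python's built-in set type when each tuple is stored as a Python tuple. This is only a sketch of the semantics, not of how a DBMS evaluates them.

```python
# S1 and S2 from the slide, as sets of (sid, name, gpa) tuples.
S1 = {
    ("50000", "Dave", 3.3),
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53650", "Smith", 3.8),
    ("53831", "Madayan", 1.8),
    ("53832", "Guldu", 2.0),
}
S2 = {
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53700", "Tom", 3.5),
    ("53777", "Jerry", 2.8),
    ("53832", "Guldu", 2.0),
}

intersection = S1 & S2  # tuples in both S1 and S2
union = S1 | S2         # tuples in S1 or S2 (or both)
difference = S1 - S2    # tuples in S1 and not in S2
```

Both relations have the same number of fields with matching domains, which is the compatibility condition the slides require for these operators.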
Joins
• Combine information from two or more
tables
• Example: students enrolled in courses:
S1 ⋈S1.sid=E.studid E
S1
sid name gpa
50000 Dave 3.3
53666 Jones 3.4
53688 Smith 3.2
53650 Smith 3.8
53831 Madayan 1.8
53832 Guldu 2.0
E
cid grade studid
Carnatic101 C 53831
Reggae203 B 53832
Topology112 A 53650
History105 B 53666
Joins
sid name gpa cid grade studid
53666 Jones 3.4 History105 B 53666
53650 Smith 3.8 Topology112 A 53650
53831 Madayan 1.8 Carnatic101 C 53831
53832 Guldu 2.0 Reggae203 B 53832
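Since a join can be defined in terms of the basic operators, one way to sketch it is as a cross product followed by a selection on S1.sid = E.studid. The Python below illustrates that definition only; real systems use far more efficient join algorithms.

```python
from itertools import product

# (sid, name, gpa)
S1 = [
    ("50000", "Dave", 3.3),
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53650", "Smith", 3.8),
    ("53831", "Madayan", 1.8),
    ("53832", "Guldu", 2.0),
]
# (cid, grade, studid)
E = [
    ("Carnatic101", "C", "53831"),
    ("Reggae203", "B", "53832"),
    ("Topology112", "A", "53650"),
    ("History105", "B", "53666"),
]

# Cross product: all fields of S1 followed by all fields of E,
# one combined tuple per pair (s, e).
cross = [s + e for s, e in product(S1, E)]

# Selection on the join condition: sid (position 0) equals studid (position 5).
joined = [t for t in cross if t[0] == t[5]]
```

The cross product holds 6 × 4 = 24 tuples, of which only the four with matching student identifiers survive the selection.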
Relational Algebra Summary
• Algebras are useful to manipulate data types
(relations in this case)
• Set-oriented
• Brings some clarity to what needs to be done
• Opportunities for optimization
– May have different expressions that do same
thing
• We will see examples of algebras for other
types of data in this course
Intro to SQL
• CREATE TABLE
– Create a new table, e.g., students, courses,
enrolled
• SELECT-FROM-WHERE
– List all CS courses
• INSERT
– Add a new student, course, or enroll a student
in a course
Create Table
• CREATE TABLE Enrolled
    (studid CHAR(20),
     cid CHAR(20),
     grade CHAR(20),
     PRIMARY KEY (studid, cid),
     FOREIGN KEY (studid) REFERENCES Students);
Select-From-Where query
• “Find all students who are under 18”
SELECT *
FROM Students S
WHERE S.age < 18
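To see the query run against the example data, here is a sketch using Python's sqlite3 module (an assumed environment; any SQL system would do). The table is loaded with the Students rows from the earlier slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("50000", "Dave", "dave@cs", 19, 3.3),
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@ee", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
        ("53831", "Madayan", "madayan@music", 11, 1.8),
        ("53832", "Guldu", "guldu@music", 12, 2.0),
    ],
)

# The query from the slide, unchanged.
under_18 = conn.execute(
    "SELECT * FROM Students S WHERE S.age < 18"
).fetchall()

# Column 1 of each result row is the name.
under_18_names = sorted(row[1] for row in under_18)
```

Students aged exactly 18 (Jones and the first Smith) are excluded because the comparison is strict.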
Queries across multiple tables
(joins)
• “Print the student name and course ID
where the student received an ‘A’ in the
course”
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.studid AND E.grade = 'A'
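The same join query can be tried on the example tables. This sketch assumes Python's sqlite3 module and the simplified Students (sid, name, gpa) schema; note that SQL string literals take straight single quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (cid TEXT, grade TEXT, studid TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        ("50000", "Dave", 3.3),
        ("53666", "Jones", 3.4),
        ("53688", "Smith", 3.2),
        ("53650", "Smith", 3.8),
        ("53831", "Madayan", 1.8),
        ("53832", "Guldu", 2.0),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("Carnatic101", "C", "53831"),
        ("Reggae203", "B", "53832"),
        ("Topology112", "A", "53650"),
        ("History105", "B", "53666"),
    ],
)

# Join Students and Enrolled on sid = studid, keeping only 'A' grades.
a_grades = conn.execute("""
    SELECT S.name, E.cid
    FROM Students S, Enrolled E
    WHERE S.sid = E.studid AND E.grade = 'A'
""").fetchall()
```

Only one enrollment carries an 'A', so the result has a single row, for the Smith with sid 53650.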
Other SQL features
• MIN, MAX, AVG
– Find highest grade in fall database course
• COUNT, DISTINCT
– How many students enrolled in CS courses in
the fall?
• ORDER BY, GROUP BY
– Rank students by their grade in fall database
course
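The example tables do not contain the fall database course the bullets mention, so the sketch below applies the same features to the data that is available: MAX and AVG over gpa, COUNT with DISTINCT over names, GROUP BY over grades, and ORDER BY to rank students. Python's sqlite3 module is again an assumed environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (cid TEXT, grade TEXT, studid TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        ("50000", "Dave", 3.3),
        ("53666", "Jones", 3.4),
        ("53688", "Smith", 3.2),
        ("53650", "Smith", 3.8),
        ("53831", "Madayan", 1.8),
        ("53832", "Guldu", 2.0),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("Carnatic101", "C", "53831"),
        ("Reggae203", "B", "53832"),
        ("Topology112", "A", "53650"),
        ("History105", "B", "53666"),
    ],
)

# MAX, AVG: highest and average gpa.
max_gpa, avg_gpa = conn.execute(
    "SELECT MAX(gpa), AVG(gpa) FROM Students"
).fetchone()

# COUNT, DISTINCT: number of different student names.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM Students"
).fetchone()[0]

# GROUP BY: number of enrollments per grade.
grade_counts = conn.execute(
    "SELECT grade, COUNT(*) FROM Enrolled GROUP BY grade ORDER BY grade"
).fetchall()

# ORDER BY: rank students from highest to lowest gpa.
ranking = [
    row[0] for row in conn.execute("SELECT sid FROM Students ORDER BY gpa DESC")
]
```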
Views
• Virtual table defined by a query over
base tables
– Single or multiple tables
• Security – “hide” certain attributes from
users
– Show students in each course but hide their
grades
• Ease of use – expression that is more
intuitively obvious to user
• Views can be materialized to improve query
performance
Views
• Suppose we often need names of students
who got a ‘B’ in some course:
CREATE VIEW B_Students(name, sid, course)
AS SELECT S.name, S.sid, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.studid AND E.grade = 'B'
name sid course
Jones 53666 History105
Guldu 53832 Reggae203
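A sketch of the view in action, assuming Python's sqlite3 module with a reasonably recent SQLite (the column list in CREATE VIEW needs SQLite 3.9 or later). The view stores no rows of its own; each query against it re-evaluates the defining query on the base tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE Enrolled (cid TEXT, grade TEXT, studid TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        ("50000", "Dave", 3.3),
        ("53666", "Jones", 3.4),
        ("53688", "Smith", 3.2),
        ("53650", "Smith", 3.8),
        ("53831", "Madayan", 1.8),
        ("53832", "Guldu", 2.0),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("Carnatic101", "C", "53831"),
        ("Reggae203", "B", "53832"),
        ("Topology112", "A", "53650"),
        ("History105", "B", "53666"),
    ],
)

# The view from the slide: students who got a 'B' in some course.
conn.execute("""
    CREATE VIEW B_Students (name, sid, course)
    AS SELECT S.name, S.sid, E.cid
       FROM Students S, Enrolled E
       WHERE S.sid = E.studid AND E.grade = 'B'
""")

# The view is queried like an ordinary table; grades stay hidden.
b_students = conn.execute(
    "SELECT name, sid, course FROM B_Students ORDER BY sid"
).fetchall()
```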
Indexes
• Idea: speed up access to desired data
• “Find all students with gpa > 3.3”
• Without an index, may need to scan entire table
• Index consists of a set of entries pointing
to locations of each search key
Types of Indexes
• Clustered vs. Unclustered
– Clustered – ordering of data records is the same as
ordering of data entries in the index
– Unclustered – data records are in a different order
from the index
• Primary vs. Secondary
– Primary – index on fields that include primary
key
– Secondary – other indexes
Example: Clustered Index
• Sorted by sid
sid name gpa
50000 Dave 3.3
53650 Smith 3.8
53666 Jones 3.4
53688 Smith 3.2
53831 Madayan 1.8
53832 Guldu 2.0
Index entries on sid: 50000, 53600, 53800
Example: Unclustered Index
• Sorted by sid
• Index on gpa
sid name gpa
50000 Dave 3.3
53650 Smith 3.8
53666 Jones 3.4
53688 Smith 3.2
53831 Madayan 1.8
53832 Guldu 2.0
Index entries on gpa: 1.8, 2.0, 3.2, 3.3, 3.4, 3.8
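The unclustered index on gpa can be sketched as a sorted list of (gpa, record position) entries searched with binary search. This is a toy model in plain Python; real systems typically use structures such as B+ trees, and the name `students_with_gpa_above` is a hypothetical helper.

```python
from bisect import bisect_right

# Data records, stored in sid order as on the slide.
records = [
    ("50000", "Dave", 3.3),
    ("53650", "Smith", 3.8),
    ("53666", "Jones", 3.4),
    ("53688", "Smith", 3.2),
    ("53831", "Madayan", 1.8),
    ("53832", "Guldu", 2.0),
]

# Index entries: (search key, position of the record), sorted by gpa.
# The record order (by sid) differs from the index order (by gpa),
# which is what makes the index unclustered.
gpa_index = sorted((rec[2], pos) for pos, rec in enumerate(records))
gpa_keys = [key for key, _ in gpa_index]


def students_with_gpa_above(threshold):
    """Find records with gpa > threshold without scanning every record."""
    start = bisect_right(gpa_keys, threshold)  # first entry with key > threshold
    return [records[pos] for _, pos in gpa_index[start:]]


matches = students_with_gpa_above(3.3)
```

The binary search jumps straight to the first qualifying entry, but every insert into `records` would also require updating `gpa_index`.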
Comments on Indexes
• Indexes can significantly speed up query
execution
• But inserts are more costly (each index must be updated too)
• May have high storage overhead
• Need to choose attributes to index wisely!
– What queries are run most frequently?
– What queries could benefit most from an
index?
• Preview of things to come: SDSS
Summary: Why are RDBMS useful?
• Data independence – provides abstract
view of the data, without details of storage
• Efficient data access – uses techniques to
store and retrieve data efficiently
• Reduced application development time –
many important functions already supported
• Centralized data administration
• Data Integrity and Security
• Concurrency control and recovery
So, why don’t scientists use them?
• “I tried to use databases in my project, but
they were just too [slow | hard-to-use |
expensive | complex]. So I use files.”
– Gray and Szalay, Where the Rubber Meets the
Sky: Bridging the Gap Between Databases
and Science
Some other limitations of RDBMS
• Arrays
• Hierarchical data
Example: Taxonomy of Organisms
• Hierarchy of categories:
– Kingdom – phylum – class – order – family – genus –
species
– How would you design a relational schema for this?
Animals
  Chordates
    Vertebrates
      birds, reptiles, mammals
  Arthropods
    insects, spiders, crustaceans
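One possible answer to the schema question, offered only as a sketch: an adjacency-list table in which every taxon names its parent. The table name `Taxon`, its columns, and the parent-child links read off the diagram are all assumptions made for illustration; the slides leave the design open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each taxon points at its parent (NULL for the root).
conn.execute("""
    CREATE TABLE Taxon (
        name   TEXT PRIMARY KEY,
        parent TEXT REFERENCES Taxon (name)
    )
""")
conn.executemany(
    "INSERT INTO Taxon VALUES (?, ?)",
    [
        ("Animals", None),
        ("Chordates", "Animals"),
        ("Arthropods", "Animals"),
        ("Vertebrates", "Chordates"),
        ("birds", "Vertebrates"),
        ("reptiles", "Vertebrates"),
        ("mammals", "Vertebrates"),
        ("insects", "Arthropods"),
        ("spiders", "Arthropods"),
        ("crustaceans", "Arthropods"),
    ],
)

# Finding every descendant of a taxon needs a recursive query,
# because the depth of the hierarchy is not fixed by the schema.
descendants = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE sub(name) AS (
            SELECT name FROM Taxon WHERE parent = ?
            UNION ALL
            SELECT t.name FROM Taxon t JOIN sub s ON t.parent = s.name
        )
        SELECT name FROM sub
        """,
        ("Chordates",),
    )
]
```

The need for recursion in even this simple lookup hints at why hierarchical data is listed as a limitation: plain SELECT-FROM-WHERE queries cannot walk a tree of unknown depth.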
