This document provides information about the CS501 Database Systems and Data Mining course. It includes details about the course structure, timings, syllabus, evaluation policy, and introductory concepts about databases and database management systems. The syllabus covers topics such as data models, query languages, database design, data storage and indexing, query processing, and data mining concepts and techniques. Required textbooks and the evaluation criteria consisting of assignments, quizzes, mid-semester and end-semester exams are also specified.
2. Course Structure
CS501 (3-0-0-6)
Class Timings
Mon: 04:00pm - 05:30pm
Fri: 03:00pm – 04:30pm
Slides, lecture materials, assignments will be
uploaded on time to time manner
2
3. SYLLABUS- DATABASE SYSTEMS
Data models: entity-relationship, relational, network,
hierarchical, and logic data models, with the emphasis on
the relational model.
Query languages: relational algebra, relational calculus,
SQL, QBE.
Theory of database design: functional dependencies;
normal forms: 1NF, 2NF, 3NF, Boyce-Codd NF;
decompositions; normalization; multivalued dependencies,
join dependencies, 4NF, 5NF.
Data storage and indexing: disks, files, file
organizations, indexes; tree structured indexing (ISAM, B-
trees), hash based indexing.
Query processing: evaluation of relational operators,
query optimization; transaction management, Concurrency
control; error recovery; security.
Case studies: ORACLE, Microsoft access etc. Introduction
to Open Database Connectivity, Client-Server environment
etc.
3
4. SYLLABUS-DATA MINING
Types of data mining problems. The process of data mining.
Statistical evaluation of big data: statistical prediction,
performance measures, pitfalls in data-mining evaluation.
Data preparation: data models, data transformations,
handling of missing data, time-dependent data, textual
data.
Data reduction: feature selection, principal components,
smoothing data, case subsampling.
Predictive modeling: mathematical models, linear models,
neural nets, advanced statistical models, distance
solutions, logic solutions, decision trees, decision rules,
model combination.
Solution analyses: graphical trend analyses, comparison of
methods.
Case studies. Future trends: text mining, visualization,
distributed data. Practical sessions using open-source
software.
4
5. BOOKS
A. Silberschatz, H. F. Korth and S. Sudarshan,
Database System Concepts, 6th Ed, McGraw Hill,
2011.
J. Han, M. Kamber and J. Pei, Data Mining
Concepts and Techniques, 3rd Ed, Morgan
Kaufmann
5
6. EVALUATION POLICY
CS501: Database Systems and Data Mining
Assignment, Quiz, Attendance: 20%
Mid Sem: 30%
End Sem: 50%
Attendance is compulsory
6
7. DATABASE
A collection of interrelated data
Usually designed to manage large bodies of
information
Models real world enterprise
Entities (e.g. student, courses)
Relationships (e.g. students are enrolled to courses)
A database management system (DBMS) is a
software package designed to store and manage
databases in a convenient and efficient way
7
8. DATABASE SYSTEM APPLICATIONS
Some representative applications-
Banking
Airlines
Universities
Credit card transactions
Telecommunications
Finance
Sales
Manufacturing
Human Resource
8
9. FILE SYSTEM VS DATABASE SYSTEM
9File
System
Database
System
10. FILE SYSTEMS VS DBMS
Data redundancy and inconsistency
Difficulty in accessing data
Data isolation
Integrity problem
Atomicity problem
Concurrent access anomalies
Security and access control
10
11. WHY USE A DBMS
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
11
12. DATA MODELS
Underlying the structure of the database is the
data model
It is a collection of tools for describing data, data
relationships, data semantics and consistency
constraints.
The relational model of data is the most widely
used model today.
Main concept: relation, basically a table with
rows and columns.
Every relation has a schema, which describes the
columns, or fields.
12
14. WHY STUDY DATABASES
Shift from computation to information
Datasets increasing in diversity and volume.
DBMS encompasses most of CS applications
14
15. VIEW OF DATA
A major purpose of database system is to provide
users with an abstract view of the data
The data from the database must be retrieved
efficiently
This need has led designers to use complex data
structures
Since many users are not computer trained
So developers hide the complexity from users through
several levels of abstraction
15
16. LEVELS OF ABSTRACTION
Many views, single
conceptual (logical)
schema and physical
schema.
Views describe how
users see the data.
Conceptual schema
defines logical structure
Physical schema
describes the files and
indexes used.
Schemas are defined
using DDL; data is
modified/queried using
DML.
View1 View2 View3
Conceptual Schema
Physical Schema
16
17. EXAMPLE: UNIVERSITY DATABASE
Physical schema:
Relations stored as unordered files.
Index on first column of Students.
Conceptual schema:
Students(sid: string, name: string, login: string, age:
integer, gpa:real)
Courses(cid: string, cname:string, credits:integer)
Enrolled(sid:string, cid:string, grade:string)
External Schema (View):
Course_info(cid:string,enrollment:integer)
17
18. DATA INDEPENDENCE
Applications insulated from how data is
structured and stored.
Logical data independence: Protection from
changes in logical structure of data.
Physical data independence: Protection from
changes in physical structure of data.
Data Independence is one of the most important
benefits of using a DBMS
18
19. INSTANCES AND SCHEMAS
Instance of the database: the collection of
information stored in the database at a particular
moment
Database schema: the overall design of the
database
19
20. DATA MODELS
A data model is a collection of conceptual tools for
describing data, data relationships, data
semantics and consistency constraints.
Relational Model
Entity Relationship Model
Object-Based Data Model
Semistructured Data Model
Older models
Network
Hierarchical
20
21. DATABASE USERS
Users are differentiated by the way they expect to
interact with the system
Naive users – invoke one of the permanent
application programs that have been written
previously
Application programmers – computer
professionals who interact with system through
application programs
Sophisticated users – form requests in a
database query language
Specialized users – write specialized database
applications that do not fit into the traditional
data processing framework 21
22. DATABASE ADMINISTRATOR
Coordinates all the activities of the database
system
Should have a good understanding of the enterprise’s
information resources and needs.
Database administrator's duties include:
Storage structure and access method definition
Schema and physical organization modification
Granting users authority to access the database
Backing up data
Monitoring performance and responding to changes
22
23. ENTITY RELATIONSHIP MODEL
Widely used conceptual level data model
proposed by Peter P Chen in 1970s
Data model to describe the database system at
the requirements collection stage
high level description.
easy to understand for the enterprise managers.
rigorous enough to be used for system building.
Concepts available in the model
entities and attributes of entities.
relationships between entities.
diagrammatic notation.
23
24. ENTITIES
Entity Real-world object distinguishable from
other objects. An entity is described (in DB) using
a set of attributes.
In the University database context, an individual
student, faculty member, a class room, a course are
entities.
Entity Set or Entity Type-
Collection of entities all having the same properties.
Student entity set –collection of all student entities.
Course entity set –collection of all course entities.
24
25. ATTRIBUTE
Each entity is described by a set of
attributes/properties.
Student entity
StudName–name of the student.
RollNumber–the roll number of the student.
Sex–the gender of the student etc.
All entities in an Entity set/type have the same
set of attributes.
25
26. TYPES OF ATTRIBUTES
Simple Attributes
having atomic or indivisible values.
E.g. Dept–a string
PhoneNumber–an eight digit number
Composite Attributes
having several components in the value.
E.g. Qualification with components
(DegreeName, Year, UniversityName)
Derived Attributes
Attribute value is dependent on some other attribute.
E.g: Age depends on DateOfBirth. So age is a derived
attribute.
26
27. TYPES OF ATTRIBUTES (2)
Single-valued
having only one value rather than a set of values.
E.g., PlaceOfBirth–single string value.
Multi-valued
having a set of values rather than a single value.
E.g., CoursesEnrolled attribute for student
EmailAddress attribute for student
PreviousDegree attribute for student.
Attributes can be:
simple single-valued, simple multi-valued,
composite single-valued or composite multi-valued.
27
29. DOMAINS OF ATTRIBUTES
Each attribute takes values from a set called its
domain
For example,
StudentAge–{17,18, …, 55}
HomeAddress–character strings of length 35
Domain of composite attributes –
cross product of domains of component attributes
Domain of multi-valued attributes –
set of subsets of values from the basic domain
29
30. ENTITY SETS AND KEY ATTRIBUTES
Key–an attribute or a collection of attributes
whose value(s) uniquely identify an entity in the
entity set.
For instance,
RollNumber- Key for Student entity set
EmpID- Key for Faculty entity set
HostelName, RoomNo- Key for Student entity set
(assuming that each student gets to stay in a single
room)
A key for an entity set may have more than one
attribute.
An entity set may have more than one key.
Determined by the designers 30
31. RELATIONSHIPS
When two or more entities are associated with
each other, we have an instance of a
Relationship.
E.g: student Ramesh enrolls in Discrete Mathematics
course
Relationship Enrolls has Student and Course as the
participating entity sets.
Formally, Enrolls ⊆ Student ×Course
(s,c) ∈ enrolls ⇔ Student ‘s’ has enrolled in Course ‘c’
Tuples in enrolls known as relationship instances
Enrolls is called a relationship Type/Set.
31
32. DEGREE OF A RELATIONSHIP
Degree: the number of participating entities.
Degree 2: binary
Degree 3: ternary
Degree n: n-ary
Binary relationships are very common and widely used.
32
34. BINARY RELATION & CARDINALITY
34
E1 E2R
m n
The number of entities from E2 that an entity from E1 can possibly be
associated through R (and vice-versa) determines the cardinality ratio
of R.
Four possibilities-
One to one, one to many, many to one and many to many
35. PARTICIPATION CONSTRAINT
An entity set may participate in a relation either
totally or partially.
Total participation: Every entity in the set is
involved in some association (or tuple) of the
relationship.
Partial participation: Not all entities in the set
are involved in association (or tuples) of the
relationship.
35
E1 E2R
total partial
36. STRUCTURAL CONSTRAINTS
Cardinality Ratio and Participation Constraints are
together called Structural Constraints.
They are called constraints as the data must satisfy
them to be consistent with the requirements.
Min-Max notation: pair of numbers (m,n) placed on
the line connecting an entity to the relationship.
m: the minimum number of times a particular entity
must appear in the relationship tuples at any point of
time
0 –partial participation
≥1 –total participation
n: similarly, the maximum number of times a
particular entity can appear in the relationship tuples
at any point of time 36
38. ATTRIBUTES FOR RELATIONSHIP TYPES
Relationship types can also have attributes.
Grade gives the letter grade (S,A,B, etc.) earned by
the student for a course.
neither an attribute of student nor that of course.
38
Student CourseEnr
olls
m n
Grade
39. RECURSIVE RELATIONSHIP AND ROLE
NAME
Recursive relationship: An entity set relating to
itself gives rise to a recursive relationship
E.g., the relationship prereqOf is an example of a
recursive relationship on the entity Course
Role Names –used to specify the exact role in
which the entity participates in the relationships
Role Names are essential in case of recursive
relationships
39
Course prereqOf
prerequisite
course
Role Names
40. WEAK ENTITY SET
Weak Entity Set: An entity set whose members owe
their existence to some entity in a strong entity set.
Entities are not of independent existence.
Each weak entity is associated with some entity of the
owner entity set through a special relationship.
Weak entity set may not have a key attribute.
The discriminator (or partial key) of a weak entity set is
the set of attributes that distinguishes among all the
entities of a weak entity set.
40
S WR
Always
total
Owner Entity Identifying
Relationship
Weak entity
41. WEAK ENTITY SET EXAMPLE
41
Loan PaymentLoan_
No
Amount
Loa
n_p
aym
ent
Payment_no
PayDa
te
Amount
42. EXTENDED ER FEATURES
Basic ER concepts are used to model most
database features
However, some features may be expressed more
aptly by using certain extensions to the basic ER
model
Some of these features are
Specialization
Generalization
Aggregation
42
43. SPECIALIZATION
A top-down design process
Designate subgroupings within an entity set that
are distinctive from other entities in the set
These subgroupings become lower-level entity
sets that have attributes or participate in
relationships that do not apply to the higher-level
entity set
Depicted by a triangle component labeled ISA
(E.g. customer “is a” person)
Attribute inheritance – a lower-level entity set
inherits all the attributes and relationship
participation of the higher-level entity set to
which it is linked 43
44. GENERALIZATION
A bottom-up design process – combine a
number of entity sets that share the same
features into a higher-level entity set.
Specialization and generalization are simple
inversions of each other; they are represented in
an E-R diagram in the same way.
The terms specialization and generalization are
used interchangeably.
The ISA relationship also referred to as
superclass - subclass relationship
44
46. CONSTRAINTS ON SPECIALIZATION/
GENERALIZATION
Constraints on which entities can be members of
a given lower-level entity set
Condition-defined:
all customers over 65 years are members of senior-citizen
entity set; senior-citizen ISA person.
User-defined
An employee is assigned to a group after 3 months
Not done automatically
The user in charge of the dept. makes the assignment
46
47. CONSTRAINTS ON SPECIALIZATION/
GENERALIZATION (CONTD)
Constraint on whether or not entities may belong
to more than one lower-level entity set within a
single generalization.
Disjoint
an entity can belong to only one lower-level entity set
Noted in E-R diagram by writing disjoint next to the ISA
triangle
Overlapping
an entity can belong to more than one lower-level entity set
47
ISA
Disjoint
48. CONSTRAINTS ON SPECIALIZATION/
GENERALIZATION (CONTD.)
Completeness constraint : specifies whether or
not an entity in the higher-level entity set must
belong to at least one of the lower-level entity
sets within a generalization.
Total: an entity must belong to one of the lower-level
entity sets
Partial: an entity need not belong to one of the
lower-level entity sets
Partial generalization is the default
48
ISA
Total
generalization
49. AGGREGATION
Consider the ternary relationship works_on,
Suppose we want to record managers for tasks
performed by an employee at a branch
49
50. AGGREGATION (CONTD.)
Relationship sets works_on and manages
represent overlapping information
Every manages relationship corresponds to a
works_on relationship
However, some works_on relationships may not
correspond to any manages relationships
So we can’t discard the works_on relationship
50
51. AGGREGATION (CONTD.)
Eliminate this redundancy via aggregation
Treat relationship as an abstract entity
Allows relationships between relationships
Abstraction of relationship into new entity
Without introducing redundancy, the following
diagram represents:
An employee works on a particular job at a particular
branch
An employee, branch, job combination may have an
associated manager
51
56. RELATIONAL MODEL
Proposed by Edgar. F. Codd(1923-2003) in the
early seventies. [ Turing Award –1981 ]
Most of the modern DBMS are relational
Simple and elegant model with a mathematical
basis
Led to the development of a theory of data
dependencies and database design.
Relational algebra operations –
crucial role in query optimization and execution.
Laid the foundation for the development of
Tuple relational calculus and then
Database standard SQL 56
57. STRUCTURE OF RELATIONAL DATABASES
Consists of a collection of tables
Row in a table represents a relationship among a
set of values
Thus a table is a collection of relationships
In relational model table is also referred by
relation
Tuple is a sequence of values
In relational model, a tuple corresponds to a row
in a table
57
58. RELATIONAL SCHEMA
Consists of relation name, and a set of attributes
or field names or column names. Each attribute
has an associated domain.
Example:
student ( studentName: string,
rollNumber: string,
phoneNumber: integer,
yearOfAdmission:integer,
branchOfStudy :string )
Domain–set of atomic(or indivisible) values –data type
Relation
name
Attribute
name
Domain
58
59. RELATION INSTANCE
A finite set of tuples constitute a relation
instance.
A tuple of relation with schema R = (A1, A2, …,
Am) is an ordered sequence of values (v1,v2, ... ,vm)
such that vi∈ domain (Ai), 1≤i ≤m
Roll_no Name yearOfAdmin branchOfSt
udy
10CS001 Rajesh 2010 CSE
09CS020 Kiran 2009 CSE
09EE011 Ravi 2009 EE
59
60. KEYS
Key: should have a capability of uniquely
identifying a tuple in a relation
Superkey: a set of one or more attributes that
taken collectively allow us to identify uniquely a
tuple in the relation
Example: {customer_name, customer_street} and
{customer_name}
are both superkeys of Customer, if no two customers can
possibly have the same name
60
61. KEY (CONTD.)
K is a candidate key if K is minimal
Example: {customer_name} is a candidate key for
Customer, since it is a superkey and no subset of
it is a superkey.
Primary key: a candidate key chosen as the
principal means of identifying tuples within a
relation
Should choose an attribute whose value never, or
very rarely, changes.
E.g. email_address is unique, but may change
however empid rarely changes 61
62. KEYS (CONTD.)
Foreign Key: The attribute that corresponds to
the primary key of another relation.
E.g. customer_name and account_number attributes
of depositor are foreign keys to customer and account
respectively.
Only values occurring in the primary key attribute of
the referenced relation may occur in the foreign
key attribute of the referencing relation.
62
63. FOREIGN KEY (CONTD.)
It is possible for a foreign key in a relation to
refer to the primary key of the relation itself
An Example:
Employee ( empNo, name, sex, salary, dept, reportsTo)
reportsTo is a foreign key referring to empNo of the
same relation
63