Chapter 2: Database System Concepts
and Architecture
• 2.1 Data Models, Schemas, and
Instances
• 2.2 DBMS Architecture and Data
Independence
• 2.3 Database Languages and Interfaces
• 2.4 The Database System Environment
• 2.5 Classification of Database
Management Systems
Chapter 2: Database System Concepts
and Architecture
2.1 Data Models, Schemas, and
Instances
• One fundamental characteristic of the database
approach is that it provides some level of data
abstraction by hiding details of data storage that
are not needed by most database users.
• A data model—a collection of concepts that can
be used to describe the structure of a
database—provides the necessary means to
achieve this abstraction.
• Data Model Operations: Operations for
specifying database retrievals and updates
by referring to the concepts of the data
model.
‧generic operation: insert, delete, modify,
retrieve
‧user-defined operations: comput_GPA
2.1.1 Categories of Data Models
a) High-level or conceptual data
models provide concepts that are
close to the way many users perceive
data.
– Conceptual data models use concepts such
as entities, attributes, and relationships.
– An entity represents a real-world object or
concept, such as an employee or a project,
that is described in the database.
– An attribute represents some property of
interest that further describes an entity, such
as the employee’s name or salary.
– A relationship among two or more entities
represents an interaction among the entities;
for example, a works-on relationship between
an employee and a project.
b) low-level or physical data models
provide concepts that describe the
details of how data is stored in the
computer.
• Concepts provided by low-level data models
re generally meant for computer specialists,
not for typical end users
c) Implementation (record-oriented) data
models: Provide concepts that fall
between the above two, balancing user
views with some computer storage
details.
• ‧relational ‧network ‧hierarchical
2.1.2 Schemas, Instances, and Database State
• In any data model it is important to distinguish between
the description of the database and the database itself.
• Database Schema (meta-data): The description of a
database. Includes descriptions of the database
structure and the constraints that should hold on the
database.
• Schema Diagram: A diagrammatic display of (some
aspects of ) a database schema. (refer to Fig 2.1 2-5)
• A schema diagram displays only some aspects of a
schema, such as the names of record types and data
items, and some types of constraints
• Database Instance: The actual data stored in a
database at a particular moment in time. Also called
database state ( or occurrence, snapshot)
(refer to Fig 1.2 2-6)
• The database schema changes very infrequently. The
database state changes every time the database is
updated. Schema is also called intension, whereas
state is called extension.
• The distinction between database schema and
database state is very important.
• When we define a new database, we specify its
database schema only to the DBMS. At this point, the
corresponding database state is the empty state with no
data. We get the initial state of the database when the
database is first populated or loaded with the initial data.
From then on, every time an update operation is applied
to the database, we get another database state.
• The DBMS is partly responsible for ensuring that every
state of the database is a valid state—that is, a state
that satisfies the structure and constraints specified in
the schema.
define
empty state
initial state
load
state
update
update
valid state
satisfy database schema
Schema evolution: when there is sometimes a change in schema
diagram (DateOfBirth for a student object)
Figure 2.1 Schema diagram for UNIVERSITY database
schema construct
Known data:
name of record types, data items
Figure 1.2
UNIVERSITY Database
2.2 DBMS Architecture and Data Independence
1. The internal level has an internal schema, which describes the
physical storage structure of the database. The internal schema
uses a physical data model and describes the complete details of
data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which
describes the structure of the whole database for a community of
users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data
types, relationships, user operations, and constraints. A high-level
data model or an implementation data model can be used at this
level.
3. The external or view level includes a number of external
schemas or user views. Each external schema describes the
part of the database that a particular user group is interested in
and hides the rest of the database from that user group. A high-
level data model or an implementation data model can be used at
this level.
2.2.1 The Three-Schema Architecture
2.2 DBMS Architecture and Data Independence
Proposed to support DBMS characteristics of:
- Insulation of programs and data/program and operations
(program-data and program-operation independence)
- Support of multiple views of the data.
- Use of catalog (database description)
Defines DBMS schema at three levels: (see 2-9)
- Internal schema at the internal level to describe data storage structures and access
paths. Typically uses a physical data model.
- Conceptual schema at the conceptual level to describe the structure and constraints
for the whole database. Uses a conceptual or an implementation data model.
- External schema at the external level to describe the various user views. Usually
uses the same data model as the conceptual level or high-level data model.
Mappings among schema levels are also needed. Programs refer to an external schema,
and are mapped by the DBMS to the internal schema for execution
2.2.1 Three-Schema Architecture
Figure 2.2 The Three-schema architecture 2-6
2.2.2 Data Independence
Logical Data Independence: The capacity to change the conceptual schema without
having to change the external schemas and their application programs.
Physical Data Independence: The capacity to change the internal schema without
having to change the conceptual schema.
When a schema at a lower level is changed, only the mappings between this
schema and higher-lever schemas need to be changed in a DBMS that fully supports
data independence. The higher-level schemas themselves are unchanged. Hence, the
application programs need not be changed since they refer to the external schemas.
By adding or removing a record type or data
item to
· expand the database
· reduce the database
Reorganize physical files to improve performance
e.g. List all sections offered in Fall 1998
Disadvantages of two levels of mappings:
Overhead during compilation or execution of a query or program
• UNIVERSITY Conceptual Schema
STUDENT (Name, Student Number, Class, Major)
COURSE (Course Name, Course Number, Credit, Dept)
PREREQUISITE (Course Number, Prerequisite Number)
SECTION (Section Id, Course Number, Semester, Year,
Instructor)
GRADE_REPORT(Student Number, Section Id , Grade)
•
UNIVERSITY External Schema
TRANSCRIPT(Student Name, Course Number, Grade,
Semester, Year, Section Id)
derived from STUDENT, SECTION, GRADE_REPORT
• PREREQUISITES(Course Name, Course Number,
Prerequisites)
derived from PREREQUISITE, COURSE
• Change GRADE-REPORT Schema Construct
GRADE_REPORT (Student Number, Student Name, Section
Id, Course Number, Grade)
• Change Mapping (& View Definition)
TRANSCRIPT derived from SECTION, GRADE_REPORT
2.3 Database Languages and Interfaces
Data Definition Language (DDL): Used by the DBA and database designers to
specify the conceptual schema of a database. In many DBMSs, the DDL is also
used to define internal and external schemas (views). In some DBMSs, separate
storage definition language (SDL) and view definition language (VDL) are
used to define internal and external schemas.
Data Manipulation Language (DML): Used to specify database retrievals and
updates (insertion, deletion, modifications)
- DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language).
- Alternatively, stand-alone DML commands can be applied directly (query
language).
provide appropriate languages and interfaces for each category of users.
2.3.1 DBMS Languages
DDL Compiler
2.3 Database Languages and Interfaces
Data Definition Language (DDL): Used by the DBA and database designers to
specify the conceptual schema of a database. In many DBMSs, the DDL is also
used to define internal and external schemas (views). In some DBMSs, separate
storage definition language (SDL) and view definition language (VDL) are
used to define internal and external schemas.
SDL specify the Internal Schema. The mapping between the conceptual and
internal schema is done in one of the two.
VDL specify user views and their mappings to the conceptual schema, but in
most DBMSs the DDL is used to define both conceptual and external
schemas.
Data) can be embedded in a general-purpose programming language (host
language).
Manipulation Language (DML): Used to specify database retrievals and
updates (insertion, deletion, modifications)
DML commands (data sublanguage - Alternatively, stand-alone DML commands
can be applied directly (query language).
provide appropriate languages and interfaces for each category of users.
2.3.1 DBMS Languages
DDL Compiler
Types of DML
-Procedural DML:
• Also called record-at-a-time (record-oriented) or low-level DML
• Must be embedded in a programming language.
• Searches for and retrieves individual database records and uses looping
and other constructs of the host programming language to retrieve multiple
records.
-Declarative or non-procedural DML:
• Also called set-at-a-time (set-oriented) or high-level DML.
• Can be used as a stand-alone query language or can be embedded in a
programming language.
• Searches for and retrieves information from multiple related database
records in a single command.
- host language: general-purpose language
- data sublanguage: DML
- C++
2.3.2 DBMS Interfaces
- Stand-alone query language interfaces. (casual end user)
- Programmer interfaces for embedding DML in programming
languages: (programmer)
-Pre-compiler Approach
-Procedure (Subroutine) Call Approach
- User-friendly interfaces:
-Menu-based Interfaces for Browsing.
-Forms-based Interfaces.
-Graphical User Interfaces.
-Natural language Interfaces
-Combination of the above
-Interfaces for Parametic Users (using function keys)
-Parametric users, such as bank tellers, often have a small set of
operations that they must perform repeatedly.
- Interfaces for the DBA:
-Creating accounts, granting authorizations
-Setting system parameters
-Changing schemas or access path
• These interfaces present the user with lists of options, called menus, that
lead the user through the formulation of a request.
• Avoid memorizing specific commands and query language rather query is
composed step by step by pick up options from a menu that is display by
the system.
2.3.2 DBMS Interfaces
Menu based Interface
Form based Interface
A forms-based interface displays a form to each user. Users can fill out all of
the form entries to insert new data, or they fill out only certain entries, in which
case the DBMS will retrieve matching data for the remaining entries. Forms
are usually designed and programmed for naive users as interfaces to
canned transactions.
•The database and the DBMS catalog are usually stored on disk.
• Access to the disk is controlled primarily by the operating system (OS), which
schedules disk input/output.
•A stored data manager module of the DBMS controls access to DBMS
information that is stored on disk(database or catalog).
•It uses the OS service for carrying out low-level data transfer between the disk
and main storage
•DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog.
•run-time database processor handles database accesses at run time; it
receives retrieval or update operations and carries them out on the database.
•query compiler handles high-level queries that are entered interactively. It
parses, analyzes, and compiles or interprets a query
•pre-compiler extracts DML commands from an application program written in a
host programming language.
2.4 The Database System Environment
2.4.1 DBMS Component Modules
2.4.1 DBMS Component Modules
2.4 The Database System Environment
Figure 2.3 A typical component modules of a DBMS. The dotted line show accesses
that are under the control of the stored data manager.
2.4.2 Database System Utilities
DBMSs database utilities that help the DBA in managing the database system.
Common utilities have the following types of functions:
Loading is used to load existing data files—such as text files or sequential
files—into the database.
Backing up :creates a backup copy of the database
- File reorganizing : used to reorganize a database file into a different file
organization to improve performance.
- Report generation utilities.
- Performance monitoring utilities: monitors database usage and
provides statistics to the DBA. The DBA uses the statistics to make a
decision
- Other functions, such as sorting, user monitoring,
data compression, etc.
2.5 Classification of Database Management Systems
Several criteria are normally used to classify DBMSs.
Based on the data model used:
•Data models
-Traditional: Relational, Network (see 2-19), Hierarchical
- Emerging: Object-oriented, Semantic, Entity- Relationship, other.
Other classifications:
•Number of users : Single-user (typically used with personal computers) vs.
multi-user (most DBMSs)
•Number of sites:
Centralized (uses a single computer) vs. distributed (uses multiple computers).
Homogeneous vs. Heterogeneous
• Cost of DBMS software. $10,000~100,000
$100~3,000
•Types of access paths used. (inverted file structures, …)
•Purpose general purpose
special purpose
e.g. airline reservations, telephone directory, on-line transaction
processing system
Relational Data Model
• represents a database as a collection of tables, where each table can be
stored as a separate file.
• use the high-level query language called SQL and support a limited form of
user views
Object Data Model
• defines a database in terms of objects, their properties, and their operations
• Objects with the same structure and behavior belong to a class, and classes
are organized into hierarchies (or acyclic graphs).
Object-relational model:
• relational database extend to incorporate object database concepts and other
capabilities.
Network model
• represents data as record types and also represents a limited type of 1:N
relationship, called a set type.
hierarchical model
• represents data as hierarchical tree structures. Each hierarchy represents a
number of related records
Chapter 2.pdfChapter 2.pdfChapter 2.pdfChapter 2.pdf

Chapter 2.pdfChapter 2.pdfChapter 2.pdfChapter 2.pdf

  • 1.
    Chapter 2: DatabaseSystem Concepts and Architecture • 2.1 Data Models, Schemas, and Instances • 2.2 DBMS Architecture and Data Independence • 2.3 Database Languages and Interfaces • 2.4 The Database System Environment • 2.5 Classification of Database Management Systems Chapter 2: Database System Concepts and Architecture
  • 2.
    2.1 Data Models,Schemas, and Instances • One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. • A data model—a collection of concepts that can be used to describe the structure of a database—provides the necessary means to achieve this abstraction.
  • 3.
    • Data ModelOperations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. ‧generic operation: insert, delete, modify, retrieve ‧user-defined operations: comput_GPA
  • 4.
    2.1.1 Categories ofData Models a) High-level or conceptual data models provide concepts that are close to the way many users perceive data. – Conceptual data models use concepts such as entities, attributes, and relationships.
  • 5.
    – An entityrepresents a real-world object or concept, such as an employee or a project, that is described in the database. – An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary. – A relationship among two or more entities represents an interaction among the entities; for example, a works-on relationship between an employee and a project.
  • 6.
    b) low-level orphysical data models provide concepts that describe the details of how data is stored in the computer. • Concepts provided by low-level data models re generally meant for computer specialists, not for typical end users c) Implementation (record-oriented) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details. • ‧relational ‧network ‧hierarchical
  • 7.
    2.1.2 Schemas, Instances,and Database State • In any data model it is important to distinguish between the description of the database and the database itself. • Database Schema (meta-data): The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database. • Schema Diagram: A diagrammatic display of (some aspects of ) a database schema. (refer to Fig 2.1 2-5) • A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints • Database Instance: The actual data stored in a database at a particular moment in time. Also called database state ( or occurrence, snapshot) (refer to Fig 1.2 2-6)
  • 8.
    • The databaseschema changes very infrequently. The database state changes every time the database is updated. Schema is also called intension, whereas state is called extension. • The distinction between database schema and database state is very important. • When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. • The DBMS is partly responsible for ensuring that every state of the database is a valid state—that is, a state that satisfies the structure and constraints specified in the schema.
  • 9.
    define empty state initial state load state update update validstate satisfy database schema Schema evolution: when there is sometimes a change in schema diagram (DateOfBirth for a student object)
  • 10.
    Figure 2.1 Schemadiagram for UNIVERSITY database schema construct Known data: name of record types, data items
  • 11.
  • 12.
    2.2 DBMS Architectureand Data Independence 1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database. 2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. A high-level data model or an implementation data model can be used at this level. 3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. A high- level data model or an implementation data model can be used at this level. 2.2.1 The Three-Schema Architecture
  • 13.
    2.2 DBMS Architectureand Data Independence Proposed to support DBMS characteristics of: - Insulation of programs and data/program and operations (program-data and program-operation independence) - Support of multiple views of the data. - Use of catalog (database description) Defines DBMS schema at three levels: (see 2-9) - Internal schema at the internal level to describe data storage structures and access paths. Typically uses a physical data model. - Conceptual schema at the conceptual level to describe the structure and constraints for the whole database. Uses a conceptual or an implementation data model. - External schema at the external level to describe the various user views. Usually uses the same data model as the conceptual level or high-level data model. Mappings among schema levels are also needed. Programs refer to an external schema, and are mapped by the DBMS to the internal schema for execution 2.2.1 Three-Schema Architecture
  • 14.
    Figure 2.2 TheThree-schema architecture 2-6
  • 15.
    2.2.2 Data Independence LogicalData Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs. Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema. When a schema at a lower level is changed, only the mappings between this schema and higher-lever schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed since they refer to the external schemas. By adding or removing a record type or data item to · expand the database · reduce the database Reorganize physical files to improve performance e.g. List all sections offered in Fall 1998 Disadvantages of two levels of mappings: Overhead during compilation or execution of a query or program
  • 16.
    • UNIVERSITY ConceptualSchema STUDENT (Name, Student Number, Class, Major) COURSE (Course Name, Course Number, Credit, Dept) PREREQUISITE (Course Number, Prerequisite Number) SECTION (Section Id, Course Number, Semester, Year, Instructor) GRADE_REPORT(Student Number, Section Id , Grade) • UNIVERSITY External Schema TRANSCRIPT(Student Name, Course Number, Grade, Semester, Year, Section Id) derived from STUDENT, SECTION, GRADE_REPORT • PREREQUISITES(Course Name, Course Number, Prerequisites) derived from PREREQUISITE, COURSE • Change GRADE-REPORT Schema Construct GRADE_REPORT (Student Number, Student Name, Section Id, Course Number, Grade) • Change Mapping (& View Definition) TRANSCRIPT derived from SECTION, GRADE_REPORT
  • 17.
    2.3 Database Languagesand Interfaces Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas. Data Manipulation Language (DML): Used to specify database retrievals and updates (insertion, deletion, modifications) - DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language). - Alternatively, stand-alone DML commands can be applied directly (query language). provide appropriate languages and interfaces for each category of users. 2.3.1 DBMS Languages DDL Compiler
  • 18.
    2.3 Database Languagesand Interfaces Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas. SDL specify the Internal Schema. The mapping between the conceptual and internal schema is done in one of the two. VDL specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas. Data) can be embedded in a general-purpose programming language (host language). Manipulation Language (DML): Used to specify database retrievals and updates (insertion, deletion, modifications) DML commands (data sublanguage - Alternatively, stand-alone DML commands can be applied directly (query language). provide appropriate languages and interfaces for each category of users. 2.3.1 DBMS Languages DDL Compiler
  • 19.
    Types of DML -ProceduralDML: • Also called record-at-a-time (record-oriented) or low-level DML • Must be embedded in a programming language. • Searches for and retrieves individual database records and uses looping and other constructs of the host programming language to retrieve multiple records. -Declarative or non-procedural DML: • Also called set-at-a-time (set-oriented) or high-level DML. • Can be used as a stand-alone query language or can be embedded in a programming language. • Searches for and retrieves information from multiple related database records in a single command. - host language: general-purpose language - data sublanguage: DML - C++
  • 20.
    2.3.2 DBMS Interfaces -Stand-alone query language interfaces. (casual end user) - Programmer interfaces for embedding DML in programming languages: (programmer) -Pre-compiler Approach -Procedure (Subroutine) Call Approach - User-friendly interfaces: -Menu-based Interfaces for Browsing. -Forms-based Interfaces. -Graphical User Interfaces. -Natural language Interfaces -Combination of the above -Interfaces for Parametic Users (using function keys) -Parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly. - Interfaces for the DBA: -Creating accounts, granting authorizations -Setting system parameters -Changing schemas or access path
  • 21.
    • These interfacespresent the user with lists of options, called menus, that lead the user through the formulation of a request. • Avoid memorizing specific commands and query language rather query is composed step by step by pick up options from a menu that is display by the system. 2.3.2 DBMS Interfaces Menu based Interface Form based Interface A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed and programmed for naive users as interfaces to canned transactions.
  • 22.
    •The database andthe DBMS catalog are usually stored on disk. • Access to the disk is controlled primarily by the operating system (OS), which schedules disk input/output. •A stored data manager module of the DBMS controls access to DBMS information that is stored on disk(database or catalog). •It uses the OS service for carrying out low-level data transfer between the disk and main storage •DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. •run-time database processor handles database accesses at run time; it receives retrieval or update operations and carries them out on the database. •query compiler handles high-level queries that are entered interactively. It parses, analyzes, and compiles or interprets a query •pre-compiler extracts DML commands from an application program written in a host programming language. 2.4 The Database System Environment 2.4.1 DBMS Component Modules
  • 23.
    2.4.1 DBMS ComponentModules 2.4 The Database System Environment Figure 2.3 A typical component modules of a DBMS. The dotted line show accesses that are under the control of the stored data manager.
  • 24.
    2.4.2 Database SystemUtilities DBMSs database utilities that help the DBA in managing the database system. Common utilities have the following types of functions: Loading is used to load existing data files—such as text files or sequential files—into the database. Backing up :creates a backup copy of the database - File reorganizing : used to reorganize a database file into a different file organization to improve performance. - Report generation utilities. - Performance monitoring utilities: monitors database usage and provides statistics to the DBA. The DBA uses the statistics to make a decision - Other functions, such as sorting, user monitoring, data compression, etc.
  • 25.
    2.5 Classification ofDatabase Management Systems Several criteria are normally used to classify DBMSs. Based on the data model used: •Data models -Traditional: Relational, Network (see 2-19), Hierarchical - Emerging: Object-oriented, Semantic, Entity- Relationship, other. Other classifications: •Number of users : Single-user (typically used with personal computers) vs. multi-user (most DBMSs) •Number of sites: Centralized (uses a single computer) vs. distributed (uses multiple computers). Homogeneous vs. Heterogeneous • Cost of DBMS software. $10,000~100,000 $100~3,000 •Types of access paths used. (inverted file structures, …) •Purpose general purpose special purpose e.g. airline reservations, telephone directory, on-line transaction processing system
  • 26.
    Relational Data Model •represents a database as a collection of tables, where each table can be stored as a separate file. • use the high-level query language called SQL and support a limited form of user views Object Data Model • defines a database in terms of objects, their properties, and their operations • Objects with the same structure and behavior belong to a class, and classes are organized into hierarchies (or acyclic graphs). Object-relational model: • relational database extend to incorporate object database concepts and other capabilities. Network model • represents data as record types and also represents a limited type of 1:N relationship, called a set type. hierarchical model • represents data as hierarchical tree structures. Each hierarchy represents a number of related records