Database overview

DATABASE OVERVIEW
CONTENTS
-What is a database? -Table joins
-Database terminology -Sub queries
-File system and database approach -Indexes
-What is DBMS? -JDBC & components
-Advantages of DBMS
-Types of DBMS
-Data Security
-Relational database
-Sql overview
2

Database: What
 Database
 is collection of related data and its metadata organized in a structured format
 for optimized information management
 Database Management System (DBMS)
 is a software that enables easy creation, access, and modification of databases
 for efficient and effective database management
 Database System
 is an integrated system of hardware, software, people, procedures, and data
 that define and regulate the collection, storage, management, and use of data within a
database environment
3

Database Terminology
Database Concepts
 Database – a collection of related tables
 Tables – a collection of related records
– collection of related entities
 Record – collection of fields (table row)
–represents an entity
 Field – collection of characters (table column)
– represents an attribute
 Character – single alphabetic, numeric or other symbol
Large
Small
4

Traditional File Processing Sucks
File Processing:
 Data is organized, stored, and processed in independent files of data records
5

Problems of
File Processing
Data Redundancy –
duplicate data requires
update to many files
Lack of Integration –
data stored in
separate files
hard to combine data
Data Dependence –
changing the file format requires changing the program…
6

Database Management Approach
Consolidates data records into one CENTRAL
database that can be accessed by many
different application programs.
7

Database Management System9
Database Systems: Design, Implementation, & Management: Rob & Coronel
- manages interaction between end users and database

DATABASE MANAGEMENT SYSTEM
 Database management syatem is a software which organizes, update,manipulate, calculate
and stores our database in a processed way.
 It is a two way process as it accepts queries from the user and , after fetching data from the
database, returns it back to the user.
10

Advantages of DBMS
Data availability
Minimum redundancy
Highly accurate
Multiuser access control
Security control
11

Hierarchical Database
 Background
 Developed to manage large amount of data for complex manufacturing projects
 e.g., Information Management System (IMS)
 IBM-Rockwell joint venture
 clustered related data together
 hierarchically associated data clusters using pointers
 Hierarchical Database Model
 Assumes data relationships are hierarchical
 One-to-Many (1:M) relationships
 Each parent can have many children
 Each child has only one parent
 Logically represented by an upside down tree
12

Hierarchical Database: Example13

Hierarchical Database: Pros & Cons
 Advantages
 Conceptual simplicity
 groups of data could be related to each other
 related data could be viewed together
 Centralization of data
 reduced redundancy and promoted consistency
 Disadvantages
 Limited representation of data relationships
 did not allow Many-to-Many (M:N) relations
 Complex implementation
 required in-depth knowledge of physical data storage
 Structural Dependence
 data access requires physical storage path
 Lack of Standards
 limited portability
14

Network Database
 Objectives
 Represent more complex data relationships
 Improve database performance
 Impose a database standard
 Network Database Model
 Similar to Hierarchical Model
 Records linked by pointers
 Composed of sets
 Each set consists of owner (parent) and member (child)
 Many-to-Many (M:N) relationships representation
 Each owner can have multiple members (1:M)
 A member may have several owners
15

Network Database: Example16

Network Database: Pros & Cons
 Advantages
 More data relationship types
 More efficient and flexible data access
 “network” vs. “tree” path traversal
 Conformance to standards
 enhanced database administration and portability
 Disadvantages
 System complexity
 require familiarity with the internal structure for data access
 Lack of structural independence
 small structural changes require significant program changes
17

Relational Database
 Problems with legacy database systems
 Required excessive effort to maintain
 Data manipulation (programs) too dependent on physical file structure
 Hard to manipulate by end-users
 No capacity for ad-hoc query (must rely on DB programmers).
 Evolution in Data Organization
 E. F. Codd’s Relational Model proposal
 Separated the notion of physical representation (machine-view)
from logical representation (human-view)
 Considered ingenious but computationally impractical in 1970
 Relational Database Model
 Dominant database model of today
 Eliminated pointers and used tables to represent data
 Tables
 flexible logical structure for data representation
 a series of row/column intersections
 related by sharing common entity characteristic(s)
18

Relational Database: Example19
 Provides a logical “human-level” view of the data and associations
among groups of data (i.e., tables)
Customer_ID Customer_Account Agent_ID
1224 4556 23
1225 4558 25
Agent_ID Last_Name First_Name Phone
23 Sturm David 334-5678
25 Long Kyle 556-3421
Customer_ID Last_Name First_Name Phone Account_Balance
1224 Vira Dyne 678-9987 1223.95
1225 Davies Tricia 556-3342 234.25

Relational Database: Pros & Cons
 Advantages
 Structural independence
 Separation of database design and physical data storage/access
 Easier database design, implementation, management, and use
 Ad hoc query capability with Structured Query Language (SQL)
 SQL translates user queries to codes
 Disadvantages
 Substantial hardware and system software overhead
 more complex system
 Poor design and implementation is made easy
 ease-of-use allows careless use of RDBMS
20

Object-Oriented Database
 Semantic Data Model (SDM)
 Modeled both data and their relationships in a single structure (object)
 Developed by Hammer & McLeod in 1981
 Object-oriented concepts became popular in 1990s
 Modularity facilitated program reuse and construction of complex structures
 Ability to handle complex data types (e.g. multimedia data)
 Object-Oriented Database Model (OODBM)
 Maintains the advantages of the ER model but adds more features
 Object = entity + relationships (between & within entity)
 consists of attributes & methods
 attributes describe properties of an object
 methods are all relevant operations that can be performed on an object
 self-contained abstraction of real-world entity
 Class = collection of similar objects with shared attributes and methods
 e.g. EMPLOYEE class = (employ1 object, employ2 object, …)
 organized in a class hierarchy
 e.g. PERSON > EMPLOYEE, CUSTOMER
 Incorporates the notion of inheritance
 attributes and methods of a class are inherited by its descendent classes
21

Object-Oriented Database: Pros & Cons
 Advantages
 Semantic representation of data
 fuller and more meaningful description of data via object
 Modularity, reusability, inheritance
 Ability to handle
 complex data
 sophisticated information requirements
 Disadvantages
 Lack of standards
 no standard data access method
 Complex navigational data access
 class hierarchy traversal
 Steep learning curve
 difficult to design and implement properly
 More system-oriented than user-centered
 High system overhead
 slow transactions
22

Entity Relationship Model
 Peter Chen’s Landmark Paper in 1976
 “The Relationship Model: Toward a Unified View of Data”
 Graphical representation of entities and their relationships
 Entity Relationship (ER) Model
 Based on Entity, Attributes & Relationships
 Entity is a thing about which data are to be collected and stored
 e.g. EMPLOYEE
 Attributes are characteristics of the entity
 e.g. SSN, last name, first name
 Relationships describe an associations between entities
 i.e. 1:M, M:N, 1:1
 Complements the relational data model concepts
 Helps to visualize structure and content of data groups
 entity is mapped to a relational table
 Tool for conceptual data modeling (higher level representation)
 Represented in an Entity Relationship Diagram (ERD)
 Formalizes a way to describe relationships between groups of data
23

E-R Model: Pros & Cons
 Advantages
 Exceptional conceptual simplicity
 easily viewed and understood representation of database
 facilitates database design and management
 Integration with the relational database model
 enables better database design via conceptual modeling
 Disadvantages
 Incomplete model on its own
 Limited representational power
 cannot model data constraints not tied to entity relationships
 e.g. attribute constraints
 cannot represent relationships between attributes within entities
 No data manipulation language (e.g. SQL)
 Loss of information content
 Hard to include attributes in ERD
24

Data Security Concepts
• Database Security
• Data Recovery
• Strategies for Data Recovery

Data Mining
• Data mining, also known as "knowledge discovery," refers to
computer-assisted tools and techniques for sifting through and
analyzing these vast data stores in order to find trends, patterns, and
correlations that can guide decision making and increase
understanding.
•Data mining covers a wide variety of uses, from analyzing customer
purchases to discovering galaxies.
•In essence, data mining is the equivalent of finding gold nuggets in a
mountain of data.

Data Warehousing and Data Marts
•Many organizations now use data warehouses to bring multiple
databases together and make them available for data mining and other
forms of analysis.
•A data warehouse is a collection of data, usually current and historical,
from multiple databases that the organization can use for analysis and
decision making.
•Bringing together so much data into a data warehouse makes analysis
very difficult. To address this problem, organizations use what are called
data marts.
•Data marts are related sets of data that are grouped together and
separated out from the main body of data in the data warehouse.
•Data marts are designed to be made available to specific sets of users.
For example, data about manufacturing can be put into a data mart and
be made available to the production department.

Relational DBMS
where all data are kept in• Relational database management systems, tables or relations
• More flexible & easy to use
• Almost any item of data can be accessed more quickly than the other models
• This is what is referred to as Relational Database Management Systems
(RDBMS)
• Edgar F. Codd at IBM invented the relational database in 1970. Referred to
as RDBMS
Edgar F. Codd

Relational Database Terminology
•An attribute draws its values from a domain
•A domain is the set of possible values for this attribute
•The degree of a relation is the number of its attributes or
columns
•The cardinality of a relation is the number of its tuples or
rows

SQL Overview
•Structured Query Language (SQL) is a high-level language that
allows users to manipulate relational data.
•One of the strengths of SQL is that users need only specify the
information they need without having to know how to retrieve it.
•In other words, SQL is an English-like language with database
specific constructs and keywords that are simple to use and
understand.
•It supports multiple access mechanisms to address different
usages and requirements.
• The database management system is responsible for
developing the access path needed to retrieve the data.
•SQL works at a set level, and is designed to retrieve rows of one
or more tables.

SQL Categories
SQL has three categories based on the functionality
involved:
•DDL – Data definition language used to define, change, or drop
database objects
•DML – Data manipulation language used to read and
modify data
•DCL – Data control language used to grant and revoke
authorizations

Data Types: Date and Time
All databases support various date and time specific data
types and functions. DB2 has the following data types for
date and time.
•Date (YYYY-MM-DD)
•Time (HH:MM:SS)
•Timestamp (YYYY-MM-DD-HH:MM:SS:ssssss)

Creating a table
• CREATE TABLE myTable (col1 integer)
Default Values
CREATE TABLE USERS (NAME CHAR(20),
AGE INTEGER,
PROFESSION VARCHAR(30) with default 'Student')

Constraints
The following example shows a table definition with several
CHECK constraints and a PRIMARY KEY defined:
CREATE TABLE EMPLOYEE
(ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR(9),
DEPT SMALLINT CHECK (DEPT BETWEEN 10 AND 100),
JOB CHAR(5) CHECK (JOB IN ('Sales','Mgr','Clerk')), HIREDATE DATE,
SALARY DECIMAL(7,2),
CONSTRAINT YEARSAL CHECK ( YEAR(HIREDATE) > 1986 OR SALARY >
40500 )
)

Deletes and Updates
• There are different rules to
handle deletes and updates and
the behavior depends on the
following constructs used when
defining the tables:
•CASCADE
•SET NULL
•NO ACTION
•RESTRICT
The statement below shows where
the delete and update rules are
specified:
ALTER TABLE DEPENDANT_TABLE
ADD CONSTRAINT constraint_name
FOREIGN KEY column_name
ON DELETE <delete_action_type> ON
UPDATE <update_action_type>

Creating a schema
•To create a schema, use this statement: create schema
mySchema
•To create a table with the above schema, explicitly include it in
the CREATE TABLE statement as follows:
create table mySchema.myTable (col1 integer)
•When the schema is not specified, DB2 uses an implicit
schema, which is typically the user ID used to connect to the
database. To change the implicit schema for the current
session the SET CURRENT SCHEMA command can be used
as follows:
set current schema mySchema

Creating a view
•To create a view based on the EMPLOYEE table we can do as
follows:
CREATE VIEW MYVIEW AS
SELECT LASTNAME, HIREDATE FROM EMPLOYEE
•Once the view is created, you can use it just like any table.
For example, we can issue a simple SELECT statement as
follows:
SELECT * FROM MYVIEW

Creating other database objects
•Modifying database objects
alter table myTable alter column col1 set not null
•Renaming database objects
RENAME <object type> <object name> to <new name>
•To rename a column, the ALTER TABLE SQL statement should be used
in conjunction with RENAME. For example:
ALTER TABLE <table name> RENAME COLUMN
<column name> TO <new name>

Data manipulation with SQL
Selecting data
•Selecting data is performed using the SELECT statement.
•Assuming a table name of myTable, the simplest statement to
select data from this table is:
select * from myTable
•Typically, not all columns of a table are required; in which case,
a selective list of columns should be specified. For example,
select col1, col2 from myTable
•Retrieves col1 and col2 for all rows of the table myTable where col1 and
col2 are the names of the columns to retrieve data from

Ordering the result set
•To guarantee the result set is displayed in the same order all
the time, either in ascending or descending order of a column
or set of columns, use the ORDER BY clause.
•For example this statement returns the result set based on the
order of col1 in ascending order:
SELECT col1 FROM myTable ORDER BY col1 ASC
•ASC stands for ascending, which is the default. Descending
order can be specified using DESC as shown below:
SELECT col1 FROM myTable ORDER BY col1 DESC

Inserting Data
•To insert data into a table use the INSERT statement.
•Statements insert one row at a time into the table myTable.
insert into myTable values (1);
insert into myTable values (1, ‘myName’, ‘2010-01-01’);
•Statements insert multiple (three) rows into the table myTable
:
insert into myTable values (1),(2),(3);
insert into myTable values (1, ‘myName1’,’2010-01-01’), (2, ‘myName2’,’2010-02-01’), (3,
‘myName3’,’2010-03-01’);
•Statement inserts all the rows of the sub-query “select * from
myTable2” into the table myTable.
insert into myTable (select * from myTable2)

Deleting Data
•The DELETE statement is used to delete rows of a table. One
or multiple rows from a table can be deleted with a single
statement by specifying a delete condition using the WHERE
clause. For example, this statement deletes all the rows where
col1 > 1000 in table myTable.
DELETE FROM myTable WHERE col1 > 1000
•Note that care should be taken when issuing a delete
statement. If the WHERE clause is not used, the DELETE
statement will delete all rows from the table.

Updating Data
•Use the UPDATE statement to update data in your table. One
or multiple rows from a table can be updated with a single
statement by specifying the update condition using a WHERE
clause. For each row selected for update, the statement can
update one or more columns.
•For example:
UPDATE myTable SET col1 = -1 WHERE col2 < 0
UPDATE myTable SET col1 = -1, col2 = ‘a’, col3 = ‘2010-01-01’ WHERE col4 = ‘0’
•Note that care should be taken when issuing an update
statement without the WHERE clause. In such cases, all the
rows in the table will be updated.

Table Joins
•A simple SQL select statement is one that selects one or more
columns from any single table.
•The next level of complexity is added when a select statement
has two or more tables as source tables. This leads to multiple
possibilities of how the result set will be generated.
•There are two types of table joins in SQL statements:
Inner join
Outer join
•These are explained in more detail in the following sections.

Inner join
 An inner join is the most common form of joins used in the sql statements.It can be
classified as follows:
 1.Equi join
 2.Natural join
 3.cross join
46

Inner Join: Equi-Join
•This type of join happens when two tables are joined based on
the equality of specified columns; for example:
SELECT *
FROM student, enrollment
WHERE student.enrollment_no=enrollment.enrollment_no
OR
SELECT *
FROM student
INNER JOIN enrollment
ON student.enrollment_no=enrollment.enrollment_no

Inner Join: Natural Join
•A natural join is an improved version of an equi-join where the
joining column does not require specification.
•The system automatically selects the column with same name
in the tables and applies the equality operation on it.
•A natural join will remove all duplicate attributes. Below is an
example.
SELECT *
FROM STUDENT
NATURAL JOIN ENROLLMENT

Inner Join: Cross Join
•A cross join is simply a Cartesian product of the tables to be
joined. For example:
SELECT * FROM STUDENT, ENROLLMENT

Outer Joins
• An outer join is a specialized form of join used in SQL statements.
• In an outer joins, the first table specified in an SQL statement in the FROM
clause is referred as the LEFT table and the remaining table is referred as the
RIGHT table.
• An outer join is of the following three types:
Left outer join
Right outer join
Full outer join

Left outer join
•In a left outer join, the result set is a union of the results of an equi- join,
including any non- matching rows from the LEFT table. For example,
the following statement would return the rows shown in the below figure.
ELECT *S
FROM STUDENT
LEFT OUTER JOIN ENROLLMENT
ON STUDENT.ENROLLMENT_NO = ENROLLMENT_NO

Right outer join
•In a right outer join, the result set is the union of results of an
equi-join, including any non- matching rows from the RIGHT
table. For example, the following statement would return the
rows shown in the below figure.
SELECT *
FROM STUDENT
RIGHT OUTER JOIN ENROLLMENT

Full outer join
•In a full outer join, the result set is the union of results of an
equi- join, including any non- matching rows of the LEFT and
the RIGHT table. For example, the following statement would
return the rows shown in the below Figure.
SELECT *
FROM STUDENT
FULL OUTER JOIN ENROLLMENT

Union, Intersection and Difference
operations

Sample output of Union All operator

Relational Operators
Relational operators are basic tests and operations that can
be performed on data. These operators include:
•Basic mathematical operations like ‘+’, ‘-‘, ‘*’ and ‘/’
•Logical operators like ‘AND’, ‘OR’ and ‘NOT’
•String manipulation operators like ‘CONCATENATE’, ‘LENGTH’,
‘SUBSTRING’
•Comparative operators like ‘=’, ‘<’, ‘>’, ‘>=’, ‘<=’ and ‘!=’
•Grouping and aggregate operators
•Other miscellaneous operations like DISTINCT

Sub-queries
•When a query is applied within a query, the outer query is
referred to as the main query or parent query and the internal
query is referred as the sub-query or inner query.
•This sub query may return a scalar value, single or multiple
tuples, or a NULL data set.
•Sub-queries are executed first, and then the parent query is
executed utilizing data returned by the sub- queries.

Sub-queries returning a scalar value
•Scalar values represent a single value of any attribute or entity, for
example Name, Age, Course, Year, and so on.
•The following query uses a sub-query that returns a scalar value,
which is then used in the parent query to return the required result:
SELECT name FROM students_enrollment
WHERE age = ( SELECT min(age) FROM students )
•The above query returns a list of students who are the youngest
among all students.
•The sub-query “SELECT min(age) FROM students” returns a scalar
value that indicates the minimum age among all students.
•The parent query returns a list of all students whose age is equal to
the value returned by the sub-query.

Sub-queries returning vector values
• When a sub-query returns a data set that represents multiple values for a column
(like a list of names) or array of values for multiple columns (like Name, age and
date of birth for all students), then the sub-query is said to be returning vector
values.
• For example, to get a list of students who are enrolled in courses offered by the
computer science department, we will use the following nested query:
SELECT name FROM students WHERE course_enrolled IN ( SELECT distinct course_name FROM
courses
WHERE department_name = ‘Computer Science’
)
• Here the sub-query returns a list of all courses that are offered in the “Computer
Science” department and the outer query lists all students enrolled in the courses of
the sub-query result set.

Co-related sub-query
•When a sub-query is executed for each row of the parent table, instead
of once (as shown in the examples above) then the sub-query is
referred to as a correlated sub-query. For example:
SELECT dept, name, marks
FROM final_result a WHERE marks = (
SELECT max(marks) FROM final_result WHERE dept = a.dept
)
• The above statement searches for a list of students with their
departments, who have been awarded maximum marks in each
department.
•For each row on the LEFT table, the sub- query finds max(marks) for
the department of the current row and if the values of marks in the
current row is equal to the sub-query result set, then it is added to the
outer query result set.

Sub-query in FROM Clauses
•A sub-query can be used in a FROM clause as well, as shown in
following example:
SELECT dept, max_marks, min_marks, avg_marks FROM
(
SELECT dept,
max(marks) as max_marks, min(marks) as min_marks, avg(marks) as avg_marks FROM
final_result GROUP BY dept
)
WHERE (max_marks – min_marks) > 50 and avg_marks < 50
•The above query uses a sub-query in the FROM clause.
•The sub-query returns maximum, minimum and average marks for
each department.
•The outer query uses this data and filters the data further by adding
filter conditions in the WHERE clause of the outer query.

Mapping of object-oriented concepts to relational
concepts

JDBC
•JDBC stands for Java Database Connectivity, which is a standard Java
API for database-independent connectivity between the Java
programming language and a wide range of databases.
JDBC Architecture

Common JDBC components
The JDBC API provides the following interfaces and classes:
• Driver Manager
• Driver
• Connection
• Statement
• Result Set
• SQL Exception

Database APIs
•Database APIs are totally dynamic in nature and can be
developed independently of the database software being
used, and without the need for pre-compilation.
•Database APIs, as the name suggests, are a set of APIs
exposed by database vendors pertaining to different
programming languages like C, C++, Java and so on, which
gives application developers a mechanism to interact with the
database from within the application by just calling these SQL
callable interfaces.

ODBC and the IBM Data
Server
CLI driver
called Open Database
systems, based on a
• Microsoft developed a callable SQL interface
Connectivity (ODBC) for Microsoft operating
preliminary draft of X/Open CLI.
• However, ODBC is no longer limited to Microsoft operating systems and
currently many implementations are available on other platforms as well.
• The IBM Data Server CLI driver is the DB2 Call level Interface which is based
on the Microsoft® ODBC specifications, and the International Standard for
SQL/CLI.
• These specifications were chosen as the basis for the DB2 Call Level
Interface in an effort to follow industry standards and to provide a shorter
learning curve for those application programmers already familiar with either
of these database interfaces.

Indexes concept (1 of 3)
Types of Indexes
•Different types of indexes can be created for different purposes and
performance benefits.
Indexes on partitioned tables
•Partitioned tables can have indexes that are nonpartitioned (existing in
a single table space within a database partition), indexes that are
themselves partitioned across one or more table spaces within a
database partition, or a combination of the two.
Designing indexes
•Indexes are typically used to speed up access to a table. However,
they can also serve a logical data design purpose.

Indexes concept (2 of 3)
Creating indexes
•Indexes can be created for many reasons, including: to allow queries to
run more efficiently; to order the rows of a table in ascending or
descending sequence according to the values in a column; to enforce
constraints such as uniqueness on index keys. You can use the CREATE
INDEX statement, the DB2® Design Advisor,or the db2advis Design
Advisor command to create the indexes.
Modifying indexes
•If you want to modify your index, other than using the ALTER INDEX
statement to enable or disable index compression, you must drop the
index first and then create the index again.
Dropping indexes
•To delete an index, use the DROP statement.

Clustered and Non-clustered indexes
•Index architectures are classified as clustered or non-
clustered. Clustered indexes are indexes whose order of the
rows in the data pages corresponds to the order of the rows in
the index.
•This order is why only one clustered index can exist in any
table, whereas, many non-clustered indexes can exist in the
table. In some database systems, the leaf node of the
clustered index corresponds to the actual data, not a pointer to
data that is found elsewhere.
•Both clustered and non-clustered indexes contain only keys
and record identifiers in the index structure.

IBM ICE (Innovation Centre forEducation)
Failure management with DB2
cluster services
•DB2 cluster services is software that provides automatic heartbeat
failure detection and automatically initiates the necessary recovery
operations after a failure is detected.
•It also provides the cluster file system that gives each host in a DB2
pureScale® instance access to a common file system.
•DB2 cluster services includes technology from IBM® Tivoli® System
Automation for Multiplatforms (Tivoli SA MP) software, IBM Reliable
Scalable Clustering Technology (RSCT) software, and IBM General
Parallel File System (GPFS™) software.
•This technology is packaged as an integral part of the DB2 pureScale
Feature.

Database overview

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Database overview

Similar to Database overview (20)

Recently uploaded

Recently uploaded (20)

Database overview

Editor's Notes