2. DATABASE OVERVIEW
CONTENTS
-What is a database?
-Database terminology
-File system and database approach
-What is DBMS?
-Advantages of DBMS
-Types of DBMS
-Data Security
-Relational database
-SQL overview
-Table joins
-Sub queries
-Indexes
-JDBC & components
3. Database: What
Database
is a collection of related data and its metadata organized in a structured format
for optimized information management
Database Management System (DBMS)
is software that enables easy creation, access, and modification of databases
for efficient and effective database management
Database System
is an integrated system of hardware, software, people, procedures, and data
that define and regulate the collection, storage, management, and use of data within a
database environment
4. Database Terminology
Database Concepts
Database – a collection of related tables
Tables – a collection of related records
– collection of related entities
Record – collection of fields (table row)
–represents an entity
Field – collection of characters (table column)
– represents an attribute
Character – single alphabetic, numeric or other symbol
(ordered from largest unit to smallest)
5. Traditional File Processing Sucks
File Processing:
Data is organized, stored, and processed in independent files of data records
6. Problems of
File Processing
Data Redundancy –
duplicate data requires
update to many files
Lack of Integration –
data stored in
separate files
hard to combine data
Data Dependence –
changing the file format requires changing the program…
9. Database Management System
Database Systems: Design, Implementation, & Management: Rob & Coronel
- manages interaction between end users and database
10. DATABASE MANAGEMENT SYSTEM
A database management system is software that organizes, updates, manipulates, calculates
and stores the data in a database.
It is a two-way process: it accepts queries from the user and, after fetching data from the
database, returns the results to the user.
11. Advantages of DBMS
Data availability
Minimum redundancy
Highly accurate
Multiuser access control
Security control
12. Hierarchical Database
Background
Developed to manage large amount of data for complex manufacturing projects
e.g., Information Management System (IMS)
IBM-Rockwell joint venture
clustered related data together
hierarchically associated data clusters using pointers
Hierarchical Database Model
Assumes data relationships are hierarchical
One-to-Many (1:M) relationships
Each parent can have many children
Each child has only one parent
Logically represented by an upside down tree
14. Hierarchical Database: Pros & Cons
Advantages
Conceptual simplicity
groups of data could be related to each other
related data could be viewed together
Centralization of data
reduced redundancy and promoted consistency
Disadvantages
Limited representation of data relationships
did not allow Many-to-Many (M:N) relations
Complex implementation
required in-depth knowledge of physical data storage
Structural Dependence
data access requires physical storage path
Lack of Standards
limited portability
15. Network Database
Objectives
Represent more complex data relationships
Improve database performance
Impose a database standard
Network Database Model
Similar to Hierarchical Model
Records linked by pointers
Composed of sets
Each set consists of owner (parent) and member (child)
Many-to-Many (M:N) relationships representation
Each owner can have multiple members (1:M)
A member may have several owners
17. Network Database: Pros & Cons
Advantages
More data relationship types
More efficient and flexible data access
“network” vs. “tree” path traversal
Conformance to standards
enhanced database administration and portability
Disadvantages
System complexity
requires familiarity with the internal structure for data access
Lack of structural independence
small structural changes require significant program changes
18. Relational Database
Problems with legacy database systems
Required excessive effort to maintain
Data manipulation (programs) too dependent on physical file structure
Hard to manipulate by end-users
No capacity for ad-hoc query (must rely on DB programmers).
Evolution in Data Organization
E. F. Codd’s Relational Model proposal
Separated the notion of physical representation (machine-view)
from logical representation (human-view)
Considered ingenious but computationally impractical in 1970
Relational Database Model
Dominant database model of today
Eliminated pointers and used tables to represent data
Tables
flexible logical structure for data representation
a series of row/column intersections
related by sharing common entity characteristic(s)
19. Relational Database: Example
Provides a logical “human-level” view of the data and associations
among groups of data (i.e., tables)
Customer_ID   Customer_Account   Agent_ID
1224          4556               23
1225          4558               25

Agent_ID   Last_Name   First_Name   Phone
23         Sturm       David        334-5678
25         Long        Kyle         556-3421

Customer_ID   Last_Name   First_Name   Phone      Account_Balance
1224          Vira        Dyne         678-9987   1223.95
1225          Davies      Tricia       556-3342   234.25
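As a rough sketch of how the shared Customer_ID and Agent_ID columns associate these tables, the script below loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and joins each customer to their agent. SQLite and the table names account, agent and customer are assumptions made for illustration; the slide does not name the tables or a product.

```python
import sqlite3

# In-memory SQLite database used purely as an illustrative stand-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account  (customer_id INTEGER, customer_account INTEGER, agent_id INTEGER);
CREATE TABLE agent    (agent_id INTEGER, last_name TEXT, first_name TEXT, phone TEXT);
CREATE TABLE customer (customer_id INTEGER, last_name TEXT, first_name TEXT,
                       phone TEXT, account_balance REAL);
INSERT INTO account  VALUES (1224, 4556, 23), (1225, 4558, 25);
INSERT INTO agent    VALUES (23, 'Sturm', 'David', '334-5678'),
                            (25, 'Long', 'Kyle', '556-3421');
INSERT INTO customer VALUES (1224, 'Vira', 'Dyne', '678-9987', 1223.95),
                            (1225, 'Davies', 'Tricia', '556-3342', 234.25);
""")

# The tables are related only through shared column values, not pointers.
rows = conn.execute("""
SELECT c.last_name, a.last_name
FROM customer c
JOIN account ac ON ac.customer_id = c.customer_id
JOIN agent a    ON a.agent_id = ac.agent_id
ORDER BY c.customer_id
""").fetchall()
print(rows)  # customer last name paired with agent last name
```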
20. Relational Database: Pros & Cons
Advantages
Structural independence
Separation of database design and physical data storage/access
Easier database design, implementation, management, and use
Ad hoc query capability with Structured Query Language (SQL)
SQL translates user queries to codes
Disadvantages
Substantial hardware and system software overhead
more complex system
Poor design and implementation is made easy
ease-of-use allows careless use of RDBMS
21. Object-Oriented Database
Semantic Data Model (SDM)
Modeled both data and their relationships in a single structure (object)
Developed by Hammer & McLeod in 1981
Object-oriented concepts became popular in 1990s
Modularity facilitated program reuse and construction of complex structures
Ability to handle complex data types (e.g. multimedia data)
Object-Oriented Database Model (OODBM)
Maintains the advantages of the ER model but adds more features
Object = entity + relationships (between & within entity)
consists of attributes & methods
attributes describe properties of an object
methods are all relevant operations that can be performed on an object
self-contained abstraction of real-world entity
Class = collection of similar objects with shared attributes and methods
e.g. EMPLOYEE class = (employ1 object, employ2 object, …)
organized in a class hierarchy
e.g. PERSON > EMPLOYEE, CUSTOMER
Incorporates the notion of inheritance
attributes and methods of a class are inherited by its descendent classes
22. Object-Oriented Database: Pros & Cons
Advantages
Semantic representation of data
fuller and more meaningful description of data via object
Modularity, reusability, inheritance
Ability to handle
complex data
sophisticated information requirements
Disadvantages
Lack of standards
no standard data access method
Complex navigational data access
class hierarchy traversal
Steep learning curve
difficult to design and implement properly
More system-oriented than user-centered
High system overhead
slow transactions
23. Entity Relationship Model
Peter Chen’s Landmark Paper in 1976
“The Entity-Relationship Model: Toward a Unified View of Data”
Graphical representation of entities and their relationships
Entity Relationship (ER) Model
Based on Entity, Attributes & Relationships
Entity is a thing about which data are to be collected and stored
e.g. EMPLOYEE
Attributes are characteristics of the entity
e.g. SSN, last name, first name
Relationships describe associations between entities
i.e. 1:M, M:N, 1:1
Complements the relational data model concepts
Helps to visualize structure and content of data groups
entity is mapped to a relational table
Tool for conceptual data modeling (higher level representation)
Represented in an Entity Relationship Diagram (ERD)
Formalizes a way to describe relationships between groups of data
24. E-R Model: Pros & Cons
Advantages
Exceptional conceptual simplicity
easily viewed and understood representation of database
facilitates database design and management
Integration with the relational database model
enables better database design via conceptual modeling
Disadvantages
Incomplete model on its own
Limited representational power
cannot model data constraints not tied to entity relationships
e.g. attribute constraints
cannot represent relationships between attributes within entities
No data manipulation language (e.g. SQL)
Loss of information content
Hard to include attributes in ERD
26. Data Mining
• Data mining, also known as "knowledge discovery," refers to
computer-assisted tools and techniques for sifting through and
analyzing these vast data stores in order to find trends, patterns, and
correlations that can guide decision making and increase
understanding.
•Data mining covers a wide variety of uses, from analyzing customer
purchases to discovering galaxies.
•In essence, data mining is the equivalent of finding gold nuggets in a
mountain of data.
27. Data Warehousing and Data Marts
•Many organizations now use data warehouses to bring multiple
databases together and make them available for data mining and other
forms of analysis.
•A data warehouse is a collection of data, usually current and historical,
from multiple databases that the organization can use for analysis and
decision making.
•Bringing together so much data into a data warehouse makes analysis
very difficult. To address this problem, organizations use what are called
data marts.
•Data marts are related sets of data that are grouped together and
separated out from the main body of data in the data warehouse.
•Data marts are designed to be made available to specific sets of users.
For example, data about manufacturing can be put into a data mart and
be made available to the production department.
28. Relational DBMS
• Relational database management systems (RDBMS) keep all data in tables, or relations
• More flexible & easy to use
• Almost any item of data can be accessed more quickly than in the other models
• Edgar F. Codd at IBM proposed the relational model in 1970
29. Relational Database Terminology
•An attribute draws its values from a domain
•A domain is the set of possible values for this attribute
•The degree of a relation is the number of its attributes or
columns
•The cardinality of a relation is the number of its tuples or
rows
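To make degree and cardinality concrete, here is a minimal sketch that counts the columns and rows of a small table. It assumes an in-memory SQLite database via Python's sqlite3 module, and the agent table with its two rows is borrowed from the earlier example slide for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent (agent_id INTEGER, last_name TEXT, first_name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO agent VALUES (?, ?, ?, ?)",
    [(23, "Sturm", "David", "334-5678"), (25, "Long", "Kyle", "556-3421")],
)

cursor = conn.execute("SELECT * FROM agent")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)
print(degree, cardinality)
```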
31. SQL Overview
•Structured Query Language (SQL) is a high-level language that
allows users to manipulate relational data.
•One of the strengths of SQL is that users need only specify the
information they need without having to know how to retrieve it.
•In other words, SQL is an English-like language with database
specific constructs and keywords that are simple to use and
understand.
•It supports multiple access mechanisms to address different
usages and requirements.
• The database management system is responsible for
developing the access path needed to retrieve the data.
•SQL works at a set level, and is designed to retrieve rows of one
or more tables.
32. SQL Categories
SQL has three categories based on the functionality
involved:
•DDL – Data definition language used to define, change, or drop
database objects
•DML – Data manipulation language used to read and
modify data
•DCL – Data control language used to grant and revoke
authorizations
33. Data Types: Date and Time
All databases support various date and time specific data
types and functions. DB2 has the following data types for
date and time.
•Date (YYYY-MM-DD)
•Time (HH:MM:SS)
•Timestamp (YYYY-MM-DD-HH.MM.SS.ssssss)
34. Creating a table
• CREATE TABLE myTable (col1 integer)
Default Values
CREATE TABLE USERS (NAME CHAR(20),
AGE INTEGER,
PROFESSION VARCHAR(30) with default 'Student')
35. Constraints
The following example shows a table definition with several
CHECK constraints and a PRIMARY KEY defined:
CREATE TABLE EMPLOYEE
(ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR(9),
DEPT SMALLINT CHECK (DEPT BETWEEN 10 AND 100),
JOB CHAR(5) CHECK (JOB IN ('Sales','Mgr','Clerk')), HIREDATE DATE,
SALARY DECIMAL(7,2),
CONSTRAINT YEARSAL CHECK ( YEAR(HIREDATE) > 1986 OR SALARY >
40500 )
)
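The effect of these constraints can be sketched by attempting inserts that violate them. The script below is an approximation that assumes SQLite (through Python's sqlite3 module) rather than DB2: SQLite has no YEAR() function, so the YEARSAL check is rewritten with strftime, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    id       INTEGER NOT NULL PRIMARY KEY,
    name     VARCHAR(9),
    dept     SMALLINT CHECK (dept BETWEEN 10 AND 100),
    job      CHAR(5) CHECK (job IN ('Sales', 'Mgr', 'Clerk')),
    hiredate DATE,
    salary   DECIMAL(7,2),
    -- SQLite substitute for YEAR(HIREDATE) > 1986 OR SALARY > 40500
    CONSTRAINT yearsal CHECK (
        CAST(strftime('%Y', hiredate) AS INTEGER) > 1986 OR salary > 40500
    )
)""")

# A row that satisfies every constraint.
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 20, 'Mgr', '1990-05-01', 30000)")

# Hypothetical rows that each break one constraint.
candidates = [
    (2, "Bob", 5, "Clerk", "1990-05-01", 30000),  # dept outside 10..100
    (3, "Cy", 20, "Clerk", "1980-01-01", 30000),  # fails YEARSAL
]
rejected = []
for row in candidates:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

stored = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, stored)
```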
36. Deletes and Updates
• There are different rules to
handle deletes and updates and
the behavior depends on the
following constructs used when
defining the tables:
•CASCADE
•SET NULL
•NO ACTION
•RESTRICT
The statement below shows where
the delete and update rules are
specified:
ALTER TABLE DEPENDANT_TABLE
ADD CONSTRAINT constraint_name
FOREIGN KEY (column_name)
REFERENCES <parent_table> (<parent_column>)
ON DELETE <delete_action_type>
ON UPDATE <update_action_type>
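A minimal sketch of the CASCADE delete rule follows. It assumes SQLite via Python's sqlite3 module, where foreign keys must be switched on with a PRAGMA and the rule is declared in CREATE TABLE because SQLite lacks ALTER TABLE ... ADD CONSTRAINT. The department and employee tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES department (dept_id) ON DELETE CASCADE
);
INSERT INTO department VALUES (10), (20);
INSERT INTO employee VALUES (1, 10), (2, 10), (3, 20);
""")

# Deleting the parent row also removes the dependent rows.
conn.execute("DELETE FROM department WHERE dept_id = 10")
remaining = conn.execute("SELECT emp_id FROM employee ORDER BY emp_id").fetchall()
print(remaining)
```

With SET NULL in place of CASCADE, the dependent rows would be kept and their dept_id set to NULL; NO ACTION and RESTRICT would instead reject the delete while dependents exist.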
37. Creating a schema
•To create a schema, use this statement: create schema
mySchema
•To create a table with the above schema, explicitly include it in
the CREATE TABLE statement as follows:
create table mySchema.myTable (col1 integer)
•When the schema is not specified, DB2 uses an implicit
schema, which is typically the user ID used to connect to the
database. To change the implicit schema for the current
session the SET CURRENT SCHEMA command can be used
as follows:
set current schema mySchema
38. Creating a view
•To create a view based on the EMPLOYEE table we can do as
follows:
CREATE VIEW MYVIEW AS
SELECT LASTNAME, HIREDATE FROM EMPLOYEE
•Once the view is created, you can use it just like any table.
For example, we can issue a simple SELECT statement as
follows:
SELECT * FROM MYVIEW
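The view above can be tried out with the following sketch, which assumes an in-memory SQLite database through Python's sqlite3 module. The extra salary column and the two sample rows are invented to show that the view exposes only the columns it selects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (lastname TEXT, hiredate TEXT, salary REAL);
INSERT INTO employee VALUES ('Sturm', '2001-03-15', 52000),
                            ('Long',  '2005-07-01', 48000);
CREATE VIEW myview AS
    SELECT lastname, hiredate FROM employee;
""")

# Query the view exactly as if it were a table.
cursor = conn.execute("SELECT * FROM myview ORDER BY lastname")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns, rows)  # salary is not visible through the view
```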
39. Creating other database objects
•Modifying database objects
alter table myTable alter column col1 set not null
•Renaming database objects
RENAME <object type> <object name> to <new name>
•To rename a column, the ALTER TABLE SQL statement should be used
in conjunction with RENAME. For example:
ALTER TABLE <table name> RENAME COLUMN
<column name> TO <new name>
40. Data manipulation with SQL
Selecting data
•Selecting data is performed using the SELECT statement.
•Assuming a table name of myTable, the simplest statement to
select data from this table is:
select * from myTable
•Typically, not all columns of a table are required; in which case,
a selective list of columns should be specified. For example,
select col1, col2 from myTable
•Retrieves col1 and col2 for all rows of the table myTable where col1 and
col2 are the names of the columns to retrieve data from
41. Ordering the result set
•To guarantee the result set is displayed in the same order all
the time, either in ascending or descending order of a column
or set of columns, use the ORDER BY clause.
•For example this statement returns the result set based on the
order of col1 in ascending order:
SELECT col1 FROM myTable ORDER BY col1 ASC
•ASC stands for ascending, which is the default. Descending
order can be specified using DESC as shown below:
SELECT col1 FROM myTable ORDER BY col1 DESC
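The two ORDER BY statements can be exercised with a short sketch, assuming SQLite via Python's sqlite3 module and three made-up values inserted out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?)", [(3,), (1,), (2,)])

ascending = [r[0] for r in conn.execute("SELECT col1 FROM myTable ORDER BY col1 ASC")]
descending = [r[0] for r in conn.execute("SELECT col1 FROM myTable ORDER BY col1 DESC")]
print(ascending, descending)
```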
42. Inserting Data
•To insert data into a table use the INSERT statement.
•Statements insert one row at a time into the table myTable.
insert into myTable values (1);
insert into myTable values (1, 'myName', '2010-01-01');
•Statements insert multiple (three) rows into the table myTable:
insert into myTable values (1),(2),(3);
insert into myTable values (1, 'myName1', '2010-01-01'), (2, 'myName2', '2010-02-01'),
(3, 'myName3', '2010-03-01');
•Statement inserts all the rows of the sub-query “select * from
myTable2” into the table myTable.
insert into myTable (select * from myTable2)
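The three insert forms can be sketched end to end as follows. The script assumes SQLite via Python's sqlite3 module and a hypothetical three-column layout for myTable and myTable2. SQLite does not accept parentheses around the sub-query in INSERT ... SELECT, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable  (col1 INTEGER, col2 TEXT, col3 TEXT);
CREATE TABLE myTable2 (col1 INTEGER, col2 TEXT, col3 TEXT);
INSERT INTO myTable2 VALUES (10, 'other', '2010-04-01');
""")

# One row at a time.
conn.execute("INSERT INTO myTable VALUES (1, 'myName1', '2010-01-01')")
# Several rows in one statement.
conn.execute(
    "INSERT INTO myTable VALUES (2, 'myName2', '2010-02-01'), (3, 'myName3', '2010-03-01')"
)
# All rows returned by a sub-query.
conn.execute("INSERT INTO myTable SELECT * FROM myTable2")

count = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
print(count)
```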
43. Deleting Data
•The DELETE statement is used to delete rows of a table. One
or multiple rows from a table can be deleted with a single
statement by specifying a delete condition using the WHERE
clause. For example, this statement deletes all the rows where
col1 > 1000 in table myTable.
DELETE FROM myTable WHERE col1 > 1000
•Note that care should be taken when issuing a delete
statement. If the WHERE clause is not used, the DELETE
statement will delete all rows from the table.
44. Updating Data
•Use the UPDATE statement to update data in your table. One
or multiple rows from a table can be updated with a single
statement by specifying the update condition using a WHERE
clause. For each row selected for update, the statement can
update one or more columns.
•For example:
UPDATE myTable SET col1 = -1 WHERE col2 < 0
UPDATE myTable SET col1 = -1, col2 = 'a', col3 = '2010-01-01' WHERE col4 = '0'
•Note that care should be taken when issuing an update
statement without the WHERE clause. In such cases, all the
rows in the table will be updated.
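To see how the WHERE clause limits the rows touched by UPDATE and DELETE, the sketch below runs both statements and reports how many rows each affected. It assumes SQLite via Python's sqlite3 module and a made-up two-column myTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)", [(5, -1), (2000, 3), (7, -4)]
)

# Only rows matching the condition are changed; rowcount reports how many.
updated = conn.execute("UPDATE myTable SET col1 = -1 WHERE col2 < 0").rowcount
deleted = conn.execute("DELETE FROM myTable WHERE col1 > 1000").rowcount

rows = conn.execute("SELECT col1, col2 FROM myTable ORDER BY col2").fetchall()
print(updated, deleted, rows)
```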
45. Table Joins
•A simple SQL select statement is one that selects one or more
columns from any single table.
•The next level of complexity is added when a select statement
has two or more tables as source tables. This leads to multiple
possibilities of how the result set will be generated.
•There are two types of table joins in SQL statements:
Inner join
Outer join
•These are explained in more detail in the following sections.
46. Inner join
An inner join is the most common form of join used in SQL statements. It can be
classified as follows:
1.Equi join
2.Natural join
3.Cross join
47. Inner Join: Equi-Join
•This type of join happens when two tables are joined based on
the equality of specified columns; for example:
SELECT *
FROM student, enrollment
WHERE student.enrollment_no=enrollment.enrollment_no
OR
SELECT *
FROM student
INNER JOIN enrollment
ON student.enrollment_no=enrollment.enrollment_no
48. Inner Join: Natural Join
•A natural join is an improved version of an equi-join where the
joining column does not require specification.
•The system automatically selects the column with same name
in the tables and applies the equality operation on it.
•A natural join will remove all duplicate attributes. Below is an
example.
SELECT *
FROM STUDENT
NATURAL JOIN ENROLLMENT
49. Inner Join: Cross Join
•A cross join is simply a Cartesian product of the tables to be
joined. For example:
SELECT * FROM STUDENT, ENROLLMENT
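The three join forms can be compared side by side in a small sketch. It assumes SQLite via Python's sqlite3 module, and the name and course columns plus all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student    (enrollment_no INTEGER, name TEXT);
CREATE TABLE enrollment (enrollment_no INTEGER, course TEXT);
INSERT INTO student    VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO enrollment VALUES (1, 'DB'), (2, 'OS'), (4, 'AI');
""")

# Equi-join: only rows whose enrollment_no matches in both tables.
equi = conn.execute("""
SELECT student.name, enrollment.course
FROM student
INNER JOIN enrollment
ON student.enrollment_no = enrollment.enrollment_no
ORDER BY student.name
""").fetchall()

# Natural join: the shared column is matched implicitly and appears once.
cursor = conn.execute("SELECT * FROM student NATURAL JOIN enrollment ORDER BY name")
natural_columns = [col[0] for col in cursor.description]
natural = cursor.fetchall()

# Cross join: every student row paired with every enrollment row.
cross_count = conn.execute("SELECT COUNT(*) FROM student, enrollment").fetchone()[0]
print(equi, natural_columns, natural, cross_count)
```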
50. Outer Joins
• An outer join is a specialized form of join used in SQL statements.
• In an outer join, the first table specified in the FROM clause of an SQL
statement is referred to as the LEFT table and the remaining table is referred to as the
RIGHT table.
• An outer join is of the following three types:
Left outer join
Right outer join
Full outer join
51. Left outer join
•In a left outer join, the result set is a union of the results of an equi-join,
including any non-matching rows from the LEFT table. For example,
the following statement would return the rows shown in the figure below.
SELECT *
FROM STUDENT
LEFT OUTER JOIN ENROLLMENT
ON STUDENT.ENROLLMENT_NO = ENROLLMENT.ENROLLMENT_NO
52. Right outer join
•In a right outer join, the result set is the union of results of an
equi-join, including any non-matching rows from the RIGHT
table. For example, the following statement would return the
rows shown in the figure below.
SELECT *
FROM STUDENT
RIGHT OUTER JOIN ENROLLMENT
ON STUDENT.ENROLLMENT_NO = ENROLLMENT.ENROLLMENT_NO
53. Full outer join
•In a full outer join, the result set is the union of results of an
equi-join, including any non-matching rows of the LEFT and
the RIGHT table. For example, the following statement would
return the rows shown in the figure below.
SELECT *
FROM STUDENT
FULL OUTER JOIN ENROLLMENT
ON STUDENT.ENROLLMENT_NO = ENROLLMENT.ENROLLMENT_NO
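The outer joins can be approximated in a short sketch. It assumes SQLite via Python's sqlite3 module; because older SQLite builds lack RIGHT and FULL OUTER JOIN, the right outer join is emulated here by swapping the table order in a LEFT OUTER JOIN. The name and course columns and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student    (enrollment_no INTEGER, name TEXT);
CREATE TABLE enrollment (enrollment_no INTEGER, course TEXT);
INSERT INTO student    VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO enrollment VALUES (1, 'DB'), (2, 'OS'), (4, 'AI');
""")

# Left outer join: every student, with NULL (None) where no enrollment matches.
left = conn.execute("""
SELECT student.name, enrollment.course
FROM student
LEFT OUTER JOIN enrollment
ON student.enrollment_no = enrollment.enrollment_no
ORDER BY student.name
""").fetchall()

# Right outer join emulated by making enrollment the LEFT table.
right = conn.execute("""
SELECT student.name, enrollment.course
FROM enrollment
LEFT OUTER JOIN student
ON student.enrollment_no = enrollment.enrollment_no
ORDER BY enrollment.course
""").fetchall()
print(left, right)
```

A full outer join would contain the rows of both results, with the matching rows listed once.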
58. Relational Operators
Relational operators are basic tests and operations that can
be performed on data. These operators include:
•Basic mathematical operations like ‘+’, ‘-‘, ‘*’ and ‘/’
•Logical operators like ‘AND’, ‘OR’ and ‘NOT’
•String manipulation operators like ‘CONCATENATE’, ‘LENGTH’,
‘SUBSTRING’
•Comparative operators like ‘=’, ‘<’, ‘>’, ‘>=’, ‘<=’ and ‘!=’
•Grouping and aggregate operators
•Other miscellaneous operations like DISTINCT
59. Sub-queries
•When a query is applied within a query, the outer query is
referred to as the main query or parent query and the internal
query is referred to as the sub-query or inner query.
•This sub-query may return a scalar value, single or multiple
tuples, or a NULL data set.
•Unless a sub-query depends on the current row of the parent query,
it is executed first, and the parent query is then executed using
the data the sub-query returns.
60. Sub-queries returning a scalar value
•Scalar values represent a single value of any attribute or entity, for
example Name, Age, Course, Year, and so on.
•The following query uses a sub-query that returns a scalar value,
which is then used in the parent query to return the required result:
SELECT name FROM students_enrollment
WHERE age = ( SELECT min(age) FROM students )
•The above query returns a list of students who are the youngest
among all students.
•The sub-query “SELECT min(age) FROM students” returns a scalar
value that indicates the minimum age among all students.
•The parent query returns a list of all students whose age is equal to
the value returned by the sub-query.
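A scalar sub-query of this kind can be sketched as follows, assuming SQLite via Python's sqlite3 module. For simplicity the sketch uses a single hypothetical students table for both the parent query and the sub-query, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)", [("Ann", 19), ("Bob", 18), ("Cy", 18)]
)

# The sub-query yields one value (the minimum age) used by the parent query.
youngest = [
    row[0]
    for row in conn.execute("""
        SELECT name FROM students
        WHERE age = (SELECT min(age) FROM students)
        ORDER BY name
    """)
]
print(youngest)
```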
61. Sub-queries returning vector values
• When a sub-query returns a data set that represents multiple values for a column
(like a list of names) or array of values for multiple columns (like Name, age and
date of birth for all students), then the sub-query is said to be returning vector
values.
• For example, to get a list of students who are enrolled in courses offered by the
computer science department, we will use the following nested query:
SELECT name FROM students
WHERE course_enrolled IN (
SELECT distinct course_name FROM courses
WHERE department_name = 'Computer Science'
)
• Here the sub-query returns a list of all courses that are offered in the “Computer
Science” department and the outer query lists all students enrolled in the courses of
the sub-query result set.
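The nested IN query can be run against sample data in a short sketch. It assumes SQLite via Python's sqlite3 module, and the rows in students and courses are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (name TEXT, course_enrolled TEXT);
CREATE TABLE courses  (course_name TEXT, department_name TEXT);
INSERT INTO courses  VALUES ('Databases', 'Computer Science'),
                            ('Compilers', 'Computer Science'),
                            ('Circuits',  'Electrical Engineering');
INSERT INTO students VALUES ('Ann', 'Databases'), ('Bob', 'Circuits'), ('Cy', 'Compilers');
""")

# The sub-query returns a list of course names; IN tests membership in it.
cs_students = [
    row[0]
    for row in conn.execute("""
        SELECT name FROM students
        WHERE course_enrolled IN (
            SELECT DISTINCT course_name FROM courses
            WHERE department_name = 'Computer Science'
        )
        ORDER BY name
    """)
]
print(cs_students)
```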
62. Correlated sub-query
•When a sub-query is executed for each row of the parent table, instead
of once (as shown in the examples above) then the sub-query is
referred to as a correlated sub-query. For example:
SELECT dept, name, marks
FROM final_result a WHERE marks = (
SELECT max(marks) FROM final_result WHERE dept = a.dept
)
• The above statement searches for a list of students with their
departments, who have been awarded maximum marks in each
department.
•For each row of the outer table (alias a), the sub-query finds max(marks) for
the department of the current row; if the value of marks in the
current row is equal to the sub-query result, the row is added to the
outer query result set.
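The correlated query can be checked against a few rows in the following sketch, which assumes SQLite via Python's sqlite3 module. The sample marks are invented and include a tie so that one department has two top scorers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_result (dept TEXT, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO final_result VALUES (?, ?, ?)",
    [("CS", "Ann", 90), ("CS", "Bob", 75), ("EE", "Cy", 80), ("EE", "Di", 80)],
)

# The inner query refers to a.dept, so it is evaluated per outer row.
toppers = conn.execute("""
SELECT dept, name, marks
FROM final_result a
WHERE marks = (
    SELECT max(marks) FROM final_result WHERE dept = a.dept
)
ORDER BY dept, name
""").fetchall()
print(toppers)
```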
63. Sub-query in FROM Clauses
•A sub-query can be used in a FROM clause as well, as shown in
following example:
SELECT dept, max_marks, min_marks, avg_marks
FROM (
SELECT dept, max(marks) as max_marks, min(marks) as min_marks, avg(marks) as avg_marks
FROM final_result
GROUP BY dept
)
WHERE (max_marks - min_marks) > 50 and avg_marks < 50
•The above query uses a sub-query in the FROM clause.
•The sub-query returns maximum, minimum and average marks for
each department.
•The outer query uses this data and filters the data further by adding
filter conditions in the WHERE clause of the outer query.
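A sub-query in the FROM clause can be sketched like this, assuming SQLite via Python's sqlite3 module. The department names and marks are invented, and the alias stats on the derived table is an addition that some database systems require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_result (dept TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO final_result VALUES (?, ?)",
    [("A", 10), ("A", 70), ("A", 40), ("B", 60), ("B", 90)],
)

# The inner query aggregates per department; the outer query filters the aggregates.
wide_spread = conn.execute("""
SELECT dept, max_marks, min_marks, avg_marks
FROM (
    SELECT dept,
           max(marks) AS max_marks,
           min(marks) AS min_marks,
           avg(marks) AS avg_marks
    FROM final_result
    GROUP BY dept
) AS stats
WHERE (max_marks - min_marks) > 50 AND avg_marks < 50
""").fetchall()
print(wide_spread)
```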
65. JDBC
•JDBC stands for Java Database Connectivity, which is a standard Java
API for database-independent connectivity between the Java
programming language and a wide range of databases.
JDBC Architecture
66. Common JDBC components
The JDBC API provides the following interfaces and classes:
• DriverManager
• Driver
• Connection
• Statement
• ResultSet
• SQLException
67. Database APIs
•Database APIs are totally dynamic in nature and can be
developed independently of the database software being
used, and without the need for pre-compilation.
•Database APIs, as the name suggests, are a set of APIs
exposed by database vendors pertaining to different
programming languages like C, C++, Java and so on, which
gives application developers a mechanism to interact with the
database from within the application by just calling these SQL
callable interfaces.
68. ODBC and the IBM Data Server CLI driver
• Microsoft developed a callable SQL interface called Open Database
Connectivity (ODBC) for Microsoft operating systems, based on a
preliminary draft of X/Open CLI.
• However, ODBC is no longer limited to Microsoft operating systems and
currently many implementations are available on other platforms as well.
• The IBM Data Server CLI driver is the DB2 Call level Interface which is based
on the Microsoft® ODBC specifications, and the International Standard for
SQL/CLI.
• These specifications were chosen as the basis for the DB2 Call Level
Interface in an effort to follow industry standards and to provide a shorter
learning curve for those application programmers already familiar with either
of these database interfaces.
71. Indexes concept (1 of 3)
Types of Indexes
•Different types of indexes can be created for different purposes and
performance benefits.
Indexes on partitioned tables
•Partitioned tables can have indexes that are nonpartitioned (existing in
a single table space within a database partition), indexes that are
themselves partitioned across one or more table spaces within a
database partition, or a combination of the two.
Designing indexes
•Indexes are typically used to speed up access to a table. However,
they can also serve a logical data design purpose.
72. Indexes concept (2 of 3)
Creating indexes
•Indexes can be created for many reasons, including: to allow queries to
run more efficiently; to order the rows of a table in ascending or
descending sequence according to the values in a column; to enforce
constraints such as uniqueness on index keys. You can use the CREATE
INDEX statement, the DB2® Design Advisor, or the db2advis Design
Advisor command to create the indexes.
Modifying indexes
•If you want to modify your index, other than using the ALTER INDEX
statement to enable or disable index compression, you must drop the
index first and then create the index again.
Dropping indexes
•To delete an index, use the DROP statement.
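Creating an index to enforce uniqueness, and then dropping it, can be sketched as below. The script assumes SQLite via Python's sqlite3 module rather than DB2; the index name idx_col1 and the table layout are hypothetical, and PRAGMA index_list is a SQLite-specific way to list a table's indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 TEXT)")

# A unique index both speeds up lookups on col1 and rejects duplicate keys.
conn.execute("CREATE UNIQUE INDEX idx_col1 ON myTable (col1)")
conn.execute("INSERT INTO myTable VALUES (1, 'a')")
try:
    conn.execute("INSERT INTO myTable VALUES (1, 'b')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Column 1 of each index_list row holds the index name.
before_drop = [row[1] for row in conn.execute("PRAGMA index_list(myTable)")]
conn.execute("DROP INDEX idx_col1")
after_drop = [row[1] for row in conn.execute("PRAGMA index_list(myTable)")]
print(duplicate_rejected, before_drop, after_drop)
```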
73. Clustered and Non-clustered indexes
•Index architectures are classified as clustered or non-clustered.
In a clustered index, the order of the rows in the data pages
corresponds to the order of the rows in the index.
•This order is why only one clustered index can exist in any
table, whereas, many non-clustered indexes can exist in the
table. In some database systems, the leaf node of the
clustered index corresponds to the actual data, not a pointer to
data that is found elsewhere.
•Both clustered and non-clustered indexes contain only keys
and record identifiers in the index structure.
74. Failure management with DB2 cluster services
IBM ICE (Innovation Centre for Education)
•DB2 cluster services is software that provides automatic heartbeat
failure detection and automatically initiates the necessary recovery
operations after a failure is detected.
•It also provides the cluster file system that gives each host in a DB2
pureScale® instance access to a common file system.
•DB2 cluster services includes technology from IBM® Tivoli® System
Automation for Multiplatforms (Tivoli SA MP) software, IBM Reliable
Scalable Clustering Technology (RSCT) software, and IBM General
Parallel File System (GPFS™) software.
•This technology is packaged as an integral part of the DB2 pureScale
Feature.
End user data – raw facts of interests to the end user
Metadata – description of data characteristics and relationships
Organization of data helps:
improve data accuracy
improve timely access
correlation and comparison
DBMS: a collection of programs that manages the database structure and controls access to data
- Makes data management more efficient and effective
- Query language allows quick answers to ad hoc queries
- Provides better access to more and better-managed data
- Promotes integrated view of organization’s operations
- Reduces the probability of inconsistent data