1. What is RDBMS?
RDBMS stands for Relational Database Management System. RDBMS data is structured in database
tables, fields and records. Each RDBMS table consists of database table rows. Each database table
row consists of one or more database table fields.
An RDBMS stores data in a collection of tables, which may be related by common fields (database
table columns). An RDBMS also provides relational operators to manipulate the data stored in the
database tables. Most RDBMSs use SQL as their database query language.
A relational database management system (RDBMS) is a program that lets you create, update, and
administer a relational database. Most commercial RDBMSs use the Structured Query Language
(SQL) to access the database, although SQL was invented after the development of the relational
model and is not necessary for its use.
The leading RDBMS products are Oracle, IBM's DB2 and Microsoft's SQL Server. Despite repeated
challenges by competing technologies, as well as the claim by some experts that no current RDBMS
has fully implemented relational principles, the majority of new corporate databases are still being
created and managed with an RDBMS.
2. Keys in DBMS
Loosely, a key is a column or attribute of a database table. For example, if a table has id, name
and address as its column names, each one is sometimes called a key of that table, so the table
could be said to have three keys: id, name and address. More precisely, a key is a column or set of
columns used to identify each record in the database table. The following are the various types of
keys available in a DBMS.
A simple key contains a single attribute.
A composite key is a key that contains more than one attribute.
A superkey is any set of attributes that uniquely identifies a row.
A candidate key is a minimal superkey, which means it contains the minimum number
of attributes. A superkey can contain redundant attributes, but a candidate key contains only
those attributes which are required to uniquely determine records in the table.
3. Example for Keys
For example, if ABC is a super key with three attributes A, B and C, and if A and B alone are
sufficient to identify the rows in the table (while neither A nor B is sufficient on its own),
then AB is a candidate key.
A primary key is the key selected as the principal unique identifier by the database
schema designer. The primary key is usually the key selected to identify a row when the database is
physically implemented. Serial number, roll number and invoice id are examples of primary keys.
A foreign key is an attribute (or set of attributes) that appears (usually) as a non-key attribute in one
relation and as a primary key attribute in another relation. I say usually because it is possible for a
foreign key to also be the whole or part of a primary key:
A many-to-many relationship can only be implemented by introducing an intersection or link table
which then becomes the child in two one-to-many relationships. The intersection table therefore has
a foreign key for each of its parents, and its primary key is a composite of both foreign keys.
A one-to-one relationship requires that the child table has no more than one occurrence for each
parent, which can be enforced by letting the foreign key also serve as the primary key (or by
placing a unique constraint on the foreign key).
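The link-table idea described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (student, module, enrolment) and the sample rows are assumptions, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE module  (module_code TEXT PRIMARY KEY, title TEXT NOT NULL);
-- Link table: one foreign key per parent, primary key is the composite of both.
CREATE TABLE enrolment (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    module_code TEXT    NOT NULL REFERENCES module(module_code),
    PRIMARY KEY (student_id, module_code)
);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO module  VALUES ('DB1', 'Databases'), ('OS1', 'Operating Systems');
INSERT INTO enrolment VALUES (1, 'DB1'), (1, 'OS1'), (2, 'DB1');
""")


def try_insert(student_id, module_code):
    """Return True if the enrolment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO enrolment VALUES (?, ?)", (student_id, module_code))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(1, "DB1")  # same pair again: breaks the composite primary key
orphan_ok = try_insert(3, "DB1")     # student 3 does not exist: breaks the foreign key
new_ok = try_insert(2, "OS1")        # a new, valid pair
print(duplicate_ok, orphan_ok, new_ok)
```

Each student can appear in many enrolment rows and so can each module, but the same pair cannot appear twice, which is what makes the link table a valid child of two one-to-many relationships.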
4. Super Key
A Super key is any combination of fields within a table that uniquely identifies each record within
that table.
Candidate Key
A candidate key is a minimal super key: a single field or the least combination of
fields that uniquely identifies each record in the table. The least combination of fields distinguishes
a candidate key from a super key. Every table must have at least one candidate key but at the same
time can have several.
5. As an example we might have a student_id that uniquely identifies the students in a student table.
This would be a candidate key. But in the same table we might have the student’s first name and last
name that also, when combined, uniquely identify the student in a student table. These would both
be candidate keys.
To qualify as a candidate key, a field or combination of fields must meet certain criteria:
It must contain unique values
It must not contain null values
It must contain the minimum number of fields needed to ensure uniqueness
It must uniquely identify each record in the table
Once your candidate keys have been identified, you can select one to be your primary key.
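The criteria above can be checked mechanically against sample data. The sketch below is an illustration only: the function names and the three sample students are assumptions. Note that it tests the rows it is given, whereas a real candidate key must hold for every row the table could ever contain.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no value is null and no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if any(v is None for v in value) or value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """A candidate key is a superkey with no proper subset that is also a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


students = [
    {"student_id": 1, "first_name": "John", "last_name": "Smith"},
    {"student_id": 2, "first_name": "John", "last_name": "Jones"},
    {"student_id": 3, "first_name": "Mary", "last_name": "Smith"},
]

print(is_candidate_key(students, ("student_id",)))               # unique and minimal
print(is_candidate_key(students, ("first_name", "last_name")))   # unique only when combined
print(is_candidate_key(students, ("student_id", "first_name")))  # superkey, but not minimal
```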
6. Primary Key
A primary key is a candidate key that is most appropriate to be the main reference key for the table.
As its name suggests, it is the primary key of reference for the table and is used throughout the
database to help establish relationships with other tables. As with any candidate key the primary key
must contain unique values, must never be null and uniquely identify each record in the table.
As an example, a student id might be a primary key in a student table, a department code in a table
of all departments in an organization. This module has the code DH3D 35 that is no doubt used in a
database somewhere to identify RDBMS as a unit in a table of modules. In the table below we have
selected the candidate key student_id to be our most appropriate primary key.
Primary keys are mandatory for every table, and each record must have a value for its primary
key. When choosing a primary key from the pool of candidate keys, prefer a
single simple key over a composite key.
7. Foreign Key
A foreign key is generally a primary key from one table that appears as a field in another where the
first table has a relationship to the second. In other words, if we had a table A with a primary key X
that linked to a table B where X was a field in B, then X would be a foreign key in B.
An example might be a student table that contains the course_id the student is attending. Another
table lists the courses on offer with course_id being the primary key. The 2 tables are linked through
course_id and as such course_id would be a foreign key in the student table.
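A minimal sketch of the student and course example, using Python's sqlite3 module. The column names other than course_id, and the sample rows, are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    course_id  TEXT REFERENCES course(course_id)  -- foreign key into course
);
INSERT INTO course VALUES ('C1', 'Databases'), ('C2', 'Networks');
INSERT INTO student VALUES (1, 'Ann', 'C1'), (2, 'Bob', 'C2'), (3, 'Cy', 'C1');
""")

# The foreign key is what lets the two tables be linked in a query.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM student AS s JOIN course AS c ON c.course_id = s.course_id
    ORDER BY s.student_id
""").fetchall()

# A student row pointing at a course that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (4, 'Di', 'C9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```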
8. Secondary Key or Alternative Key
A table may have one or more choices for the primary key. Collectively these are known as candidate
keys, as discussed earlier. One is selected as the primary key. Those not selected are known as secondary
keys or alternative keys.
For example, in the table showing candidate keys above we identified two candidate keys, studentId
and firstName + lastName. The studentId would be the most appropriate for a primary key, leaving
the other candidate key as a secondary or alternative key. It should be noted that for the other key to be
a candidate key, we are assuming you will never have two people with the same first and last name
combination. As this assumption is unlikely to hold, we might consider firstName + lastName to be a
suspect candidate key, as it would restrict the data you might enter. It would seem a shame not to
allow John Smith onto a course just because there was already another John Smith.
Simple Key
Any of the keys described before (i.e. primary, secondary or foreign) may comprise one or more fields.
For example, if firstName and lastName were our key, this would be a key of two fields, whereas
studentId is only one. A simple key consists of a single field that uniquely identifies a record. In addition,
the field itself cannot be broken down into other fields. For example, studentId, which uniquely
identifies a particular student, is a single field and therefore is a simple key. No two students would
have the same student number.
9. Compound Key
A compound key consists of more than one field to uniquely identify a record. A compound key
is distinguished from a composite key because each field, which makes up the primary key, is
also a simple key in its own right. An example might be a table that represents the modules a
student is attending. This table has a studentId and a moduleCode as its primary key. Each of
the fields that make up the primary key are simple keys because each represents a unique
reference when identifying a student in one instance and a module in the other.
Composite Key
A composite key consists of more than one field to uniquely identify a record. This differs from a
compound key in that one or more of the attributes, which make up the key, are not simple keys
in their own right. Taking the example from compound key, imagine we identified a student by
their firstName + lastName. In our table representing students on modules our primary key
would now be firstName + lastName + moduleCode. Because firstName + lastName together represent a
unique reference to a student, they are not each simple keys; they have to be combined in order
to uniquely identify the student. Therefore the key for this table is a composite key.
10. Introduction to Data Integrity
It is important that column data adhere to a predefined set of rules, as determined by
the database administrator or application developer.
For example, some columns in a database table can have specific rules that constrain the
data contained within them. These constraints can affect how data columns in one table
relate to those in another table.
Data Integrity Rules
This section describes the rules that can be applied to table columns to enforce different
types of data integrity.
• Null rule: A null rule is a rule defined on a single column that allows or disallows
inserts or updates of rows containing a null (the absence of a value) in that column.
• Unique column values: A unique value rule defined on a column (or set of
columns) allows the insert or update of a row only if it contains a unique value in that
column (or set of columns).
• Primary key values: A primary key value rule defined on a key (a column or set of
columns) specifies that each row in the table can be uniquely identified by the values
in the key.
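The null, unique and primary key rules described so far can be tried out with Python's sqlite3 module. The employee table and its columns are assumptions chosen for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,   -- primary key rule
        email  TEXT NOT NULL UNIQUE,  -- null rule plus unique value rule
        phone  TEXT                   -- nulls are allowed here
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'ann@example.com', NULL)")


def violates(sql):
    """Return True if the statement is rejected by an integrity rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "null email": violates("INSERT INTO employee VALUES (2, NULL, '555-0100')"),
    "duplicate email": violates("INSERT INTO employee VALUES (3, 'ann@example.com', NULL)"),
    "duplicate key": violates("INSERT INTO employee VALUES (1, 'bob@example.com', NULL)"),
    "valid row": violates("INSERT INTO employee VALUES (4, 'cy@example.com', NULL)"),
}
print(results)
```

Declaring the rules in the table definition means the database rejects bad rows no matter which application tries to insert them.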
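11. Referential integrity rules: A referential integrity rule is defined on a key (a column or set of
columns) in one table and guarantees that the values in that key match the values in a key of a
related table (the referenced value).
Referential integrity also includes the rules that dictate what types of data manipulation are
allowed on referenced values and how these actions affect dependent values. The rules associated
with referential integrity are: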
• Restrict: Disallows the update or deletion of referenced data.
• Set to null: When referenced data is updated or deleted, all associated dependent data is set
to NULL.
• Set to default: When referenced data is updated or deleted, all associated dependent data is set
to a default value.
• Cascade: When referenced data is updated, all associated dependent data is correspondingly
updated. When a referenced row is deleted, all associated dependent rows are deleted.
• No action: Disallows the update or deletion of referenced data. This differs from RESTRICT in
that it is checked at the end of the statement, or at the end of the transaction if the constraint
is deferred. (Oracle Database uses No Action as its default action.)
• Complex integrity checking: A user-defined rule for a column (or set of columns) that allows
or disallows inserts, updates, or deletes of a row based on the value it contains for the column
(or set of columns).
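Three of the referential actions listed above (cascade, set to null, restrict) can be observed with Python's sqlite3 module. This is a sketch in SQLite's dialect, with hypothetical table names; other products spell or default these actions differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rules
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
CREATE TABLE emp_cascade  (emp_id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id) ON DELETE CASCADE);
CREATE TABLE emp_setnull  (emp_id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id) ON DELETE SET NULL);
CREATE TABLE emp_restrict (emp_id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id) ON DELETE RESTRICT);
INSERT INTO dept VALUES (10), (20);
INSERT INTO emp_cascade  VALUES (1, 10), (2, 20);
INSERT INTO emp_setnull  VALUES (1, 10), (2, 20);
INSERT INTO emp_restrict VALUES (1, 20);
""")

# Deleting department 10 removes or nulls the dependent rows.
conn.execute("DELETE FROM dept WHERE dept_id = 10")
cascade_rows = conn.execute(
    "SELECT emp_id, dept_id FROM emp_cascade ORDER BY emp_id").fetchall()
setnull_rows = conn.execute(
    "SELECT emp_id, dept_id FROM emp_setnull ORDER BY emp_id").fetchall()

# Department 20 is still referenced from emp_restrict, so the delete is refused.
try:
    conn.execute("DELETE FROM dept WHERE dept_id = 20")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

print(cascade_rows, setnull_rows, restricted)
```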
12. Advantages of Integrity Constraints
This section describes some of the advantages that integrity constraints associated
with database tables have over other alternatives. These alternatives include:
• Enforcing business rules in the code of a database application
• Using stored procedures to completely control access to data
• Enforcing business rules with triggered stored database procedures
13. Dr. E. F. Codd's 12 rules for relational databases:
0. Foundation Rule
A relational database management system must manage its stored
data using only its relational capabilities.
1. Information Rule
All information in the database should be represented in one and only one way - as
values in a table.
2. Guaranteed Access Rule
Each and every datum (atomic value) is guaranteed to be logically accessible by
resorting to a combination of table name, primary key value and column name.
3. Systematic Treatment of Null Values
Null values (distinct from empty character string or a string of blank characters and
distinct from zero or any other number) are supported in the fully relational DBMS
for representing missing information in a systematic way, independent of data type.
14. 4. Dynamic On-line Catalog Based on the Relational Model
The database description is represented at the logical level in the same way as ordinary
data, so authorized users can apply the same relational language to its interrogation as
they apply to regular data.
5. Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use.
However, there must be at least one language whose statements are expressible, per
some well-defined syntax, as character strings and whose ability to support all of the
following is comprehensive:
data definition
view definition
data manipulation (interactive and by program)
integrity constraints
authorization
transaction boundaries (begin, commit, and rollback).
6. View Updating Rule
All views that are theoretically updateable are also updateable by the system.
15. 7. High-level Insert, Update, and Delete
The capability of handling a base relation or a derived relation as a single operand applies
not only to the retrieval of data but also to the insertion, update, and deletion of data.
8. Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any
changes are made in either storage representation or access methods.
9. Logical Data Independence
Application programs and terminal activities remain logically unimpaired when
information preserving changes of any kind that theoretically permit unimpairment are
made to the base tables.
10. Integrity Independence
Integrity constraints specific to a particular relational database must be definable in the
relational data sublanguage and storable in the catalog, not in the application programs.
16. 11. Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application
programs and terminal activities to remain logically unimpaired whether and whenever
data are physically centralized or distributed.
12. Nonsubversion Rule
If a relational system has or supports a low-level (single-record-at-a-time) language,
that low-level language cannot be used to subvert or bypass the integrity rules or
constraints expressed in the higher-level (multiple-records-at-a-time) relational
language.
17. Relational algebra
In computer science, relational algebra is an offshoot of first-order logic and of the algebra
of sets, concerned with operations over finitary relations. These are usually made more convenient
to work with by identifying the components of a tuple by a name (called an attribute) rather
than by a numeric column index; such a structure is called a relation in database terminology.
The main application of relational algebra is providing a theoretical foundation
for relational databases, particularly query languages for such databases, chief among
which is SQL.
An algebra is a formal structure consisting of sets and operations on those sets.
Relational algebra is a formal system for manipulating relations.
Operands of this algebra are relations.
Operations of this algebra include the usual set operations (since relations are sets of
tuples), and special operations defined for relations
selection
projection
join
18. Relational set operators use relational algebra to manipulate the contents of a
database. Altogether there are eight operators. Several of them correspond directly
to SQL keywords.
The first operator is UNION. It combines all of the rows in one table with all of
the rows in another table, excluding duplicate tuples. The tables are required to
have the same attribute characteristics for UNION to work. The tables
must be union-compatible, which means that the two tables have the same
number of columns, and that corresponding columns have the same names and share
the same domain.
INTERSECT is the second operator. It takes two tables and returns only
the rows that appear in both tables. The tables must be union-compatible for
INTERSECT to work.
DIFFERENCE is another operator. It returns all rows in one table that are not
found in the other table. Basically it subtracts one table from the other to leave
only the rows that appear in the first table but not in the second. For this operator to work
both tables must be union-compatible.
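The three set operators above can be tried in SQL through Python's sqlite3 module. In SQLite, as in standard SQL, DIFFERENCE is written EXCEPT. The tables a and b and their rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (name TEXT);
CREATE TABLE b (name TEXT);
INSERT INTO a VALUES ('Ann'), ('Bob'), ('Cy');
INSERT INTO b VALUES ('Bob'), ('Cy'), ('Di');
""")


def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


union = names("SELECT name FROM a UNION SELECT name FROM b ORDER BY name")
intersect = names("SELECT name FROM a INTERSECT SELECT name FROM b ORDER BY name")
difference = names("SELECT name FROM a EXCEPT SELECT name FROM b ORDER BY name")

print(union)       # every name, duplicates removed
print(intersect)   # names present in both tables
print(difference)  # names in a that are not in b
```

Both tables have a single text column, so they are union-compatible and all three operators apply.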
19. PRODUCT shows all possible pairs of rows from both tables being
used. It is also referred to as the Cartesian product.
SELECT returns the rows of a table. It can be used to select only
the rows that meet certain criteria. This operator is also
referred to as RESTRICT.
PROJECT returns all values for the attributes specified after
the command. It gives a vertical view of the given table.
JOIN takes two or more tables and combines them into one table. It can be used
in combination with other operators to get specific information. There are several
types of join: the natural join, equijoin, theta join, left outer join
and right outer join.
DIVIDE has specific requirements of its tables. In the textbook example, one of the tables has only one
column and the other table has exactly two columns, one of which matches that single column.
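To make the operators above concrete, here is a small sketch that treats a relation as a list of Python dictionaries. The function names, relations and rows are assumptions for illustration; a real RDBMS implements these operators very differently.

```python
def select(relation, predicate):
    """SELECT / RESTRICT: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """PROJECT: keep only the named attributes, dropping duplicate rows."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def product(r, s):
    """PRODUCT: every pairing of rows (assumes r and s share no attribute names)."""
    return [{**x, **y} for x in r for y in s]


def natural_join(r, s):
    """NATURAL JOIN: pair rows that agree on all attributes the relations share."""
    common = [a for a in r[0] if a in s[0]] if r and s else []
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


students = [
    {"student_id": 1, "name": "Ann", "course_id": "C1"},
    {"student_id": 2, "name": "Bob", "course_id": "C2"},
]
courses = [
    {"course_id": "C1", "title": "Databases"},
    {"course_id": "C2", "title": "Networks"},
]
rooms = [{"room": "R1"}, {"room": "R2"}]

on_c1 = select(students, lambda row: row["course_id"] == "C1")
course_ids = project(students, ["course_id"])
pairs = product(courses, rooms)
joined = natural_join(students, courses)
print(on_c1, course_ids, len(pairs), joined, sep="\n")
```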
20. Normalization
In creating a database, normalization is the process of organizing it into tables in such a
way that the results of using the database are always unambiguous and as intended.
Normalization may have the effect of duplicating data within the database and often
results in the creation of additional tables. (While normalization tends to increase the
duplication of data, it does not introduce redundancy, which is unnecessary duplication.)
Normalization is typically a refinement process after the initial exercise of identifying the
data objects that should be in the database, identifying their relationships, and defining
the tables required and the columns within each table.
A simple example of normalizing data might consist of a table showing:
Customer    Item purchased    Purchase price
Thomas      Shirt             $40
Maria       Tennis shoes      $35
Evelyn      Shirt             $40
Pajaro      Trousers          $25
21. Normalization degrees of relational database tables have been defined and include:
First normal form (1NF). This is the "basic" level of normalization and generally
corresponds to the definition of any database, namely:
It contains two-dimensional tables with rows and columns.
Each column corresponds to a sub-object or an attribute of the object represented by
the entire table.
Each row represents a unique instance of that sub-object or attribute and must be
different in some way from any other row (that is, no duplicate rows are possible).
All entries in any column must be of the same kind. For example, in the column
labeled "Customer," only customer names or numbers are permitted.
22. Second normal form (2NF).
At this level of normalization, each column in a table that is not a determiner of the
contents of another column must itself be a function of the other columns in the table.
For example, in a table with three columns containing customer ID, product sold, and
price of the product when sold, the price would be a function of the customer ID (entitled
to a discount) and the specific product.
Third normal form (3NF).
At the second normal form, modification anomalies are still possible because a change to one row
in a table may affect data that refers to this information from another table. For example,
using the customer table just cited, removing a row describing a customer purchase
(because of a return perhaps) will also remove the fact that the product has a certain
price. In the third normal form, this table would be divided into two tables so that
product pricing would be tracked separately.
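The deletion problem just described can be reproduced in a few lines of Python. The sketch uses the sample purchases from the table earlier in these notes and assumes, for simplicity, that the price depends only on the item (no per-customer discount).

```python
# Flat table: (customer, item purchased, purchase price)
purchases_flat = [
    ("Thomas", "Shirt", 40),
    ("Maria", "Tennis shoes", 35),
    ("Evelyn", "Shirt", 40),
    ("Pajaro", "Trousers", 25),
]


def known_prices(rows):
    """Collect the item -> price facts that a flat table still contains."""
    return {item: price for _, item, price in rows}


# Deletion anomaly: removing Pajaro's purchase also loses the price of trousers.
after_return = [row for row in purchases_flat if row[0] != "Pajaro"]
price_lost = "Trousers" not in known_prices(after_return)

# Decomposed design: purchases and prices are tracked separately.
purchases = [(customer, item) for customer, item, _ in purchases_flat]
prices = known_prices(purchases_flat)
purchases_after_return = [p for p in purchases if p[0] != "Pajaro"]
price_kept = "Trousers" in prices  # the price table is untouched by the return

# Joining the two smaller tables on item rebuilds the original rows.
rejoined = [(customer, item, prices[item]) for customer, item in purchases]
print(price_lost, price_kept, rejoined == purchases_flat)
```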
23. What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are
two goals of the normalization process: eliminating redundant data (for example,
storing the same data in more than one table) and ensuring data dependencies make
sense (only storing related data in a table). Both of these are worthy goals as they
reduce the amount of space a database consumes and ensure that data is logically
stored.
The Normal Forms
The database community has developed a series of guidelines for ensuring that
databases are normalized. These are referred to as normal forms and are numbered
from one (the lowest form of normalization, referred to as first normal form or 1NF)
through five (fifth normal form or 5NF). In practical applications, you'll often
see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely
seen and won't be discussed in this article.
24. First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a
unique column or set of columns (the primary key).
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative
data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in
separate tables.
Create relationships between these new tables and their predecessors through the
use of foreign keys.
25. Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal
form", adds one more requirement:
Meet all the requirements of the third normal form.
Every determinant must be a candidate key.
Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the Boyce-Codd normal form.
A relation is in 4NF if, for every non-trivial multivalued dependency, the determinant is a superkey.
Remember, these normalization guidelines are cumulative. For a database to be in 2NF,
it must first fulfill all the criteria of a 1NF database.
26. Example To Show Normalization
Suppose we are to manage all the data of a company (say, My Company). The company
must keep track of all the employees, customers, product details and the salary details of all
the employees. A simple and straightforward way to do this is to put all this information into a
single table and manage it all together.
See below.
Looking at the above table, you may feel that it is perfectly fine. After all, what is the problem
with it? We have a big table; we have all the information required by the company together in a
single place. Well and good!
But now think: suppose we need to frequently retrieve or update data about just the employees.
Does the customers' information or the product details really matter here? Definitely not. So why
use the entire table when we need just a part of it? We need a solution to this, and the solution is
normalization. The stages of normalization are called normal forms. Let us study the
popular and most widely used normal forms.
27. The First Normal Form
To solve the above problem, the first and foremost thing to be done is to divide the entire raw
database into smaller tables based on the actual groupings. When each table has been designed,
a primary key is assigned to most or all tables. Note that the primary key must be a unique value,
so try to select a data element for the primary key that naturally uniquely identifies a specific
piece of data.
So, let us take up the same previous example and prepare our First normal form. See the figure
below:
As we can see, the big raw database is divided into three smaller tables, one each for employee, customer
and product details. Thus, to access any one of these tables, we need not handle the other two
tables.
28. The Second Normal Form
The objective of the second normal form is to take data that is only partly dependent on the
primary key and enter that data into another table. Let us take up the same example of Fig 1-2.
Consider the Employee table.
Here, the entire table has information about the personal details as well as the salary
information. But it is well understood that, to pay a salary to an employee, the company does
not actually need the employee’s personal details. Just the emp_id is sufficient. So, why not
use just that? This is the second normal form. The same goes for the Customers table. We can
separate the customer’s information from the order details.
See the figure below:
29. The Third Normal Form
The third normal form’s objective is to remove data in a table that is not dependent on the
primary key.
See the same example of Fig 1-3. For the table named Emp_Pay,
the position and position_desc fields are not dependent on primary key (emp_id). So, the better
option is to move both these fields to another table.
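A sketch of the split suggested above, using Python's sqlite3 module. The field names emp_id, position and position_desc come from the text; the positions table name, the pay_rate column and the sample rows are assumptions, since the original figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- position_desc depends on position, not on emp_id, so it gets its own table.
CREATE TABLE positions (
    position      TEXT PRIMARY KEY,
    position_desc TEXT NOT NULL
);
CREATE TABLE emp_pay (
    emp_id   INTEGER PRIMARY KEY,
    pay_rate REAL NOT NULL,
    position TEXT NOT NULL REFERENCES positions(position)
);
INSERT INTO positions VALUES ('DEV', 'Software developer'), ('MGR', 'Manager');
INSERT INTO emp_pay VALUES (1, 30.0, 'DEV'), (2, 32.5, 'DEV'), (3, 45.0, 'MGR');
""")

# The description is stored once, so one UPDATE changes it for every employee.
conn.execute(
    "UPDATE positions SET position_desc = 'Software engineer' WHERE position = 'DEV'")

rows = conn.execute("""
    SELECT e.emp_id, p.position_desc
    FROM emp_pay AS e JOIN positions AS p ON p.position = e.position
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```

Had position_desc stayed in emp_pay, the same change would have required updating every employee row that holds that position.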
30. Need of Normalization
When normalizing a database you should achieve four goals:
Arranging data into logical groups such that each group describes a small part of the whole
Minimizing the amount of duplicated data stored in a database
Building a database in which you can access and manipulate the data quickly and efficiently
without compromising the integrity of the data storage
Organizing the data such that, when you modify it, you make the changes in only one place.
When you normalize a database, you start from the general and work towards the specific,
applying certain tests (checks) along the way. Some users call this process decomposition. It
means decomposing (dividing/breaking down) a ‘big' un-normalized table (file) into several
smaller tables by:
Eliminating insertion, update and delete anomalies
Establishing functional dependencies
Removing transitive dependencies
Reducing non-key data redundancy
31. ADVANTAGES OF NORMALIZATION
The following are the advantages of the normalization.
More efficient data structure.
Avoid redundant fields or columns.
More flexible data structure i.e. we should be able to add new rows and data values easily
Better understanding of data.
Ensures that distinct tables exist when necessary.
Easier to maintain data structure i.e. it is easy to perform operations and complex
queries can be easily handled.
Minimizes data duplication.
Close modeling of real world entities, processes and their relationships.
32. DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
o You cannot start building the database before you know what the user needs.
o Normalizing relations to higher normal forms (i.e. 4NF, 5NF) can degrade performance,
because queries need more joins.
o Normalizing relations of higher degree is a very time-consuming and difficult
process.
o Careless decomposition may lead to a bad database design, which may lead to serious
problems.
33. Functional Dependency
Functional dependency is a relationship that exists when one attribute uniquely
determines another attribute. If R is a relation with attributes X and Y, a functional
dependency between the attributes is represented as X->Y, which specifies Y is
functionally dependent on X. Here X is termed the determinant set and Y the
dependent attribute. Each value of X is associated with precisely one Y value.
Functional dependency in a database serves as a constraint between two sets of
attributes. Defining functional dependencies is an important part of relational database
design and contributes to normalization.
A dependency occurs in a database when information stored in the same database table
uniquely determines other information stored in the same table. You can also describe
this as a relationship where knowing the value of one attribute (or a set of attributes) is
enough to tell you the value of another attribute (or set of attributes) in the same table.
Saying that there is a dependency between attributes in a table is the same as saying
that there is a functional dependency between those attributes. If there is a dependency
in a database such that attribute B is dependent upon attribute A, you would write this
as
“A -> B”.
34. For example,
In a table listing employee characteristics including Social Security Number (SSN) and
name, it can be said that name is dependent upon SSN (or SSN -> name) because an
employee's name can be uniquely determined from their SSN. However, the reverse
statement (name -> SSN) is not true because more than one employee can have the same
name but different SSNs.
Types of Dependencies
Trivial Functional Dependencies
A trivial functional dependency occurs when you describe a functional dependency of an
attribute on a collection of attributes that includes the original attribute. For example, “{A, B} -> B”
is a trivial functional dependency, as is “{name, SSN} -> SSN”. This type of functional dependency is
called trivial because it can be derived from common sense. It is obvious that if you already know
the value of B, then the value of B can be uniquely determined by that knowledge.
Full Functional Dependencies
A full functional dependency occurs when you already meet the requirements for a functional
dependency and the set of attributes on the left side of the functional dependency statement cannot be
reduced any further. For example, “{SSN, age} -> name” is a functional dependency, but it is not a full
functional dependency because you can remove age from the left side of the statement without
impacting the dependency relationship.
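The SSN and name examples above can be checked against sample rows with a short Python sketch. The function names and the three employee rows are assumptions; the check only shows whether a dependency holds in the given data, not whether it is a rule of the business.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: equal lhs values always give equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and stops holding when any one lhs attribute is dropped."""
    if not holds(rows, lhs, rhs):
        return False
    return all(
        not holds(rows, [a for a in lhs if a != dropped], rhs)
        for dropped in lhs
    )


employees = [
    {"ssn": "111", "name": "Pat", "age": 30},
    {"ssn": "222", "name": "Pat", "age": 41},
    {"ssn": "333", "name": "Lee", "age": 30},
]

print(holds(employees, ["ssn"], ["name"]))           # SSN -> name
print(holds(employees, ["name"], ["ssn"]))           # name -> SSN fails: two Pats
print(holds(employees, ["name", "ssn"], ["ssn"]))    # trivial dependency
print(is_full(employees, ["ssn", "age"], ["name"]))  # age is redundant on the left
print(is_full(employees, ["ssn"], ["name"]))         # cannot be reduced further
```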
35. Transitive dependencies
occur when there is an indirect relationship that causes a functional dependency. For
example, ”A -> C” is a transitive dependency when it is true only because both “A ->
B” and “B -> C” are true.
Multivalued dependencies
occur when the presence of one or more rows in a table implies the presence of one or
more other rows in that same table. For example, imagine a car company that
manufactures many models of car, but always makes both red and blue colors of each
model. If you have a table that contains the model name, color and year of each car
the company manufactures, there is a multivalued dependency in that table. If there
is a row for a certain model name and year in blue, there must also be a similar row
corresponding to the red version of that same car.
36. Join dependency (JD)
A join dependency (JD) can be said to exist if the join of R1 and R2 over C is equal to
relation R, where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a
given relation R(A, B, C, D). Equivalently, R1 and R2 form a lossless decomposition of R.
In other words, *((A, B, C), (C, D)) is a join dependency of R if the join of the
projections of R onto these attribute sets is equal to relation R. Here, *(R1, R2, R3, …) indicates that
relations R1, R2, R3 and so on are a join dependency (JD) of R. Therefore, a necessary
condition for a relation R to satisfy a JD *(R1, R2, …, Rn) is that
R = R1 ∪ R2 ∪ … ∪ Rn
where the union is taken over the attribute sets.
Thus, whenever we decompose a relation R into R1 = (X ∪ Y) and R2 = (R − Y) based
on an MVD X →→ Y that holds in relation R, the decomposition has the lossless-join
property. The lossless-join property can therefore be defined as a property of a
decomposition which ensures that no spurious tuples are generated when the relations
are recombined through a natural join operation.
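The lossless-join idea can be tested on sample data: project R onto the two attribute sets, join the projections back, and compare with R. The sketch below is an illustration under assumed names and rows, using the R(A, B, C, D) decomposition into (A, B, C) and (C, D) from the text.

```python
def project(rows, attrs):
    """Projection as a set of tuples of (attribute, value) pairs; duplicates vanish."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Join two projections on every attribute they have in common."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Here every value of C has one value of D, so splitting on C loses nothing.
r_good = [
    {"A": 1, "B": 1, "C": "x", "D": 10},
    {"A": 2, "B": 2, "C": "y", "D": 20},
    {"A": 3, "B": 3, "C": "x", "D": 10},
]
# Here C = 'x' appears with two different D values, so the join invents rows.
r_bad = r_good + [{"A": 4, "B": 4, "C": "x", "D": 30}]

print(is_lossless(r_good, ["A", "B", "C"], ["C", "D"]))
print(is_lossless(r_bad, ["A", "B", "C"], ["C", "D"]))
```

In the second case the join returns seven tuples for a four-row relation; the three extra ones are the spurious tuples the text refers to.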
37. What is SQL?
SQL (Structured Query Language) is a language for manipulating databases
developed in the 1970s by IBM. Most relational database management systems use SQL to
access data or to communicate with a database server. The RDBMS is the core platform for SQL
and for products such as Oracle, MS SQL Server, IBM DB2, MySQL,
Microsoft Access, PostgreSQL, SQLite, Firebird, and many more. SQL was born
as a result of the mathematical work of Codd, who founded
the theory of relational databases. It supports three types of manipulation on the database:
1 The maintenance of tables: create, delete, and modify the table structure.
2 The manipulation of data: selecting, modifying, deleting records.
3 The management of access rights to tables (data control): access rights, committing
changes.
The advantage of SQL is that it is a standard language for manipulating databases: you
can use it on almost any relational database, even one you have not used before, although
each product has its own dialect. Thus, with SQL you can manage not only an Access database
but also, for example, Paradox, dBase, SQL Server, Oracle or Informix.
The basis of SQL is the RDBMS. Examples of relational database management systems
are MySQL, MS Access and SQL Server.
MySQL is one of the best-known SQL implementations and is used by a large share of the scripts
on the Internet.
38. Techopedia explains Structured Query Language (SQL)
One of the most fundamental DBA rites of passage is learning SQL, which begins with writing
the first SELECT statement or SQL script without a graphical user interface (GUI). Increasingly,
relational databases use GUIs for easier database management, and queries can now be
simplified with graphical tools, e.g., drag-and-drop wizards. However, learning SQL is imperative
because such tools rarely expose the full power of SQL.
SQL code is divided into four main categories:
Queries are performed using the ubiquitous SELECT statement, which is further
divided into clauses, including SELECT, FROM, WHERE and ORDER BY.
Data Manipulation Language (DML) is used to add, update or delete data and comprises
the INSERT, DELETE and UPDATE statements. It is usually grouped with transaction
control statements, e.g., BEGIN TRANSACTION, SAVEPOINT, COMMIT and ROLLBACK.
Data Definition Language (DDL) is used for managing tables and index structures. Examples of
DDL statements include CREATE, ALTER, TRUNCATE and DROP.
Data Control Language (DCL) is used to assign and revoke database rights and permissions. Its
main statements are GRANT and REVOKE.
39. SQL Basics
Basic SQL Statements include:
CREATE - a data structure
SELECT - read one or more rows from a table
INSERT - one or more rows into a table
DELETE - one or more rows from a table
UPDATE - change the column values in a row
DROP - a data structure
In the remainder of this section only simple SELECT statements are considered.
40. Simple SELECT
The syntax of a SELECT statement is:
SELECT column FROM tablename
This would produce all the rows from the specified table, but only for the particular column mentioned. If
you want more than one column shown, you can put in multiple columns separating them with commas,
like:
SELECT column1,column2,column3 FROM tablename
If you want to see all the columns of a particular table, you can type:
SELECT * FROM tablename
Let's see it in action on CAR...
SELECT * FROM car;
REGNO MAKE COLOUR PRICE OWNER
F611 AAA FORD RED 12000 Jim Smith
J111 BBB SKODA BLUE 11000 Jim Smith
A155 BDE MERCEDES BLUE 22000 Bob Smith
K555 GHT FIAT GREEN 6000 Bob Jones
SC04 BFE SMART BLUE 13000
41. SELECT regno FROM car;
REGNO
F611 AAA
J111 BBB
A155 BDE
K555 GHT
SC04 BFE
SELECT colour, owner FROM car;
COLOUR OWNER
RED Jim Smith
BLUE Jim Smith
BLUE Bob Smith
GREEN Bob Jones
BLUE
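The CAR queries above can be reproduced with Python's built-in sqlite3 module. This is only a sketch: the column types are assumptions, and the missing owner of the SMART is stored as NULL (None in Python), which is one plausible reading of the blank cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the slides only show the column names.
conn.execute(
    "CREATE TABLE car (regno TEXT, make TEXT, colour TEXT, "
    "price INTEGER, owner TEXT)"
)
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?, ?)",
    [
        ("F611 AAA", "FORD", "RED", 12000, "Jim Smith"),
        ("J111 BBB", "SKODA", "BLUE", 11000, "Jim Smith"),
        ("A155 BDE", "MERCEDES", "BLUE", 22000, "Bob Smith"),
        ("K555 GHT", "FIAT", "GREEN", 6000, "Bob Jones"),
        ("SC04 BFE", "SMART", "BLUE", 13000, None),  # no owner recorded
    ],
)

# All columns, one column, and two columns of the same table.
all_rows = conn.execute("SELECT * FROM car").fetchall()
regnos = [row[0] for row in conn.execute("SELECT regno FROM car")]
colour_owner = conn.execute("SELECT colour, owner FROM car").fetchall()
```

Each query returns one result row per car; only the number of columns in each row changes.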
In SQL, you can put extra space characters and return characters just about
anywhere without changing the meaning of the SQL. SQL keywords and, in most
systems, table and column names are also case-insensitive (text in quotes is not).
In addition, a SQL statement should in theory always end with a ';' character.
You need to include the ';' if you have two different SQL queries so that the
system can tell when one SQL statement stops and another one starts. If you
forget the ';' the online interface will put one in for you. For these reasons
all of the following statements are equivalent and valid.
SELECT REGNO FROM CAR;
SELECT REGNO FROM CAR
Select REGNO from CAR
select regno FROM car
SELECT regno FROM car;
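A quick way to convince yourself of this is to run the variants and compare the results. The sketch below uses Python's sqlite3 module with a cut-down two-row CAR table (an assumption for brevity); the last variant adds line breaks and extra spaces. Behaviour for quoted text is shown as SQLite handles it by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (regno TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [("F611 AAA", "RED"), ("J111 BBB", "BLUE")],
)

variants = [
    "SELECT REGNO FROM CAR;",
    "SELECT REGNO FROM CAR",
    "Select REGNO from CAR",
    "select regno FROM car",
    "SELECT\n   regno\nFROM   car ;",
]
results = [conn.execute(sql).fetchall() for sql in variants]
same = all(r == results[0] for r in results)

# Text inside quotes is data, not SQL, and is compared exactly by default.
upper = conn.execute("SELECT COUNT(*) FROM car WHERE colour = 'RED'").fetchone()[0]
lower = conn.execute("SELECT COUNT(*) FROM car WHERE colour = 'red'").fetchone()[0]
```

All five spellings return the same rows, while 'RED' and 'red' are treated as different values.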
42. SELECT filters
Displaying all the rows of a table can be handy, but if we have tables with millions of rows then this
type of query could take a very long time. Instead, we can add "filters" onto a SELECT statement to only
show specific rows of a table. These filters are written into an optional part of the SELECT statement,
known as a WHERE clause.
SELECT columns FROM table WHERE rule
The "rule" section of the WHERE clause is checked for every row that a select statement would normally
show. If the whole rule is TRUE, then that row is shown, whereas if the rule is FALSE, then that row is
not shown.
The rule itself can be quite complex. The simplest rule is a single equality test, such as "COLOUR =
'RED'".
Without the WHERE clause, the query would show:
SELECT regno from CAR;
REGNO
F611 AAA
J111 BBB
A155 BDE
K555 GHT
SC04 BFE
43. Comparisons
SQL supports a variety of comparison rules for use in a WHERE clause. These include =, !=, <>, <, <=,
>, and >=.
Examples of a single rule using these comparisons are:
WHERE colour = 'RED'      The colour attribute must be RED
WHERE colour != 'RED'     The colour must be a colour OTHER THAN RED
WHERE colour <> 'RED'     The same as !=
WHERE PRICE > 10000       The price of the car is MORE THAN 10000
WHERE PRICE >= 10000      The price of the car is EQUAL TO OR MORE THAN 10000
WHERE PRICE < 10000       The price of the car is LESS THAN 10000
WHERE PRICE <= 10000      The price of the car is EQUAL TO OR LESS THAN 10000
Note that when dealing with strings, like RED, you must say 'RED'. When dealing with numbers, like
10000, write them without quotes. Some systems also accept '10000' and convert it implicitly, but that
behaviour is not portable.
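The comparison rules can be tried out with a small sketch in Python's sqlite3 module. The helper regnos_where is hypothetical and builds the query by string concatenation purely for demonstration; the threshold 11000 is chosen (instead of 10000) so that > and >= give different answers on the CAR data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (regno TEXT, colour TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?)",
    [
        ("F611 AAA", "RED", 12000),
        ("J111 BBB", "BLUE", 11000),
        ("A155 BDE", "BLUE", 22000),
        ("K555 GHT", "GREEN", 6000),
        ("SC04 BFE", "BLUE", 13000),
    ],
)

def regnos_where(rule):
    """Return the sorted REGNO values of the rows for which the rule is TRUE.

    Demonstration only: real code should pass values as parameters rather
    than concatenating SQL text.
    """
    sql = "SELECT regno FROM car WHERE " + rule
    return sorted(row[0] for row in conn.execute(sql))

red = regnos_where("colour = 'RED'")
not_red = regnos_where("colour <> 'RED'")
more_than = regnos_where("price > 11000")
at_least = regnos_where("price >= 11000")
```

The SKODA priced at exactly 11000 appears in at_least but not in more_than.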
44. Database Languages
A DBMS must provide appropriate languages and interfaces for each category of users to express
database queries and updates. Database languages are used to create and maintain a database on a
computer. They are provided by a large number of database products, such as Oracle, MySQL, MS Access,
dBase, FoxPro etc. SQL statements commonly used in Oracle and MS Access can be categorized as data
definition language (DDL), data control language (DCL) and data manipulation language (DML).
Data Definition Language (DDL)
It is a language that allows the users to define data and their relationship to other types of data. It is
mainly used to create files, databases, data dictionary and tables within databases.
It is also used to specify the structure of each table, set of associated values with each attribute,
integrity constraints, security and authorization information for each table and physical storage
structure of each table on disk.
The following table gives an overview about usage of DDL statements in SQL
45. Data Manipulation Language (DML)
It is a language that provides a set of operations to support the basic data manipulation
operations on the data held in the databases. It allows users to insert, update, delete and
retrieve data from the database. The part of DML that involves data retrieval is called a query
language.
The following table gives an overview about the usage of DML statements in SQL:
46. Data Control Language (DCL)
DCL statements control access to data and the database using statements such as GRANT
and REVOKE. A privilege can be granted to a user with the help of the GRANT statement.
The privileges assigned can be SELECT, ALTER, DELETE, EXECUTE, INSERT, INDEX etc. In
addition to granting privileges, you can also revoke them (take them back) by using the
REVOKE command.
The following table gives an overview about the usage of DCL statements in SQL:
In practice, the data definition and data manipulation languages are not two separate
languages. Instead they simply form parts of a single database language such as Structured
Query Language (SQL). SQL represents a combination of DDL and DML, as well as statements
for constraint specification and schema evolution.
47. Create Table :-
Used to create the tables where data will be stored
Create a table to store personnel data, with a Staff ID column as primary key.
1. Type this SQL statement in the SQL query design window:
CREATE TABLE Personnel (
StaffID text(9) CONSTRAINT StaffPK PRIMARY KEY,
LastName text(15) not null,
FirstName text(15) not null,
Birthday date,
Department text(12) null);
2. Execute the statement. If Access reports syntax errors, find and correct them.
3. Save the query as DefinePersonnel and close it. In the database window, check the Query list
for this DDL query (notice that the icon for the query is different from the icon for SELECT
queries) and check the Table list for the new Personnel table.
4. Run a query to select all records from the new table:
SELECT * FROM Personnel;
The query returns one blank record (in other databases: 0 rows):
Close the query.
48. 5. Open the new table in datasheet view – it is empty and ready for data entry.
6. Change to design view and compare with the SQL statement. Also, choose View /
Indexes and compare with the constraint created on StaffID:
Close the Personnel table.
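Readers without Access can try an equivalent definition in SQLite through Python's sqlite3 module. This is a sketch under assumptions: Access's text(n) type is written as VARCHAR(n), the explicit "null" on Department is left out because a column is nullable by default, and the sample staff rows and the helper rejected are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Personnel (
        StaffID    VARCHAR(9)  CONSTRAINT StaffPK PRIMARY KEY,
        LastName   VARCHAR(15) NOT NULL,
        FirstName  VARCHAR(15) NOT NULL,
        Birthday   DATE,
        Department VARCHAR(12)
    )
""")

# A freshly created table has no rows.
empty = conn.execute("SELECT * FROM Personnel").fetchall()

conn.execute(
    "INSERT INTO Personnel VALUES ('S001', 'Lee', 'Ann', '1990-05-17', 'Claims')"
)

def rejected(sql):
    """Return True if the statement violates a constraint of the table."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Same StaffID again: breaks the primary key constraint.
duplicate_key = rejected("INSERT INTO Personnel VALUES ('S001', 'Tan', 'Ben', NULL, NULL)")
# No LastName supplied: breaks NOT NULL.
missing_name = rejected("INSERT INTO Personnel (StaffID, FirstName) VALUES ('S002', 'Cy')")
row_count = conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()[0]
```

Both invalid inserts are refused, so the table still holds only the first row.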
49. DDL
Data Definition Language (DDL) statements are used to define the database
structure or schema. Some examples:
CREATE - to create objects in the database
ALTER - alters the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table, including all space allocated for
the records
COMMENT - add comments to the data dictionary
RENAME - rename an object
50. DML
Data Manipulation Language (DML) statements are used for managing data
within schema objects. Some examples:
SELECT - retrieve data from a database
INSERT - insert data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all of them if no WHERE clause is given); the
space for the records remains
MERGE - UPSERT operation (insert or update)
CALL - call a PL/SQL or Java subprogram
EXPLAIN PLAN - explain access path to data
LOCK TABLE - control concurrency
DCL
Data Control Language (DCL) statements are used to control access to data. Some examples:
GRANT - gives users access privileges to the database
REVOKE - withdraw access privileges given with the GRANT command
51. TCL
Transaction Control (TCL) statements are used to manage the changes made by
DML statements. They allow statements to be grouped together into logical
transactions.
COMMIT - save work done
SAVEPOINT - identify a point in a transaction to which you can later roll back
ROLLBACK - restore the database to its state as of the last COMMIT
SET TRANSACTION - Change transaction options like isolation level and what
rollback segment to use
52. Data manipulation language
DML statements are used to work with the data in tables. When you are connected to most multi-user
databases (whether in a client program or by a connection from a Web page script), you are in effect
working with a private copy of your tables that can’t be seen by anyone else until you are finished (or tell the
system that you are finished). You have already seen the SELECT statement; it is considered to be part of
DML even though it just retrieves data rather than modifying it.
The insert statement is used, obviously, to add new rows to a table.
INSERT INTO <table name>
VALUES (<value 1>, ... <value n>);
The comma-delimited list of values must match the table structure exactly in the number of attributes and
the data type of each attribute. Character type values are always enclosed in single quotes; number values
are not quoted; date values are often (but not always) in the format 'yyyy-mm-dd' (for example,
'2006-11-30').
In this basic form you will need a separate INSERT statement for every row, although many systems also
accept several comma-separated value lists in a single statement.
53. Data manipulation language
The update statement is used to change values that are already in a table.
UPDATE <table name>
SET <attribute> = <expression>
WHERE <condition>;
The update expression can be a constant, any computed value, or even the result of a
SELECT statement that returns a single row and a single column. If the WHERE
clause is omitted, then the specified attribute is set to the same value in every row of
the table (which is usually not what you want to do). You can also set multiple
attribute values at the same time with a comma-delimited list of
attribute=expression pairs.
The delete statement does just that, for rows in a table.
DELETE FROM <table name>
WHERE <condition>;
If the WHERE clause is omitted, then every row of the table is deleted (which again
is usually not what you want to do)—and again, you will not get a “do you really want
to do this?” message.
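The three statements can be seen working together in the following sketch, which uses Python's sqlite3 module. The staff table and its rows are invented for illustration; rowcount reports how many rows each statement touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, name TEXT, dept TEXT, salary INTEGER)")

# One INSERT per row; the values follow the column order of the table.
conn.execute("INSERT INTO staff VALUES (1, 'Ann', 'Claims', 4000)")
conn.execute("INSERT INTO staff VALUES (2, 'Ben', 'Claims', 4200)")
conn.execute("INSERT INTO staff VALUES (3, 'Cy', 'Sales', 3900)")

# UPDATE with a WHERE clause and two attribute=expression pairs: one row changes.
updated_one = conn.execute(
    "UPDATE staff SET salary = salary + 100, dept = 'Audit' WHERE id = 1"
).rowcount

# UPDATE without a WHERE clause: every row is changed, with no warning.
updated_all = conn.execute("UPDATE staff SET dept = 'HQ'").rowcount

# DELETE with a WHERE clause removes only the matching rows.
deleted = conn.execute("DELETE FROM staff WHERE salary < 4000").rowcount

remaining = conn.execute("SELECT id, dept, salary FROM staff ORDER BY id").fetchall()
```

Note how the second UPDATE silently overwrote the 'Audit' value set by the first one, which is exactly the risk of omitting WHERE.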
54. If you are using a large multi-user system, you may need to make your DML changes
visible to the rest of the users of the database. Although this might be done
automatically when you log out, you could also just type:
COMMIT;
If you’ve messed up your changes in this type of system, and want to restore your private
copy of the database to the way it was before you started (this only works if you haven’t
already typed COMMIT), just type:
ROLLBACK;
Although some single-user systems don't support commit and rollback statements, they are
used in large systems to control transactions, which are sequences of changes to the
database. Transactions are frequently covered in more advanced courses.
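The effect of COMMIT and ROLLBACK can be sketched with Python's sqlite3 module, whose connection object exposes them as commit() and rollback(). The account table is an invented example, and the sketch relies on the module's default behaviour of opening a transaction automatically before a data-changing statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()                      # make the starting balance permanent

# A change that we regret: ROLLBACK restores the last committed state.
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()
after_rollback = conn.execute("SELECT balance FROM account").fetchone()[0]

# A change that we keep: after COMMIT, ROLLBACK has nothing left to undo.
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")
conn.commit()
conn.rollback()
after_commit = conn.execute("SELECT balance FROM account").fetchone()[0]
```

The first update disappears after the rollback; the second survives because it was committed first.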
Privileges
If you want anyone else to be able to view or manipulate the data in your tables, and if your system permits this,
you will have to explicitly grant the appropriate privilege or privileges (select, insert, update, or delete) to them.
This has to be done for each table. The most common case where you would use grants is for tables that you want
to make available to scripts running on a Web server, for example:
GRANT select, insert ON customers TO webuser;
55. SQL: EXISTS CONDITION
The SQL EXISTS condition is used in a SQL query and is considered "to be met" if the
subquery returns at least one row. It can be used in a SELECT, INSERT, UPDATE,
or DELETE statement.
SQL EXISTS SYNTAX
The syntax for the SQL EXISTS condition is:
WHERE EXISTS ( subquery );
NOTE
SQL statements that use the SQL EXISTS condition can be inefficient, since the sub-query
may be re-run for every row in the outer query's table. There are often alternative
ways to write such queries without EXISTS, and many query optimizers rewrite EXISTS
internally, so it is worth comparing execution plans before deciding.
56. SQL EXISTS EXAMPLE - SELECT STATEMENT
Let's look at a simple example.
The following is a SQL SELECT statement that uses the SQL EXISTS condition:
SELECT *
FROM suppliers
WHERE EXISTS (SELECT *
FROM orders
WHERE suppliers.supplier_id = orders.supplier_id);
This SQL EXISTS condition example will return all records from the suppliers table where there is at least
one record in the orders table with the same supplier_id.
SQL EXISTS EXAMPLE - SELECT STATEMENT USING NOT EXISTS
The SQL EXISTS condition can also be combined with the SQL NOT operator.
For example,
SELECT *
FROM suppliers
WHERE NOT EXISTS (SELECT *
FROM orders
WHERE suppliers.supplier_id = orders.supplier_id);
This SQL EXISTS example will return all records from the suppliers table where there are no records in
the orders table for the given supplier_id.
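Both queries can be run against a small data set with Python's sqlite3 module. The table layouts beyond supplier_id and the sample rows are assumptions; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, supplier_id INTEGER)")
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [(1, "Acme"), (2, "Bolt"), (3, "Core")],
)
# Acme has two orders, Core has one, Bolt has none.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1), (101, 1), (102, 3)])

with_orders = conn.execute("""
    SELECT supplier_name
    FROM suppliers
    WHERE EXISTS (SELECT *
                  FROM orders
                  WHERE suppliers.supplier_id = orders.supplier_id)
    ORDER BY supplier_id
""").fetchall()

without_orders = conn.execute("""
    SELECT supplier_name
    FROM suppliers
    WHERE NOT EXISTS (SELECT *
                      FROM orders
                      WHERE suppliers.supplier_id = orders.supplier_id)
    ORDER BY supplier_id
""").fetchall()
```

Acme is returned once even though it has two orders: EXISTS only asks whether at least one matching row is present.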
57. SQL EXISTS EXAMPLE - INSERT STATEMENT
The following is an example of a SQL INSERT statement that uses the SQL EXISTS
condition:
INSERT INTO suppliers
(supplier_id, supplier_name)
SELECT account_no, name
FROM suppliers
WHERE EXISTS (SELECT *
FROM orders
WHERE suppliers.supplier_id = orders.supplier_id);
SQL EXISTS EXAMPLE - UPDATE STATEMENT
The following is an example of a SQL UPDATE statement that uses the SQL EXISTS
condition:
UPDATE suppliers
SET supplier_name = (SELECT customers.name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id)
WHERE EXISTS (SELECT customers.name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id);
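The purpose of the WHERE EXISTS guard in this UPDATE becomes visible when the statement is run with and without it. The sketch below uses Python's sqlite3 module; the sample rows and the helper names fresh_db and names are invented for illustration.

```python
import sqlite3

def fresh_db():
    """Build a small database: supplier 1 has a matching customer, supplier 2 does not."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT)")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO suppliers VALUES (?, ?)", [(1, "Old A"), (2, "Old B")])
    conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
    return conn

SET_CLAUSE = """
    UPDATE suppliers
    SET supplier_name = (SELECT customers.name
                         FROM customers
                         WHERE customers.customer_id = suppliers.supplier_id)
"""
GUARD = """
    WHERE EXISTS (SELECT customers.name
                  FROM customers
                  WHERE customers.customer_id = suppliers.supplier_id)
"""

def names(conn):
    return conn.execute(
        "SELECT supplier_name FROM suppliers ORDER BY supplier_id"
    ).fetchall()

guarded = fresh_db()
guarded.execute(SET_CLAUSE + GUARD)
with_exists = names(guarded)

unguarded = fresh_db()
unguarded.execute(SET_CLAUSE)
without_exists = names(unguarded)
```

With the guard, only the supplier that has a matching customer is renamed. Without it, the unmatched supplier's name is overwritten with NULL because the sub-query returns no row for it.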
58. Order of Execution of SQL Queries
What sets SQL apart from most programming languages is the way its code is processed.
Most programming languages process statements from top to bottom. By contrast, SQL Server processes
the clauses of a query in a specific order known as the logical query processing phases. These phases
generate a series of virtual tables, with each virtual table feeding into the next phase (the virtual tables
are not viewable). The phases and their order are as follows:
1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
59. Order of Execution of SQL Queries
Following is the sequence of execution of a SQL query
Step 1: FROM clause - Identify the objects
Step 2: FROM clause Joins - Identified objects are joined based on the conditions
(filtering data)
Step 3: WHERE clause - Where condition is applied which again filters data based
on the conditions applied.
Step 4: GROUP BY clause - Records are grouped based on the condition.
Step 5: HAVING clause - Having conditions are applied to filter the grouped data.
Step 6: SELECT clause - Mentioned columns are selected.
Step 7: ORDER BY clause - Selected rows are finally sorted based on the ORDER
BY condition and displayed to the user.
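The sequence above can be observed on a single query. In this sketch (Python's sqlite3 module, with an invented sales table), the WHERE clause removes small rows before grouping, so the "south" region ends up below the HAVING threshold even though its unfiltered total would be 103.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 60), ("north", 50), ("north", 5),
        ("south", 95), ("south", 8),
        ("east", 200),
        ("west", 40), ("west", 75),
    ],
)

query = """
    SELECT region, SUM(amount) AS total   -- step 6: pick the output columns
    FROM sales                            -- step 1: identify the table
    WHERE amount > 10                     -- step 3: drop small rows first
    GROUP BY region                       -- step 4: group what is left
    HAVING SUM(amount) > 100              -- step 5: filter the groups
    ORDER BY total DESC                   -- step 7: sort the final rows
"""
result = conn.execute(query).fetchall()

# The same totals without the WHERE filter, for comparison.
unfiltered = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

The clauses are written in one order but take effect in another: the filter on individual rows runs before the groups exist, and the filter on groups runs before the final sort.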
60. What is a view ? What are its advantages and disadvantages
A view is a virtual table in the database defined by a query. A view does not exist in
the database as a stored set of data values. To reduce redundant data to the
minimum possible, Oracle allows the creation of an object called a view.
The reasons for creating views are:
1) When data security is required.
2) When data redundancy is to be kept to the minimum while maintaining data
security
There are 3 types of views
Horizontal view
Vertical view
Joined view
61. Horizontal view restricts a user’s access to selected rows of a table.
Vertical view restricts a user’s access to select columns of a table.
A joined view draws its data from two or three different tables and
presents the query results as a single virtual table. Once the view is
defined, one can use a single table query against the view for the requests
that would otherwise each require a two or three table join.
Advantages of views
Security: each user can be given access to the database only through views
that expose a specific set of rows and columns of a table.
Query simplicity: by using joined views, data from different tables can be
accessed with a single-table query.
Data integrity: if data is accessed and entered through a view, the DBMS
can automatically check the data to ensure that it meets specified integrity
constraints.
62. Disadvantages of views
Performance:
The DBMS translates the query against the view into queries against the underlying
source tables. If a view is defined by a multi-table query, then even a simple
query against the view becomes a complicated join, and it may take a long time to
complete. The same applies to insert, delete and update operations made through the view.
Update restrictions:
When a user tries to update rows of a view, the DBMS must translate the
request into an update on rows of the underlying source table.
This is possible for simple views, but more complicated views cannot be
updated.
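The three kinds of view, and the update restriction, can be tried with Python's sqlite3 module. This is a sketch under assumptions: the staff and dept tables and the view names are invented, and SQLite is stricter than many systems in that it treats every view as read-only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept  (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE staff (id INTEGER, name TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept  VALUES (10, 'Claims'), (20, 'Sales');
    INSERT INTO staff VALUES (1, 'Ann', 10, 4000), (2, 'Ben', 20, 4200);

    -- Horizontal view: only some rows.
    CREATE VIEW claims_staff AS SELECT * FROM staff WHERE dept_id = 10;
    -- Vertical view: only some columns (salary is hidden).
    CREATE VIEW staff_public AS SELECT id, name, dept_id FROM staff;
    -- Joined view: two tables presented as one virtual table.
    CREATE VIEW staff_dept AS
        SELECT s.name, d.dept_name
        FROM staff s JOIN dept d ON d.dept_id = s.dept_id;
""")

horizontal = conn.execute("SELECT name FROM claims_staff").fetchall()
vertical_cols = [d[0] for d in conn.execute("SELECT * FROM staff_public").description]
joined = conn.execute("SELECT name, dept_name FROM staff_dept ORDER BY name").fetchall()

# A view stores no data: a new base-table row shows up in it immediately.
conn.execute("INSERT INTO staff VALUES (3, 'Cy', 10, 3900)")
horizontal_after = sorted(conn.execute("SELECT name FROM claims_staff").fetchall())

# Updating through the joined view is refused by SQLite.
try:
    conn.execute("UPDATE staff_dept SET dept_name = 'Audit' WHERE name = 'Ann'")
    view_updatable = True
except sqlite3.OperationalError:
    view_updatable = False
```

The joined view lets a single-table query replace a two-table join, at the cost of not being updatable.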
63. DCL: Granting and Revoking Privileges.
DCL commands are used to enforce database security in a multiple user
database environment. Two types of DCL commands are GRANT and
REVOKE. Only database administrators or owners of the database object
can provide/remove privileges on a database object.
SQL GRANT Command
SQL GRANT is a command used to provide access or privileges on the
database objects to the users.
The Syntax for the GRANT command is:
GRANT privilege_name ON object_name TO {user_name | PUBLIC | role_name} [WITH GRANT OPTION];
64. privilege_name is the access right or privilege granted to the user. Some of the access rights
are ALL, EXECUTE, and SELECT.
object_name is the name of a database object like TABLE, VIEW, STORED PROC and
SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.
65. For Example:
GRANT SELECT ON employee TO user1;
This command grants the SELECT permission on the employee table to user1. You should use the WITH GRANT OPTION
clause carefully: if you GRANT the SELECT privilege on the employee table to user1 using WITH GRANT OPTION, then
user1 can GRANT the SELECT privilege on the employee table to another user, such as user2. Later, if you REVOKE the
SELECT privilege on employee from user1, user2 may still have the SELECT privilege on the employee table, depending
on whether your DBMS cascades the revoke.
SQL REVOKE Command:
The REVOKE command removes user access rights or privileges to the database objects.
The Syntax for the REVOKE command is:
REVOKE privilege_name ON object_name FROM {user_name | PUBLIC | role_name};
For Example:
REVOKE SELECT ON employee FROM user1;
This command will REVOKE the SELECT privilege on the employee table from user1. When you REVOKE the SELECT
privilege on a table from a user, the user will not be able to SELECT data from that table anymore. However, if
the user has received SELECT privileges on that table from more than one user, he/she can SELECT from that table
until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially
granted by you.
66. Privileges and Roles:
Privileges: Privileges define the access rights provided to a user on a database object. There are
two types of privileges.
System privileges - This allows the user to CREATE, ALTER, or DROP database objects.
Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data from
database objects to which the privileges apply.
A few CREATE system privileges are listed below:
System Privileges     Description
CREATE object         allows users to create the specified object in their own schema.
CREATE ANY object     allows users to create the specified object in any schema.
The above rules also apply for ALTER and DROP system privileges.
A few of the object privileges are listed below:
Object Privileges     Description
INSERT                allows users to insert rows into a table.
SELECT                allows users to select data from a database object.
UPDATE                allows users to update data in a table.
EXECUTE               allows users to execute a stored procedure or a function.
67. Limitations of SQL query creation
A query cannot be created using a view that is derived from a user-defined function. This
is a known limitation.
Incorrect SQL is generated for a multitable data graph.
Multitable graphs are not supported on Informix® Dynamic Server 9.3.
An error occurs when running an SQL file for a second time on Sybase 12 database. If you
are running an SQL file on Sybase, change the Sybase SET CHAINED option to OFF.
68. SQL Language Limitations
SQLFire has limitations and restrictions for SQL statements, clauses, and expressions.
ALTER TABLE Limitations
This release of SQLFire has the following restrictions for ALTER TABLE. SQLFire throws a
SQLException “Feature not implemented” with SQLState “0A000” if any of these actions are
attempted:
Adding or dropping a column when the table has data, or when the table had data at some point
after creation.
Dropping a primary key column with or without data.
Adding or dropping a primary key constraint when the table has data, or when the table had
data at some point after creation.
In addition, the ALTER COLUMN clause as in the SQL-92 standard is not implemented and SQLFire will throw
an SQLException with state “0A000” though it is not treated as a syntactical error.
69. Auto-Generated Columns
This release of SQLFire supports auto-generated IDENTITY columns, but has the following
limitations:
Only INT and BIGINT column types can be marked as auto-generated IDENTITY columns
The START WITH and INCREMENT BY clauses are supported only for GENERATED BY
DEFAULT identity columns.
If the maximum permissible value for the type is reached in any insert, then SQLFire throws an
overflow exception (SQLState: “42Z24”). This does not necessarily mean that all possible values
of that type have been used up, because it is possible that some values remain unused.
Applications should not depend on identity values being incremental across the distributed
system, because SQLFire provides no ordering guarantee for concurrent inserts from multiple
members. However, inserts from a single member will have the generated values in ascending
order and applications can use that for ordering purposes.
70. LONG/LOB Column Restrictions
SQLFire does not support using columns of the following data types in indexes, ORDER BY
clauses, GROUP BY clauses, DISTINCT clauses, UNION clauses, or other set operations:
BLOB
CLOB
LONG VARCHAR FOR BIT DATA
Columns of type LONG VARCHAR are supported in these cases.
Bulk Update Limitations
If a SQL statement performs a bulk update operation on multiple SQLFire members, any
exception that occurs during the bulk update can leave some rows updated while other rows
are not updated. Use transactions with bulk update statements to ensure that all updates
succeed or roll back as a whole
71. Cascade DELETE Not Supported
SQLFire does not support cascade delete operations.
Locking Prioritizes DML over DDL
The SQLFire locking behavior prioritizes DML execution over DDL statements. DDL statements
may receive a lock timeout exception (SQLState: 40XL1) if your system is processing numerous
concurrent DML and DDL statements. You can configure the maximum amount of time that DDL
statements wait for locks using sqlfire.max-lock-wait.
Expiration and Eviction Limitations
EXPIRE ENTRY WITH IDLETIME works only when a primary key based query is fired.
Otherwise the system does not update the entry's accessed time when table scans or index
scans happen, and the entry gets destroyed.
EXPIRATION or EVICTION with action as DESTROY should not be set on a parent table having
child tables with foreign key reference to it. This is due to a lack of cascade delete support in
SQLFire. If an attempt is made to create a child table having foreign key reference to a table with
such a policy then a SQLException is thrown (SQLState: "X0Y99").
72. INSERT with subselect
SQLFire has limited support for INSERT statements that use a subselect statement. Nested
selects and selects having aggregates are not supported; these queries throw a feature not
implemented exception (SQLSTATE 0A000).
LOCK TABLE
The LOCK TABLE statement is not supported in this release of SQLFire.
Procedure Invocation (Data-Aware and Non-Data-Aware Procedures)
When you use the ON TABLE extension in a CALL statement, the WHERE clause is
mandatory. If you need to route a data-aware procedure to all members that host the table
(without any pruning), then you must specify some extraneous condition that always evaluates
to true (such as WHERE 1=1).
A server can only handle Java procedure definitions that exactly match the JDBC parameter
types in a CREATE PROCEDURE statement. If a procedure specifies parameter types that use
the base class of a corresponding java type (for example, if a procedure uses java.util.Date
instead of java.sql.Date) then the invocation from the client side fails.
73. UNION, INTERSECT, and EXCEPT Operators
SQLFire does not support any query that has either nested set operators or a set operator with
either a join, function expression, SQL procedure, view, or sub-query. There is no explicit support
provided for ORDER BY, GROUP BY, or complex filters in the WHERE clause in either child of a
query that uses a set operator. Also, transactions and high availability features are not supported
for queries that use a set operator.
In this context, a set operator includes any of these operators: UNION DISTINCT, UNION,
UNION ALL, INTERSECT DISTINCT, INTERSECT, INTERSECT ALL, EXCEPT DISTINCT,
EXCEPT, or EXCEPT ALL.
VIEW Limitations
SQLFire does not support views that involve grouping, aggregate, distinct, or join operations
on a partitioned table.
SQLFire queries have a unique set of capabilities and limitations that are inherent to the
distributed database design.