Meaning 
A database is an organized collection of data. The data are typically organized to model aspects of reality in a way that 
supports processes requiring information. For example, modeling the availability of rooms in hotels in a way that 
supports finding a hotel with vacancies. 
Database management systems (DBMSs) are specially designed software applications that interact with the user, other 
applications, and the database itself to capture and analyze data. A general-purpose DBMS is a software system 
designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs 
include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, SAP and IBM DB2. A database is not generally portable across 
different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a 
single application to work with more than one DBMS. Database management systems are often classified according to 
the database that they support; the most popular database systems since the 1980s have all supported the relational 
model as represented by the SQL language. 
A database is a systematically organized or structured repository of indexed information (usually as a group of linked data files) that 
allows easy retrieval, updating, analysis, and output of data. Stored usually in a computer, this data could be in 
the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information. 
Most computer applications (including software, spreadsheets, word-processors) are databases at their core. See 
also flat database and relational database. 
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In 
one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images. 
In computing, databases are sometimes classified according to their organizational approach. The most prevalent 
approach is the relational database, a tabular database in which data is defined so that it can be reorganized and 
accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among 
different points in a network. An object-oriented programming database is one that is congruent with the data defined 
in object classes and subclasses. 
Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs 
and inventories, and customer profiles. Typically, a database manager provides users the capabilities of controlling 
read/write access, specifying report generation, and analyzing usage. Databases and database managers are prevalent in 
large mainframe systems, but are also present in smaller distributed workstation and mid-range systems such as the 
AS/400 and on personal computers. SQL (Structured Query Language) is a standard language for making interactive 
queries from and updating a database such as IBM's DB2, Microsoft's SQL Server, and database products 
from Oracle, Sybase, and Computer Associates. 
Features of a DBMS 
The prime purpose of a relational database management system is to maintain data integrity. This means all the rules 
and relationships between data are consistent at all times. But a good DBMS will have other features as well. 
These include: 
• A command language that allows you to create, delete and alter the database (data description language or DDL) 
• A way of documenting all the internal structures that make up the database (data dictionary) 
• A language to support the manipulation and processing of the data (data manipulation language) 
• Support for the ability to view the database from different viewpoints according to the requirements of the user 
• Some level of security and access control to the data 
The simplest RDBMS may be designed with a single user in mind, e.g. the database is 'locked' until that person has 
finished with it. Such an RDBMS will only cost a few hundred pounds at most and will have only a basic capability. On 
the other hand, an enterprise-level DBMS can support a huge number of simultaneous users with thousands of internal 
tables and complex 'roll back' capabilities should things go wrong. 
Obviously this kind of system will cost thousands, along with a need to have professional database administrators 
looking after it and database specialists to create complex queries for management and staff. 
1. Controlling Data Redundancy: 
In non-database systems (traditional computer file processing), each application program has its own files. In this case, 
the duplicated copies of the same data are created at many places. In DBMS, all the data of an organization is integrated 
into a single database. The data is recorded at only one place in the database and it is not duplicated. For example, the 
dean's faculty file and the faculty payroll file contain several items that are identical. When they are converted into 
a database, the data is integrated into a single database so that multiple copies of the same data are reduced to a single 
copy. In DBMS, data redundancy can be controlled or reduced but is not removed completely. Sometimes, it is 
necessary to create duplicate copies of the same data items in order to relate tables with each other. By controlling 
the data redundancy, you can save storage space. Similarly, it is useful for retrieving data from database using queries. 
2. Data Consistency: 
By controlling the data redundancy, the data consistency is obtained. If a data item appears only once, any update to its 
value has to be performed only once and the updated value (new value of item) is immediately available to all users. 
If the DBMS has reduced redundancy to a minimum level, the database system enforces consistency. It means that when 
a data item appears more than once in the database and is updated, the DBMS automatically updates each occurrence 
of a data item in the database. 
3. Data Sharing: 
In DBMS, data can be shared by authorized users of the organization. The DBA manages the data and gives rights to 
users to access the data. Many users can be authorized to access the same set of information simultaneously. Remote 
users can also share the same data. Similarly, the data of the same database can be shared between different 
application programs. 
4. Data Integration: 
In DBMS, data in database is stored in tables. A single database contains multiple tables and relationships can be created 
between tables (or associated data entities). This makes it easy to retrieve and update data. 
5. Integrity Constraints: 
Integrity constraints or consistency rules can be applied to a database so that only correct data can be entered into the 
database. The constraints may be applied to data items within a single record or they may be applied to relationships 
between records. 
Examples: 
The examples of integrity constraints are: 
(i) 'Issue Date' in a library system cannot be later than the corresponding 'Return Date' of a book. 
(ii) Maximum obtained marks in a subject cannot exceed 100. 
(iii) Registration number of BCS and MCS students must start with 'BCS' and 'MCS' respectively etc. 
There are also some standard constraints that are intrinsic in most of the DBMSs. These are: 
PRIMARY KEY: Designates a column or combination of columns as the primary key; therefore, values of those columns 
cannot be repeated or left blank. 
FOREIGN KEY: Relates one table with another table. 
UNIQUE: Specifies that values of a column or combination of columns cannot be repeated. 
NOT NULL: Specifies that a column cannot contain empty values. 
CHECK: Specifies a condition which each row of a table must satisfy. 
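For example, all five of these constraints might be declared in SQL roughly as follows (the table and column names here are only illustrative, and the exact syntax varies slightly between DBMSs): 

    CREATE TABLE department ( 
        dept_id INT PRIMARY KEY       -- PRIMARY KEY: values cannot repeat or be blank 
    ); 

    CREATE TABLE student ( 
        reg_no  VARCHAR(10) PRIMARY KEY, 
        email   VARCHAR(100) UNIQUE,                  -- UNIQUE: no repeated values 
        name    VARCHAR(50) NOT NULL,                 -- NOT NULL: no empty values 
        marks   INT CHECK (marks BETWEEN 0 AND 100),  -- CHECK: each row must satisfy the condition 
        dept_id INT REFERENCES department(dept_id)    -- FOREIGN KEY: relates student to department 
    ); 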
Most of the DBMSs provide the facility for applying the integrity constraints. The database designer (or DBA) identifies 
integrity constraints during database design. The application programmer can also identify integrity constraints in the 
program code while developing the application program. The integrity constraints are automatically checked at the 
time of data entry or when the record is updated. If the data entry operator (end-user) violates an integrity constraint, 
the data is not inserted or updated into the database and a message is displayed by the system. For example, when you 
withdraw an amount from the bank through an ATM card, your account balance is compared with the amount you are 
withdrawing. If your account balance is less than the amount you want to withdraw, a message is displayed on the 
screen to inform you about your account balance. 
6. Data Security: 
Data security is the protection of the database from unauthorized users. Only the authorized persons are allowed to 
access the database. Some of the users may be allowed to access only a part of database i.e., the data that is related to 
them or related to their department. Mostly, the DBA or head of a department can access all the data in the database. 
Some users may be permitted only to retrieve data, whereas others are allowed to retrieve as well as to update data. 
The database access is controlled by the DBA. He creates the accounts of users and gives rights to access the database. 
Typically, users or group of users are given usernames protected by passwords. 
Most of the DBMSs provide the security sub-system, which the DBA uses to create accounts of users and to specify 
account restrictions. The user enters his/her account number (or username) and password to access the data from 
database. For example, if you have an e-mail account on "hotmail.com" (a popular website), then you have to give 
your correct username and password to access your e-mail account. Similarly, when you insert your ATM card into the 
Automated Teller Machine (ATM) in a bank, the machine reads your ID number printed on the card and then asks you 
to enter your PIN code (or password). In this way, you can access your account. 
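As a sketch of how this looks in SQL (the user and table names are hypothetical, and user-creation syntax differs between DBMSs): 

    -- The DBA creates an account and grants it limited rights 
    CREATE USER clerk IDENTIFIED BY 'secret';     -- user-creation syntax varies by DBMS 
    GRANT SELECT ON accounts TO clerk;            -- clerk may only retrieve data 
    GRANT SELECT, UPDATE ON accounts TO officer;  -- officer may retrieve and update 
    REVOKE UPDATE ON accounts FROM officer;       -- rights can later be withdrawn 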
7. Data Atomicity: 
A transaction in commercial databases is referred to as an atomic unit of work. For example, when you purchase 
something from a point of sale (POS) terminal, a number of tasks are performed, such as: 
• Company stock is updated. 
• Amount is added to the company's account. 
• Salesperson's commission increases. 
All these tasks collectively are called an atomic unit of work or transaction. These tasks must all be completed; 
otherwise, partially completed tasks are rolled back. Thus, through the DBMS, it is ensured that only consistent data 
exists within the database. 
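The POS example might be written as a single SQL transaction along these lines (table names, columns and values are illustrative): 

    BEGIN TRANSACTION; 
    UPDATE stock    SET quantity   = quantity - 1    WHERE item_id = 101;  -- company stock is updated 
    UPDATE accounts SET balance    = balance + 500   WHERE acct_id = 1;    -- amount is added to company's account 
    UPDATE staff    SET commission = commission + 25 WHERE emp_id  = 7;    -- salesperson's commission increases 
    COMMIT;  -- all three changes become permanent together 
    -- if any statement fails before COMMIT, the DBMS rolls back the partial work 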
8. Database Access Language: 
Most of the DBMSs provide SQL as standard database access language. It is used to access data from multiple tables of a 
database. 
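For instance, a single SQL query can combine data from two related tables (hypothetical names): 

    SELECT s.name, d.dept_name 
    FROM   student s 
    JOIN   department d ON d.dept_id = s.dept_id 
    WHERE  d.dept_name = 'Computer Science'; 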
9. Development of Application: 
The cost and time for developing new applications is also reduced. The DBMS provides tools that can be used to develop 
application programs. For example, some wizards are available to generate Forms and Reports. Stored procedures 
(stored on server side) also reduce the size of application programs. 
10. Creating Forms: 
A Form is a very important object of a DBMS. You can create Forms very easily and quickly in a DBMS. Once a Form is 
created, it can be used many times and it can be modified very easily. The created Forms are also saved along with the 
database and 
behave like a software component. A Form provides very easy way (user-friendly interface) to enter data into database, 
edit data, and display data from database. The non-technical users can also perform various operations on databases 
through Forms without going into the technical details of a database. 
11. Report Writers: 
Most of the DBMSs provide the report writer tools used to create reports. The users can create reports very easily and 
quickly. Once a report is created, it can be used many times and it can be modified very easily. The created reports are 
also saved along with database and behave like a software component. 
12. Control Over Concurrency: 
In a computer file-based system, if two users are allowed to access data simultaneously, it is possible that they will 
interfere with each other. For example, if both users attempt to perform update operation on the same record, then one 
may overwrite the values recorded by the other. Most DBMSs have sub-systems to control the concurrency so that 
transactions are always recorded with accuracy. 
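One common mechanism, sketched below with hypothetical names, is row-level locking: many DBMSs (e.g. Oracle, MySQL, PostgreSQL) support SELECT ... FOR UPDATE, which locks the selected row so a concurrent transaction must wait instead of overwriting it: 

    BEGIN TRANSACTION; 
    SELECT balance FROM accounts WHERE acct_id = 1 FOR UPDATE;  -- row is locked for this transaction 
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1; 
    COMMIT;  -- the lock is released and waiting transactions proceed 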
13. Backup and Recovery Procedures: 
In a computer file-based system, the user creates backups of data regularly to protect the valuable data from damage 
due to failures of the computer system or application program. This is a time-consuming method if the volume of 
data is large. Most of the DBMSs provide the 'backup and recovery' sub-systems that automatically create the backup of 
data and restore data if required. For example, if the computer system fails in the middle (or end) of an update 
operation of the program, the recovery sub-system is responsible for making sure that the database is restored to the 
state it was in before the program started executing. 
14. Data Independence: 
The separation of data structure of database from the application program that is used to access data from database is 
called data independence. In DBMS, database and application programs are separated from each other. The DBMS sits 
in between them. You can easily change the structure of database without modifying the application program. For 
example, you can modify the size or data type of data items (fields of a database table). On the other hand, in a 
computer file-based system, the structure of the data items is built into the individual application programs. Thus the 
application programs are dependent on the structure of the data files and vice versa. 
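As a small illustration (hypothetical table, PostgreSQL-style syntax; some other DBMSs use MODIFY instead of ALTER COLUMN): 

    -- Widen a column without touching any application program 
    ALTER TABLE student ALTER COLUMN name TYPE VARCHAR(100); 
    -- applications that run SELECT name FROM student continue to work unchanged 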
15. Advanced Capabilities: 
A DBMS also provides advanced capabilities for online access and reporting of data through the Internet. Today, most of the 
database systems are online. The database technology is used in conjunction with Internet technology to access data on 
the web servers. 
Data Base Management Systems Architecture 
Data Base Management Systems (DBMS) are very relevant in today’s world where information matters. Most business 
operations of large companies are dependent on their databases in some way or the other. Many companies use their 
data analysis methods to leverage the data in their databases and provide better service to customers and compete 
with their business rivals. Databases are collections of data that have been organized in a certain way. The term DBMS 
is commonly used to refer to a computer program that can help you store, change and retrieve the data in your 
database. Most DBMS software products use SQL as the main query language – the language that lets you interact 
with and extract results from your database quickly. SQL is the language used to query popular database systems like 
Oracle, SQL Server and MySQL. Learning SQL and DBMS can help you become a database administrator. 
DBMS Architecture 
DBMS architecture is the way in which the data in a database is viewed by (or represented to) users. It helps you 
represent your data in an understandable way to the users, by hiding the complex bits that deal with the working of 
the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data. 
We’re going to take a look at the ANSI-SPARC DBMS standard model. ANSI is the acronym for American National 
Standards Institute. It sets standards for American goods so that they can be used anywhere in the world without 
compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use 
SQL as the main query language. The ANSI has also standardized a three level DBMS architecture model followed by 
most database systems, and it’s known as the abstract ANSI-SPARC design standard. 
The ANSI-SPARC Database Architecture is set up into three tiers. Let’s take a closer look at them. 
The Internal Level (Physical Representation of Data): The internal level is the lowest level in a three-tiered database. 
This level deals with how the stored data on the database is represented to the user. This level shows exactly how the 
data is stored and organized for access on your system. This is the most technical of the three levels. However, the 
internal level view is still abstract: even if it shows how the data is stored physically, it will not show how the 
database software operates on it. So how exactly is data stored on this level? There are several considerations to be 
made when storing data. Some of them include figuring out the right space allocation techniques, data compression 
techniques (if necessary), security and encryption and the access paths the software can take to retrieve the data. 
Most DBMS software products make sure that data access is optimized and that data uses minimum storage space. 
The OS you’re running is actually in charge of managing the physical storage space. 
4
The Conceptual Level (Holistic Representation of Data): The conceptual level tells you how the database is structured 
logically. This level tells you about the relationship between the data members of your database, exactly 
what data is stored in it and what a user will need to use the database. This level does not concern itself with how this 
logical structure will actually be implemented. It’s actually an overview of your database. The conceptual level acts as 
a sort of a buffer between the internal level and the external level. It helps hide the complexity of the database and 
hides how the data is physically stored in it. The database administrator will have to be conversant with this layer, 
because most of his operations are carried out on it. Only a database administrator is allowed to modify or structure 
this level. It provides a global view of the database, as well as the hardware and software necessary for running it – all 
important info for a database admin. 
The External Level (User Representation of Data): This is the uppermost level in the database. It implements the 
concept of abstraction as much as possible. This level is also known as the view level because it deals with how a user 
views your database. The external level is what allows a user to access a customized version of the data in your 
database. Multiple users can work on a database at the same time because of it. The external level also hides the 
working of the database from your users. It maintains the security of the database by giving users access only to the 
data which they need at a particular time. Any data that is not needed will not be displayed. Three “schemas” 
(internal, conceptual and external) show how the database is internally and externally structured, and so this type of 
database architecture is also known as the "three-schema" architecture. 
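In SQL terms, the external level is commonly realized with views; the sketch below (hypothetical tables and users) gives payroll staff the salary column while the front desk sees only names and extensions: 

    CREATE VIEW payroll_view AS 
        SELECT emp_id, name, salary FROM employee; 
    CREATE VIEW directory_view AS 
        SELECT name, extension FROM employee; 
    GRANT SELECT ON directory_view TO front_desk;  -- the base table itself stays hidden 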
Functional dependency 
A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be 
written A -> B which would be the same as stating "B is functionally dependent upon A." 
Examples: In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that 
name is functionally dependent upon SSN (or SSN -> name) because an employee's name can be uniquely determined 
from their SSN. However, the reverse statement (name -> SSN) is not true because more than one employee can have 
the same name but different SSNs. 
Definition - What does Functional Dependency mean? 
Functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a 
relation with attributes X and Y, a functional dependency between the attributes is represented as X->Y, which specifies 
Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute. Each value of X is associated 
precisely with one Y value. Functional dependency in a database serves as a constraint between two sets of attributes. 
Defining functional dependency is an important part of relational database design and contributes to aspects of 
normalization. 
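In SQL, declaring the determinant as the primary key is one way this constraint is enforced (a minimal sketch of the SSN example above): 

    CREATE TABLE employee ( 
        ssn  CHAR(9) PRIMARY KEY,  -- determinant: each SSN appears exactly once 
        name VARCHAR(50)           -- dependent: precisely one name per SSN 
    ); 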
What is Normalization? 
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization 
process: eliminating redundant (unwanted) data (for example, storing the same data in more than one table) and 
ensuring dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce 
the amount of space a database consumes and ensure that data is logically stored. 
Techopedia - Normalization is the process of reorganizing data in a database so that it meets two basic requirements: (1) 
There is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related 
data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to 
take up as little disk space as possible, resulting in increased performance. Normalization is also known as data 
normalization. 
The Normal Forms 
The database community has developed a series of guidelines for ensuring that databases are normalized. These are 
referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal 
form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, 
and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article. 
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines 
only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when 
variations take place, it's extremely important to evaluate any possible ramifications they could have on your system 
and account for possible inconsistencies. That said, let's explore the normal forms. 
First Normal Form (1NF) 
First normal form (1NF) sets the very basic rules for an organized database: Eliminate duplicative columns from the 
same table. Create separate tables for each group of related data and identify each row with a unique column or set of 
columns (the primary key). 
Second Normal Form (2NF) 
Second normal form (2NF) further addresses the concept of removing duplicative data: Meet all the requirements of the 
first normal form. Remove subsets of data that apply to multiple rows of a table and place them in separate tables. 
Create relationships between these new tables and their predecessors through the use of foreign keys. 
Third Normal Form (3NF) 
Third normal form (3NF) goes one large step further: Meet all the requirements of the second normal form. 
Remove columns that are not dependent upon the primary key. 
Boyce-Codd Normal Form (BCNF or 3.5NF) 
The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more requirement: 
Meet all the requirements of the third normal form. Every determinant must be a candidate key. 
Fourth Normal Form (4NF) 
Finally, fourth normal form (4NF) has one additional requirement: Meet all the requirements of the third normal form. 
A relation is in 4NF if it has no multi-valued dependencies. Remember, these normalization guidelines are cumulative. 
For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database. 
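As a hedged sketch of what normalization looks like in practice (hypothetical tables): an unnormalized orders table that repeats customer details on every row can be decomposed so that customer data lives in its own table, linked back by a foreign key: 

    -- Before: orders(order_id, order_date, customer_id, customer_name, customer_city) 
    -- customer_name and customer_city depend only on customer_id, not on order_id 

    CREATE TABLE customer ( 
        customer_id   INT PRIMARY KEY, 
        customer_name VARCHAR(50), 
        customer_city VARCHAR(50) 
    ); 

    CREATE TABLE orders ( 
        order_id    INT PRIMARY KEY, 
        order_date  DATE, 
        customer_id INT REFERENCES customer(customer_id) 
    ); 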
Data Models 
E-R Model is a graphical representation of entities and their relationships to each other, typically used in computing in 
regard to the organization of data within databases or information systems. An entity is a piece of data: an object or 
concept about which data is stored. 
A relationship is how the data is shared between entities. There are three types of relationships between entities: 
1. One-to-One 
One instance of an entity (A) is associated with one other instance of another entity (B). For example, in a database of 
employees, each employee name (A) is associated with only one social security number (B). 
2. One-to-Many 
One instance of an entity (A) is associated with zero, one or many instances of another entity (B), but for one instance of 
entity B there is only one instance of entity A. For example, for a company with all employees working in one building, 
the building name (A) is associated with many different employees (B), but those employees all share the same singular 
association with entity A. 
3. Many-to-Many 
One instance of an entity (A) is associated with one, zero or many instances of another entity (B), and one instance of 
entity B is associated with one, zero or many instances of entity A. For example, for a company in which all of its 
6
employees work on multiple projects, each instance of an employee (A) is associated with many instances of a project 
(B), and at the same time, each instance of a project (B) has multiple employees (A) associated with it. 
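These three relationship types map onto SQL tables in a natural way; the sketch below (hypothetical names) shows a one-to-many relationship as a simple foreign key and a many-to-many relationship as a third "junction" table: 

    -- One-to-many: each employee row points to exactly one building 
    CREATE TABLE building ( 
        building_id INT PRIMARY KEY, 
        name        VARCHAR(50) 
    ); 
    CREATE TABLE employee ( 
        emp_id      INT PRIMARY KEY, 
        name        VARCHAR(50), 
        building_id INT REFERENCES building(building_id) 
    ); 

    -- Many-to-many: employees and projects are linked through a junction table 
    CREATE TABLE project ( 
        project_id INT PRIMARY KEY, 
        title      VARCHAR(50) 
    ); 
    CREATE TABLE works_on ( 
        emp_id     INT REFERENCES employee(emp_id), 
        project_id INT REFERENCES project(project_id), 
        PRIMARY KEY (emp_id, project_id)  -- each employee/project pairing is recorded once 
    ); 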
Relational Model 
The relational model for database management is a database model based on first-order predicate logic, first formulated 
and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples, 
grouped into relations. A database organized in terms of the relational model is a relational database. 
The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly 
state what information the database contains and what information they want from it, and let the database 
management system software take care of describing data structures for storing the data and retrieval procedures for 
answering queries. 
Most relational databases use the SQL data definition and query language; these systems implement what can be 
regarded as an engineering approximation to the relational model. A table in an SQL database schema corresponds to a 
predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond 
to predicates. However, SQL databases deviate from the relational model in many details, and Codd fiercely argued 
against deviations that compromise the original principles. 
[Image: Diagram of an example database according to the relational model] 
[Image: In the relational model, related records are linked together with a "key"] 
Network model 
The Network model replaces the hierarchical tree with a graph, thus allowing more general connections among the 
nodes. The main difference of the network model from the hierarchical model is its ability to handle many-to-many 
(N:N) relations. In other words, it allows a record to have more than one parent. Suppose an employee works for two 
departments. The strict hierarchical arrangement is not possible here and the tree becomes a more generalized graph - 
a network. The network model evolved specifically to handle non-hierarchical relationships. In this model, data can 
belong to more than one parent. Note that there are lateral connections as well as top-down connections. A network 
structure thus allows 1:1 (one-to-one), 1:M (one-to-many), and M:M (many-to-many) relationships among entities. In 
network database terminology, a relationship is a set. Each set is made up of at least two types of records: an owner 
record (equivalent to the parent in the hierarchical model) and a member record (similar to the child record in the 
hierarchical model). The Customer-Loan database, which we discussed earlier for the hierarchical model, can now be 
represented in the network model. In this representation, the information about the joint loan L1 appears a single 
time, whereas in the hierarchical model it appears twice. Thus, the network model reduces redundancy and is better 
compared to the hierarchical model. 
Hierarchical Model 
The Hierarchical Data Model is a way of organizing a database with multiple one-to-many relationships. The structure is 
based on the rule that one parent can have many children but children are allowed only one parent. This structure 
allows information to be repeated through the parent-child relations. The model was created by IBM and was 
implemented mainly in their Information Management System (IMS), a precursor to modern DBMSs. 
A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored 
as records which are connected to one another through links. A record is a collection of fields, with each field containing 
only one value. The entity type of a record defines which fields the record contains. 
A record in the hierarchical database model corresponds to a row (or tuple) in the relational database model and an 
entity type corresponds to a table (or relation). The hierarchical database model mandates that each child record has 
only one parent, whereas each parent record can have one or more child records. In order to retrieve data from a
hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the 
first database model, created by IBM in the 1960s. 
Distributed database 
A distributed database is a database that is under the control of a central database management system (DBMS) in 
which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the 
same physical location, or may be dispersed over a network of interconnected computers. Collections of data (e.g. in a 
database) can be distributed across multiple physical locations. A distributed database can reside on network servers on 
the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of 
databases improves database performance at end-user worksites. 
To ensure that the distributed databases are up to date and current, there are two processes: replication and 
duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the 
changes have been identified, the replication process makes all the databases look the same. The replication process can 
be very complex and time consuming depending on the size and number of the distributed databases. This process can 
also require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It basically 
identifies one database as a master and then duplicates that database. The duplication process is normally done at a set 
time after hours. This is to ensure that each distributed location has the same data. In the duplication process, changes 
to the master database only are allowed. This is to ensure that local data will not be overwritten. Both of the processes 
can keep the data current in all distributed locations. 
Besides distributed database replication and fragmentation, there are many other distributed database design 
technologies. For example, local autonomy, synchronous and asynchronous distributed database technologies. These 
technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of 
the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, 
consistency and integrity. 
Object oriented database 
An object database (also object-oriented database management system) is a database management system in which 
information is represented in the form of objects as used in object-oriented programming. Object databases are 
different from relational databases which are table-oriented. Object-relational databases are a hybrid of both 
approaches. Object databases have been considered since the early 1980s. 
Object-oriented database management systems (OODBMSs) combine database capabilities with object-oriented 
programming language capabilities. OODBMSs allow object-oriented programmers to develop products, store them 
as objects, and replicate or modify existing objects to make new objects within the OODBMS. Because the database is 
integrated with the programming language, the programmer can maintain consistency within one environment, in that 
both the OODBMS and the programming language will use the same model of representation. Relational DBMS projects, 
by way of contrast, maintain a clearer division between the database model and the application. 
As the usage of web-based technology increases with the implementation of Intranets and extranets, companies have a 
vested interest in OODBMSs to display their complex data. Using a DBMS that has been specifically designed to store 
data as objects gives an advantage to those companies that are geared towards multimedia presentation or 
organizations that utilize computer-aided design (CAD). Some object-oriented databases are designed to work well 
with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#, Visual Basic 
.NET, C++, Objective-C and Smalltalk; others have their own programming languages. OODBMSs use exactly the same 
model as object-oriented programming languages. 
Spatial database 
A spatial database is a database that is optimized to store and query data that represents objects defined in a geometric 
space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons. Some 
spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and TINs. 
While typical databases are designed to manage various numeric and character types of data, additional functionality 
needs to be added for databases to process spatial data types efficiently. These additional types are typically called 
geometry or feature types. 
The Open Geospatial Consortium created the Simple Features specification and sets standards for adding spatial 
functionality to database systems. 
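A spatial query might look like the following sketch, using OGC Simple Features-style functions as implemented by spatial extensions such as PostGIS (names and types vary by product): 

    CREATE TABLE city ( 
        name     VARCHAR(50), 
        boundary GEOMETRY(POLYGON)  -- a spatial 'geometry' column 
    ); 

    -- find every city whose boundary contains a given point 
    SELECT name 
    FROM   city 
    WHERE  ST_Contains(boundary, ST_GeomFromText('POINT(77.59 12.97)')); 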
Multimedia database 
A Multimedia database (MMDB) is a collection of related multimedia data. The multimedia data include one or more 
primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations), 
animation sequences, audio and video. 
A Multimedia Database Management System (MMDBMS) is a framework that manages different types of data 
potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for 
multimedia data types, and facilities for the creation, storage, access, querying and control of a multimedia database. 
Crash Recovery System 
Though we are living in a highly technologically advanced era where hundreds of satellites monitor the earth and at 
every second billions of people are connected through information technology, failure is expected but not always 
acceptable. 
A DBMS is a highly complex system with hundreds of transactions being executed every second. The availability of a 
DBMS depends on its complex architecture and underlying hardware and system software. If it fails or crashes amid 
transactions being executed, it is expected that the system would follow some sort of algorithm or techniques to 
recover from crashes or failures. 
Failure Classification 
To see where the problem has occurred we generalize the failure into various categories, as follows: 
TRANSACTION FAILURE 
When a transaction fails to execute or reaches a point after which it cannot be completed successfully, it has to abort. 
This is called transaction failure, where only a few transactions or processes are affected. 
Reasons for transaction failure could be: 
Logical errors: where a transaction cannot complete because of some code error or internal error condition. 
System errors: where the database system itself terminates an active transaction because the DBMS is not able to 
execute it or has to stop because of some system condition. For example, in case of deadlock or resource unavailability, 
the system aborts an active transaction. 
SYSTEM CRASH 
There are problems external to the system that may cause the system to stop abruptly and crash. For example, 
interruption in the power supply, or failure of underlying hardware or software. Examples may include operating 
system errors. 
DISK FAILURE: 
In the early days of technology evolution, it was a common problem where hard disk drives or storage drives used to 
fail frequently. Disk failures include formation of bad sectors, unreachability of the disk, disk head crash or any other 
failure which destroys all or part of disk storage. 
Storage Structure 
We have already described the storage system. In brief, the storage structure can be divided into various categories: 
Volatile storage: As the name suggests, this storage does not survive system crashes and is mostly placed very close to 
the CPU by embedding it onto the chipset itself, for example main memory and cache memory. It is fast but can store 
only a small amount of information. 
Non-volatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but 
slower in accessibility. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed 
up) RAM. 
Recovery and Atomicity 
When a system crashes, it may have several transactions being executed and various files opened for them to modify 
data items. As we know, transactions are made of various operations which are atomic in nature. But according to the 
ACID properties of a DBMS, atomicity of transactions as a whole must be maintained, that is, either all operations are 
executed or none. 
When a DBMS recovers from a crash, it should maintain the following: 
It should check the states of all transactions which were being executed. 
A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this 
case. 
It should check whether the transaction can be completed now or needs to be rolled back. 
No transaction should be allowed to leave the DBMS in an inconsistent state. 
There are two types of techniques which can help a DBMS in recovering as well as maintaining the atomicity of a 
transaction: 
• Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the 
database. 
• Maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated 
later. 
Log-Based Recovery 
A log is a sequence of records which maintains a record of the actions performed by a transaction. It is important that 
the logs are written prior to the actual modification and stored on a stable storage medium, which is failsafe. 
Log-based recovery works as follows: 
The log file is kept on a stable storage medium. 
When a transaction enters the system and starts execution, it writes a log record about it: 
<Tn, Start> 
When the transaction modifies an item X, it writes a log record as follows: 
<Tn, X, V1, V2> 
This states that Tn has changed the value of X from V1 to V2. 
When the transaction finishes, it logs: 
<Tn, commit> 
Database can be modified using two approaches: 
Deferred database modification: All logs are written on to the stable storage and the database is updated when the 
transaction commits. 
Immediate database modification: Each log follows an actual database modification. That is, the database is modified 
immediately after every operation. 
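What the log enables can be seen in an ordinary SQL transaction (illustrative names): an explicit ROLLBACK, or a crash before COMMIT, causes the DBMS to restore the old value V1 recorded in the log: 

    BEGIN TRANSACTION; 
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;  -- log: <Tn, balance, V1, V2> 
    ROLLBACK;  -- the DBMS uses the logged old value V1 to undo the change 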
Recovery with concurrent transactions 
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it 
would become hard for the recovery system to backtrack all the logs and then start recovering. To ease this situation, 
most modern DBMSs use the concept of 'checkpoints'. 
CHECKPOINT 
Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the 
system. As time passes, the log file may become too big to be handled at all. Checkpoint is a mechanism where all the 
previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point 
before which the DBMS was in a consistent state and all the transactions were committed. 
RECOVERY 
When a system with concurrent transactions crashes and recovers, it behaves in the following manner: 
[Image: Recovery with concurrent transactions] 
The recovery system reads the logs backwards from the end to the last Checkpoint. 
It maintains two lists, undo-list and redo-list. 
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in redo-list. 
If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in undo-list. 
All transactions in the undo-list are then undone and their logs are removed. For all transactions in the redo-list, their 
previous logs are removed, the transactions are redone, and their logs are saved again. 
Database security / Authorization concerns the use of a broad range of information security controls to 
protect databases (potentially including the data, the database applications or stored functions, the database systems, 
the database servers and the associated network links) against compromises of their confidentiality, integrity and 
availability. It involves various types or categories of controls, such as technical, procedural/administrative and 
physical. Database security is a specialist topic within the broader realms of computer security, information 
security and risk management. Security risks to database systems include, for example: 
Unauthorized or unintended activity or misuse by authorized database users, database administrators, or 
network/systems managers, or by unauthorized users or hackers (e.g. inappropriate access to sensitive data, metadata 
or functions within databases, or inappropriate changes to the database programs, structures or security 
configurations); 
Malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary data, 
deletion of or damage to the data or programs, interruption or denial of authorized access to the database, attacks on 
other systems and the unanticipated failure of database services; 
Overloads, performance constraints and capacity issues resulting in the inability of authorized users to use databases as 
intended; 
Physical damage to database servers caused by computer room fires or floods, overheating, lightning, accidental liquid 
spills, static discharge, electronic breakdowns/equipment failures and obsolescence; 
Design flaws and programming bugs in databases and the associated programs and systems, creating various security 
vulnerabilities (e.g. unauthorized privilege escalation), data loss/corruption, performance degradation etc.; 
Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system 
administration processes, sabotage/criminal damage etc. 
Many layers and types of information security control are appropriate to databases, including: 
Access control 
Auditing 
Authentication 
Encryption 
Integrity controls 
Backups 
Application security 
Database Security applying Statistical Method 
Traditionally databases have been largely secured against hackers through network security measures such as firewalls, 
and network-based intrusion detection systems. While network security controls remain valuable in this regard, securing 
the database systems themselves, and the programs/functions and data within them, has arguably become more critical 
as networks are increasingly opened to wider access, in particular access from the Internet. Furthermore, system, 
program, function and data access controls, along with the associated user identification, authentication and rights 
management functions, have always been important to limit and in some cases log the activities of authorized users and 
administrators. In other words, these are complementary approaches to database security, working from both the 
outside-in and the inside-out as it were. 
Many organizations develop their own "baseline" security standards and designs detailing basic security control 
measures for their database systems. These may reflect general information security requirements or obligations 
imposed by corporate information security policies and applicable laws and regulations (e.g. concerning privacy, 
financial management and reporting systems), along with generally accepted good database security practices (such as 
appropriate hardening of the underlying systems) and perhaps security recommendations from the relevant database 
system and software vendors. The security designs for specific database systems typically specify further security 
administration and management functions (such as administration and reporting of user access rights, log management 
and analysis, database replication/synchronization and backups) along with various business-driven information security 
controls within the database programs and functions (e.g. data entry validation and audit trails). Furthermore, various 
security-related activities (manual controls) are normally incorporated into the procedures, guidelines etc. relating to 
the design, development, configuration, use, management and maintenance of databases. 
Data Warehouse Architecture 
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while 
some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of 
data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture 
rather than discussing the specifics of any one system. In general, all data warehouse systems have the following layers: 
• Data Source Layer 
• Data Extraction Layer 
• Staging Area 
• ETL Layer 
• Data Storage Layer 
• Data Logic Layer 
• Data Presentation Layer 
• Metadata Layer 
• System Operations Layer 
Data Source Layer 
This represents the different data sources that feed data into the data warehouse. The data source can be of any format 
-- plain text file, relational database, other types of database, Excel file, etc., can all act as a data source. 
Many different types of data can be a data source: 
• Internal data -- such as sales data, HR data, product data, inventory data, marketing data, systems data. 
• Third-party data, such as census data, demographics data, or survey data. 
All these data sources together form the Data Source Layer. 
Data Extraction Layer 
Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, 
but there is unlikely to be any major data transformation. 
Staging Area 
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common 
area makes it easier for subsequent data processing / integration. 
ETL Layer 
This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an 
analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming 
phase in a data warehousing project, and an ETL tool is often used in this layer. 
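A minimal, hypothetical example of an ETL-style transformation in SQL, moving cleansed data from a staging table into an analytical fact table: 

    INSERT INTO sales_fact (sale_date, product_id, region, amount) 
    SELECT CAST(s.sale_date AS DATE),  -- normalize dates 
           s.product_id, 
           UPPER(TRIM(s.region)),      -- cleanse inconsistent region codes 
           s.quantity * s.unit_price   -- derive an analytical measure 
    FROM   staging_sales s 
    WHERE  s.quantity > 0;             -- drop invalid rows 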
Data Storage Layer 
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found 
here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the 
three, two of the three, or all three types. 
Data Logic Layer 
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation 
rules, but do affect what the report looks like. 
Data Presentation Layer 
This refers to the information that reaches the users. This can be in a form of a tabular / graphical report in a browser, 
an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, 
among others. Usually an OLAP tool and/or a reporting tool is used in this layer. 
Metadata Layer 
This is where information about the data stored in the data warehouse system is stored. A logical data model would be 
an example of something that's in the metadata layer. A metadata tool is often used to manage metadata. 
System Operations Layer 
This layer includes information on how the data warehouse system operates, such as ETL job status, system 
performance, and user access history. 
Evolution of data warehousing 
In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional 
information systems technology was simply too cumbersome to provide relevant data efficiently and quickly. 
Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or
less to 'execute' the business rather than 'run' the business. 
From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic 
reports for management. The key here is the word 'strategic' as most executives were less concerned with the day to 
day operations than they were with a more overall look at the model and business functions. 
As with all technology, over the course of the latter half of the 20th century, we saw increased numbers and types of 
databases. Many large businesses found themselves with data scattered across multiple platforms and variations of 
technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data 
warehousing is to take data from multiple platforms/technologies (As varied as spreadsheets, DB2 databases, IDMS 
records, and VSAM files) and place them in a common location that uses a common querying tool. In this way 
operational databases could be held on whatever system was most efficient for the operational business, while the 
reporting / strategic information could be held in a common location using a common language. Data Warehouses take 
this even a step further by giving the data itself commonality by defining what each term means and keeping it standard. 
(An example of this would be gender which can be referred to in many ways, but should be standardized on a data 
warehouse with one common way of referring to each sex). 
All of this was designed to make decision support more readily available and without affecting day to day operations. 
One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but 
rather a location for data that is 'interesting'. Data that is interesting will assist decision makers in making strategic 
decisions relative to the organization's overall mission. 
Benefits of Data Warehousing 
The successful implementation of a data warehouse can bring major benefits to an organization, including: 
• Potential high returns on investment - Implementation of data warehousing by an organization requires a huge 
investment, typically from Rs 10 lakh to 50 lakh. However, a study by the International Data Corporation (IDC) in 1996 
reported that average three-year returns on investment (ROI) in data warehousing reached 401%. 
• Competitive advantage - The huge returns on investment for those companies that have successfully implemented a 
data warehouse is evidence of the enormous competitive advantage that accompanies this technology. The competitive 
advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and 
untapped information on, for example, customers, trends, and demands. 
• Increased productivity of corporate decision-makers - Data warehousing improves the productivity of corporate 
decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data 
from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming 
data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate, 
and consistent analysis. 
• More cost-effective decision-making - Data warehousing helps to reduce the overall cost of the product by reducing 
the number of channels. 
• Better enterprise intelligence - It helps to provide better enterprise intelligence. 
• Enhanced customer service - It is used to enhance customer service. 
Problems of Data Warehousing 
The problems associated with developing and managing a data warehouse are as follows: 
Underestimation of resources for data loading - Sometimes we underestimate the time required to extract, clean, and 
load the data into the warehouse. It may take a significant proportion of the total development time, although some 
tools exist which are used to reduce the time and effort spent on this process. 
Hidden problems with source systems - Sometimes hidden problems associated with the source systems feeding the 
data warehouse may be identified only after years of being undetected. For example, when entering the details of a 
new property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when 
the data is available and applicable. 
Required data not captured - In some cases the required data is not captured by the source systems, even though it 
may be very important for the data warehouse's purpose. For example, the date of registration for a property may not 
be used in the source system, but it may be very important for analysis purposes. 
Increased end-user demands - After satisfying some of the end-users' queries, requests for support from staff may increase 
rather than decrease. This is caused by an increasing awareness of the users on the capabilities and value of the data 
warehouse. Another reason for increasing demands is that once a data warehouse is online, it is often the case that the 
number of users and queries increase together with requests for answers to more and more complex queries. 
Data homogenization - The concept of a data warehouse deals with the similarity of data formats between different 
data sources. This may result in the loss of some important value of the data. 
High demand for resources - The data warehouse requires large amounts of data. 
Data ownership - Data warehousing may change the attitude of end-users toward the ownership of data. Sensitive data 
owned by one department has to be loaded into the data warehouse for decision-making purposes. But sometimes this 
results in reluctance from that department, because it may hesitate to share the data with others. 
High maintenance - Data warehouses are high-maintenance systems. Any reorganization of the business processes and 
the source systems may affect the data warehouse, and this results in high maintenance costs. 
Long-duration projects - The building of a warehouse can take up to three years, which is why some organizations are 
reluctant to invest in a data warehouse. Sometimes only the historical data of a particular department is captured in 
the data warehouse, resulting in data marts. Data marts support only the requirements of a particular department and 
limit the functionality to that department or area only. 
Complexity of integration - The most important area for the management of a data warehouse is the integration 
capabilities. An organization must spend a significant amount of time determining how well the various different data 
warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are 
a number of tools for every operation of the data warehouse. 
Data mining 
A process used by companies to turn raw data into useful information. By using software to look for patterns in large 
batches of data, businesses can learn more about their customers and develop more effective marketing strategies as 
well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as 
computer processing. Grocery stores are well-known users of data mining techniques. Many supermarkets offer free 
loyalty cards to customers that give them access to reduced prices not available to non-members. The cards make it 
easy for stores to track who is buying what, when they are buying it, and at what price. The stores can then use this 
data, after analyzing it, for multiple purposes, such as offering customers coupons that are targeted to their buying 
habits and deciding when to put items on sale and when to sell them at full price. 
Data mining can be a cause for concern when only selected information, which is not representative of the overall 
sample group, is used to prove a certain hypothesis. 
Data mining process 
Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases intended as a cyclical process, as shown in the following figure:
[Figure: Cross-Industry Standard Process for Data Mining (CRISP-DM)]
Business understanding 
In the business understanding phase, it is first necessary to understand the business objectives clearly and find out what the business's needs are. Next, we have to assess the current situation by finding out about the resources, assumptions, constraints and other important factors that should be considered. Then, from the business objectives and the current situation, we need to create data mining goals that achieve the business objectives within the current situation. Finally, a good data mining plan has to be established to achieve both the business and data mining goals. The plan should be as detailed as possible.
Data understanding 
The data understanding phase starts with initial data collection from the available data sources, which helps us get familiar with the data. Some important activities, including data loading and data integration, must be performed to make the data collection successful. Next, the “gross” or “surface” properties of the acquired data need to be examined carefully and reported. Then, the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, the data quality must be examined by answering important questions such as “Is the acquired data complete?” and “Are there any missing values in the acquired data?”
Data preparation 
Data preparation typically consumes about 90% of the project's time, and its outcome is the final data set. Once the available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. Deeper data exploration may also be carried out during this phase to notice patterns based on the business understanding.
Modeling 
First, modeling techniques have to be selected for the prepared dataset.
Next, test scenarios must be generated to validate the quality and validity of the model.
Then, one or more models are created by running the modeling tool on the prepared dataset.
Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.
Evaluation
In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised due to new patterns discovered in the model results or from other factors; gaining business understanding is an iterative process in data mining. The go or no-go decision must be made in this step before moving to the deployment phase.
Deployment 
The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. In the deployment phase, plans for deployment, maintenance and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarize the project experiences and review the project to see what needs to be improved, capturing the lessons learned.
Data mining techniques 
Association 
Association (or relation) is probably the best-known, most familiar and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For
example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy 
strawberries, and therefore suggest that the next time that they buy strawberries they might also want to buy cream. 
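To make this concrete, here is a minimal SQL sketch of the idea, assuming a hypothetical order_items table with order_id and product columns; it counts how often each pair of products appears in the same order:

-- Count how often two products appear in the same order.
-- Assumes a hypothetical order_items(order_id, product) table.
SELECT a.product AS product_a,
       b.product AS product_b,
       COUNT(*) AS times_bought_together
FROM order_items a
JOIN order_items b
  ON a.order_id = b.order_id
 AND a.product < b.product   -- avoid self-pairs and duplicate pairs
GROUP BY a.product, b.product
ORDER BY times_bought_together DESC
LIMIT 10;

Pairs such as (strawberries, cream) appearing near the top of the list suggest an association worth acting on.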
Classification 
You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes to 
identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by 
identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might assign it to a particular class by comparing its attributes with your known definitions. You can apply the same principles to customers,
for example by classifying them by age and social group. 
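As a rough illustration, such a rule-based classification can be expressed directly in SQL; the customers table and the age brackets below are hypothetical:

SELECT name,
       CASE
         WHEN age < 25 THEN 'young adult'
         WHEN age < 60 THEN 'adult'
         ELSE 'senior'
       END AS age_class
FROM customers;

A real classifier would learn the class boundaries from labeled examples rather than having them fixed by hand.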
Clustering 
By examining one or more attributes or classes, you can group individual pieces of data together to form a structured opinion. At a simple level, clustering uses one or more attributes as the basis for identifying a cluster of correlating results. Clustering is useful for identifying related information, because correlated examples can be compared to see where the similarities and ranges agree. Clustering can work both ways: you can assume that there is a cluster at a certain point and then use your identification criteria to see if you are correct. For example, a sample of sales data might compare the age of the customer to the size of the sale. It is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home) have more disposable income.
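A crude way to look for such clusters in plain SQL is to bin the data and compare the groups; this sketch assumes a hypothetical sales table with customer_age and amount columns:

-- Bin customers into ten-year age brackets and compare sale sizes.
SELECT FLOOR(customer_age / 10) * 10 AS age_bracket,
       COUNT(*) AS num_sales,
       AVG(amount) AS avg_sale
FROM sales
GROUP BY age_bracket
ORDER BY age_bracket;

Brackets with noticeably higher avg_sale values hint at the clusters described above; a true clustering algorithm (such as k-means) would discover such groups without fixed bin boundaries.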
Prediction 
Prediction is a wide topic and runs from predicting the failure of components or machinery, to identifying fraud and 
even the prediction of company profits. Used in combination with the other data mining techniques, prediction involves 
analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a 
prediction about an event. Using credit card authorization as an example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matching to identify whether a transaction is fraudulent. If a match is made between the purchase of flights to the US and transactions in the US, it is likely that the transaction is valid.
Sequential patterns 
Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences 
of similar events. For example, with customer data you can identify that customers buy a particular collection of 
products together at different times of the year. In a shopping basket application, you can use this information to 
automatically suggest that certain items be added to a basket based on their frequency and past purchasing history. 
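A simple starting point in SQL is to break purchases down by month and look for products whose counts peak at particular times of year; the orders and order_items tables here are hypothetical:

SELECT MONTH(o.order_date) AS sale_month,
       i.product,
       COUNT(*) AS purchases
FROM orders o
JOIN order_items i ON i.order_id = o.order_id
GROUP BY sale_month, i.product
ORDER BY sale_month, purchases DESC;

Products that cluster in particular months are candidates for seasonal suggestions in the shopping basket.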
Decision trees 
Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as 
a part of the selection criteria, or to support the use and selection of specific data within the overall structure. Within 
the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a 
further question to help classify or identify the data so that it can be categorized, or so that a prediction can be made 
based on each answer. 
[Figure: Decision tree]
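Once a tree has been designed, its question-and-answer structure can be encoded as nested CASE logic in SQL; the transactions table and the thresholds below are purely illustrative:

-- Each CASE is one question in the tree; each branch leads either
-- to a further question or to a final decision.
SELECT transaction_id,
       CASE
         WHEN amount > 1000 THEN
           CASE
             WHEN country <> home_country THEN 'flag for review'
             ELSE 'approve'
           END
         ELSE 'approve'
       END AS decision
FROM transactions;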
Combinations 
In practice, it's very rare that you would use one of these exclusively. Classification and clustering are similar techniques. 
By using clustering to identify nearest neighbors, you can further refine your classifications. Often, we use decision trees 
to help build and identify classifications that we can track for a longer period to identify sequences and patterns. 
Long-term (memory) processing 
Within all of the core methods, there is often reason to record and learn from the information. In some techniques, it is 
entirely obvious. For example, with sequential patterns and predictive learning you look back at data from multiple 
sources and instances of information to build a pattern. In other techniques, the process is more explicit. Decision trees are rarely built once and then left unchanged. As new information, events, and data points are identified, it might be
necessary to build more branches, or even entirely new trees, to cope with the additional information. You can 
automate some of this process. For example, building a predictive model for identifying credit card fraud is about 
building probabilities that you can use for the current transaction, and then updating that model with the new 
(approved) transaction. This information is then recorded so that the decision can be made quickly the next time. 
Ecommerce & Web application security Issues 
1. Introduction 
E-commerce is defined as the buying and selling of products or services over electronic systems such as the Internet and 
to a lesser extent, other computer networks. It is generally regarded as the sales and commercial function of eBusiness. 
There has been a massive increase in the level of trade conducted electronically since the widespread penetration of the 
Internet. A wide variety of commerce is conducted via eCommerce, including electronic funds transfer, supply chain 
management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory 
management systems, and automated data collection systems. US online retail sales reached $175 billion in 2007 and 
are projected to grow to $335 billion by 2012 (Mulpuru, 2008). 
This massive increase in the uptake of eCommerce has led to a new generation of associated security threats, but any 
eCommerce system must meet four integral requirements: 
a) privacy – information exchanged must be kept from unauthorized parties 
b) integrity – the exchanged information must not be altered or tampered with 
c) authentication – both sender and recipient must prove their identities to each other and 
d) non-repudiation – proof is required that the exchanged information was indeed received (Holcombe, 2007). 
These basic maxims of eCommerce are fundamental to the conduct of secure business online. Further to the
fundamental maxims of eCommerce above, eCommerce providers must also protect against a number of different 
external security threats, most notably Denial of Service (DoS). These are where an attempt is made to make a computer 
resource unavailable to its intended users through a variety of mechanisms discussed below. The financial services 
sector still bears the brunt of e-crime, accounting for 72% of all attacks. But the sector that experienced the greatest 
increase in the number of attacks was eCommerce. Attacks in this sector have risen by 15% from 2006 to 2007 
(Symantec, 2007). 
2. Privacy 
Privacy has become a major concern for consumers with the rise of identity theft and impersonation, and any concern 
for consumers must be treated as a major concern for eCommerce providers. According to Consumer Reports Money 
Adviser (Perrotta, 2008), the US Attorney General has announced multiple indictments relating to a massive 
international security breach involving nine major retailers and more than 40 million credit- and debit-card numbers. US 
attorneys think that this may be the largest hacking and identity-theft case ever prosecuted by the justice department. 
Both EU and US legislation at both the federal and state levels mandates certain organizations to inform customers 
about information uses and disclosures. Such disclosures are typically accomplished through privacy policies, both online 
and offline (Vail et al., 2008). 
In a study by Lauer and Deng (2008), a model is presented linking privacy policy, through trustworthiness, to online 
trust, and then to customers’ loyalty and their willingness to provide truthful information. The model was tested using a 
sample of 269 responses. The findings suggested that consumers’ trust in a company is closely linked with the
perception of the company’s respect for customer privacy (Lauer and Deng, 2007). Trust in turn is linked to increased 
customer loyalty that can be manifested through increased purchases, openness to trying new products, and willingness 
to participate in programs that use additional personal information. Privacy now forms an integral part of any e-commerce strategy, and investment in privacy protection has been shown to increase consumers’ spend, trustworthiness and loyalty.
The converse of this can be shown to be true when things go wrong. In March 2008, the Irish online jobs board, jobs.ie, 
was compromised by criminals and users’ personal data (in the form of CVs) was taken (Ryan, 2008). Looking at the
real-time responses of users to this event on the popular Irish forum, Boards.ie, we can see that privacy is of major 
concern to users and in the event of their privacy being compromised users become very agitated and there is an overall 
negative effect on trust in e-commerce. User comments in the forum included: “I’m well p*ssed off about them keeping 
my CV on the sly”; “I am just angry that this could have happened and to so many people”; “Mine was taken too. How 
do I terminate my acc with jobs.ie”; “Grr, so annoyed, feel I should report it to the Gardai now” (Boards.ie, 2008).
3. Integrity, Authentication & Non-Repudiation 
In any e-commerce system the factors of data integrity, customer and client authentication, and non-repudiation are
critical to the success of any online business. Data integrity is the assurance that data transmitted is consistent and 
correct, that is, it has not been tampered with or altered in any way during transmission. Authentication is a means by which
both parties in an online transaction can be confident that they are who they say they are and non-repudiation is the 
idea that no party can dispute that an actual event online took place. Proof of data integrity is typically the easiest of 
these factors to successfully accomplish. A data hash or checksum, such as MD5 or CRC, is usually sufficient to establish 
that the likelihood of data being undetectably changed is extremely low (Schlaeger and Pernul, 2005). Notwithstanding 
these security measures, it is still possible to compromise data in transit through techniques such as phishing or man-in-the- 
middle attacks (Desmedt, 2005). These flaws have led to the need for the development of strong verification and 
security measures such as digital signatures and public key infrastructures (PKI).
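MySQL itself exposes hash functions that illustrate the checksum idea. Here is a minimal sketch, assuming a hypothetical messages table; MD5 is used only because the text mentions it, and a stronger hash such as SHA-256 would be preferred for serious integrity protection:

-- The sender stores or transmits a checksum alongside the data.
SELECT id, body, MD5(body) AS body_checksum
FROM messages;
-- The recipient recomputes MD5(body); any mismatch with the
-- transmitted checksum indicates the data changed in transit.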
One of the key developments in e-commerce security and one which has led to the widespread growth of e-commerce is 
the introduction of digital signatures as a means of verification of data integrity and authentication. In 1995, Utah 
became the first jurisdiction in the world to enact an electronic signature law. An electronic signature may be defined as 
“any letters, characters, or symbols manifested by electronic or similar means and executed or adopted by a party with 
the intent to authenticate a writing” (Blythe, 2006). In order for a digital signature to attain the same legal status as an 
ink-on-paper signature, asymmetric key cryptology must have been employed in its production (Blythe, 2006). Such a 
system employs double keys; one key is used to encrypt the message by the sender, and a different, albeit 
mathematically related, key is used by the recipient to decrypt the message (Antoniou et al., 2008). This is a very good 
system for electronic transactions, since two stranger-parties, perhaps living far apart, can confirm each other’s identity 
and thereby reduce the likelihood of fraud in the transaction. Non-repudiation techniques prevent the sender of a 
message from subsequently denying that they sent the message. Digital Signatures using public-key cryptography and 
hash functions are the generally accepted means of providing non-repudiation of communications.
4. Technical Attacks 
Technical attacks are one of the most challenging types of security compromise an e-commerce provider must face.
Perpetrators of technical attacks, and in particular Denial-of-Service attacks, typically target sites or services hosted on 
high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social 
networking sites. 
Denial of Service Attacks 
Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal 
activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet 
today. A major difficulty in thwarting these attacks is to trace the source of the attack, as they often use incorrect or 
spoofed IP source addresses to disguise the true origin of the attack (Kim and Kim, 2006). 
The United States Computer Emergency Readiness Team defines symptoms of denial-of-service attacks to include
(McDowell, 2007): 
• Unusually slow network performance 
• Unavailability of a particular web site 
• Inability to access any web site 
• Dramatic increase in the number of spam emails received 
DoS attacks can be executed in a number of different ways including: 
ICMP Flood (Smurf Attack) – where perpetrators will send large numbers of IP packets with the source address faked to 
appear to be the address of the victim. The network’s bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.
Teardrop Attack – A Teardrop attack involves sending mangled IP fragments with overlapping, over-sized, payloads to 
the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the system as a result.
Phlashing – Also known as a Permanent denial-of-service (PDoS) is an attack that damages a system so badly that it 
requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management 
interfaces of the victim’s hardware, be it routers, printers, or other networking hardware. These flaws leave the door 
open for an attacker to remotely ‘update’ the device firmware to a modified, corrupt or defective firmware image, 
therefore bricking the device and making it permanently unusable for its original purpose. 
Distributed Denial-of-Service Attacks - Distributed Denial of Service (DDoS) attacks are one of the greatest security fears for IT managers. In a matter of minutes, thousands of vulnerable computers can flood the victim website, choking off
legitimate traffic (Tariq et al., 2006). A distributed denial of service attack (DDoS) occurs when multiple compromised 
systems flood the bandwidth or resources of a targeted system, usually one or more web servers. The most famous 
DDoS attacks occurred in February 2000 where websites including Yahoo, Buy.com, eBay, Amazon and CNN were 
attacked and left unreachable for several hours each (Todd, 2000). 
Brute Force Attacks – A brute force attack is a method of defeating a cryptographic scheme by trying a large number of 
possibilities; for example, a large number of the possible keys in a key space in order to decrypt a message. Brute Force 
Attacks, although perceived to be low-tech in nature, are not a thing of the past. In May 2007 the internet infrastructure in Estonia was crippled by multiple sustained brute force attacks against government and commercial institutions in the
country (Sausner, 2008). The attacks followed the relocation of a Soviet World War II memorial in Tallinn in late April, which made news around the world.
5. Non-Technical Attacks 
Phishing Attacks 
Phishing is the criminally fraudulent process of attempting to acquire sensitive information such as usernames, 
passwords and credit card details, by masquerading as a trustworthy entity in an electronic communication. Phishing 
scams generally are carried out by emailing the victim with a ‘fraudulent’ email from what purports to be a legitimate 
organization requesting sensitive information. When the victim follows the link embedded within the email they are 
brought to an elaborate and sophisticated duplicate of the legitimate organization's website. Phishing attacks generally target bank customers, online auction sites (such as eBay), online retailers (such as Amazon) and service providers (such as PayPal). According to Community Banker (Swann, 2008), in more recent times cybercriminals have become more sophisticated in the timing of their attacks, posing as charities in times of natural disaster.
Social Engineering 
Social engineering is the art of manipulating people into performing actions or divulging confidential information. Social 
engineering techniques include pretexting (where the fraudster creates an invented scenario to get the victim to divulge information), interactive voice response (IVR) or phone phishing (where the fraudster gets the victim to divulge sensitive information over the phone) and baiting with Trojan horses (where the fraudster ‘baits’ the victim into loading malware onto a system). Social engineering has become a serious threat to e-commerce security since it is difficult to detect and to combat, as it involves ‘human’ factors which cannot be patched the way hardware or software can, although staff training and education can somewhat thwart such attacks (Hasle et al., 2005).
6. Conclusions 
In conclusion, the e-commerce industry faces a challenging future in terms of the security risks it must avert. With
increasing technical knowledge, and its widespread availability on the internet, criminals are becoming more and more 
sophisticated in the deceptions and attacks they can perform. Novel attack strategies and vulnerabilities only really 
become known once a perpetrator has uncovered and exploited them. That said, there are multiple security strategies which any e-commerce provider can adopt to reduce the risk of attack and compromise significantly. Awareness of the risks and the implementation of multi-layered security protocols, detailed and open privacy policies, and strong authentication and encryption measures will go a long way to reassure the consumer and ensure that the risk of compromise is kept minimal.
What is MySQL? 
• MySQL is a database system used on the web
• MySQL is a database system that runs on a server
• MySQL is ideal for both small and large applications
• MySQL is very fast, reliable, and easy to use
• MySQL supports standard SQL
• MySQL compiles on a number of platforms
• MySQL is free to download and use
• MySQL is developed, distributed, and supported by Oracle Corporation
The data in MySQL is stored in tables. A table is a collection of related data, and it consists of columns and 
rows. Databases are useful when storing information categorically. A company may have a database with the 
following tables: 
• Employees
• Products
• Customers
• Orders
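A minimal sketch of two such related tables in SQL (the column names are illustrative, not prescriptive):

CREATE TABLE Customers (
  customer_id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100)
);

CREATE TABLE Orders (
  order_id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  order_date DATE,
  FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
);

Each row of Orders refers back to a row of Customers through the foreign key, which is what makes the tables related.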
What is PHP? 
• PHP is an acronym for "PHP: Hypertext Preprocessor"
• PHP is a widely-used, open source scripting language
• PHP scripts are executed on the server
• PHP costs nothing; it is free to download and use
What is a PHP File? 
• PHP files can contain text, HTML, CSS, JavaScript, and PHP code
• PHP code is executed on the server, and the result is returned to the browser as plain HTML
• PHP files have the extension ".php"
What Can PHP Do? 
• PHP can generate dynamic page content
• PHP can create, open, read, write, delete, and close files on the server
• PHP can collect form data
• PHP can send and receive cookies
• PHP can add, delete, and modify data in your database
• PHP can restrict user access to some pages on your website
• PHP can encrypt data
With PHP you are not limited to outputting HTML. You can output images, PDF files, and even Flash movies. You can also output any text, such as XHTML and XML.
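Behind a typical PHP page, the database work ultimately comes down to ordinary SQL statements like the following (the table and values are hypothetical):

INSERT INTO Customers (name, email) VALUES ('Asha', 'asha@example.com');  -- add
UPDATE Customers SET email = 'new@example.com' WHERE customer_id = 1;     -- modify
DELETE FROM Customers WHERE customer_id = 2;                              -- delete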
Connecting to and Disconnecting from the Server 
To connect to the server, you will usually need to provide a MySQL user name when you invoke MySQL and, most likely, 
a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host 
name. Contact your administrator to find out what connection parameters you should use to connect (that is, what host, 
user name, and password to use). Once you know the proper parameters, you should be able to connect like this: 
shell> mysql -h host -u user -p 
Enter password: ******** 
host and user represent the host name where your MySQL server is running and the user name of your MySQL account. 
Substitute appropriate values for your setup. The ******** represents your password; enter it when MySQL displays 
the Enter password: prompt. 
If that works, you should see some introductory information followed by a mysql> prompt: 
shell> mysql -h host -u user -p 
Enter password: ******** 
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 25338 to server version: 5.0.96-standard
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> 
The mysql> prompt tells you that mysql is ready for you to enter commands. 
If you are logging in on the same machine that MySQL is running on, you can omit the host, and simply use the 
following: 
shell> mysql -u user -p 
If, when you attempt to log in, you get an error message such as ERROR 2002 (HY000): Can't connect to local MySQL 
server through socket '/tmp/mysql.sock' (2), it means that the MySQL server daemon (Unix) or service (Windows) is not 
running. Consult the administrator appropriate to your operating system.
Some MySQL installations permit users to connect as the anonymous (unnamed) user to the server running on the local 
host. If this is the case on your machine, you should be able to connect to that server by invoking mysql without any 
options:
shell> mysql 
After you have connected successfully, you can disconnect any time by typing QUIT (or \q) at the mysql> prompt:
mysql> QUIT 
Bye 
On Unix, you can also disconnect by pressing Control+D. 
Most examples in the following sections assume that you are connected to the server. They indicate this by 
the mysql> prompt. 
Data type 
In computer science and computer programming, a data type or simply type is a classification identifying one of various 
types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can 
be done on values of that type; the meaning of the data; and the way values of that type can be stored. 
In MySQL there are three main type families: text, number, and date/time. Refer to the MySQL book (Aplus).
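A minimal sketch showing one column from each family, in a hypothetical table:

CREATE TABLE property (
  description TEXT,          -- text type
  price DECIMAL(10,2),       -- number type
  registered DATE            -- date/time type
);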
The Java programming language is statically-typed, which means that all variables must first be declared before they can 
be used. 
All programs involve storing and manipulating data. 
Fortunately, the computer only knows about a few types of data. These include numbers, true/false values, characters (a, b, c, 1, 2, 3, etc.), lists of data, and complex "structures" of data, which build up new data types by combining the other data types.
Creating and using a database, and getting information about databases and tables – refer to the MySQL book.
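The basic statements involved look like this (the database and table names are illustrative):

CREATE DATABASE company;   -- create a new database
USE company;               -- make it the current database
SHOW DATABASES;            -- list all databases on the server
SHOW TABLES;               -- list tables in the current database
DESCRIBE Customers;        -- show the columns of a table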
Batch mode - To run your SQL batch file from the command line, enter the following:
In Windows:
mysql < c:\commands.sql
Don't forget to enclose the file path in quotes if there are any spaces.
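For example, commands.sql might contain a few statements to be run unattended (the contents are illustrative):

USE company;
INSERT INTO Orders (customer_id, order_date) VALUES (1, CURDATE());
SELECT COUNT(*) FROM Orders;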
Running the Batch Job as a Scheduled Task 
In Windows 
Batch jobs can be even more automated by running them as a scheduled task. In Windows, batch files are used to execute DOS commands. We can schedule our batch job by placing the command we entered earlier in a file, such as "runsql.bat". This file will contain only one line:
mysql < c:\commands.sql
To schedule the batch job: 
1. Open Scheduled Tasks. 
• Click Start, click All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks.
2. Double-click Add Scheduled Task to start the Scheduled Task Wizard, and then click Next in the first 
dialog box. 
3. The next dialog box displays a list of programs that are installed on your computer, either as part of 
the Windows operating system, or as a result of software installation. Click Browse and select your SQL 
file, and then click Open. 
4. Type a name for the task, and then choose when and how often you want the task to run, from one of 
the following options: 
• Daily
• Weekly
• Monthly
• One time only
• When my computer starts (before a user logs on)
• When I log on (only after the current user logs on)
5. Click Next, specify the information about the day and time to run the task, and then click Next. 
6. OPTIONAL: Enter the name and password of the user who is associated with this task. Make sure that 
you choose a user with sufficient permissions to run the program. By default, the wizard selects the name 
of the user who is currently logged on. 
[Figure: Scheduled Tasks in Windows]
If at a later time you'd like to suspend this task, you can open it via the Scheduled Tasks dialog (pictured above) and deselect the Enabled checkbox on the “Task” tab:
[Figure: The “Task” tab containing the “Enabled” checkbox]
Similarly, you can remove the task by deleting it like any file. In fact, the task is saved as a .job file in the WINNT\Tasks folder.
MySQL in the Cloud
A cloud database is a database accessible to clients from the cloud and delivered to users on demand via the Internet from a cloud database provider's servers. Also referred to as Database-as-a-Service (DBaaS), cloud databases can use cloud computing to achieve optimized scaling, high availability, multi-tenancy and effective resource allocation.
While a cloud database can be a traditional database such as a MySQL or SQL Server database that has been adapted for cloud use, a native cloud database such as Xeround's MySQL Cloud database tends to be better equipped to optimally use cloud resources and to guarantee scalability as well as availability and stability.
Cloud databases can offer significant advantages over their traditional counterparts, including increased accessibility, 
automatic failover and fast automated recovery from failures, automated on-the-go scaling, minimal investment and 
maintenance of in-house hardware, and potentially better performance. At the same time, cloud databases have their 
share of potential drawbacks, including security and privacy issues as well as the potential loss of or inability to access 
critical data in the event of a disaster or bankruptcy of the cloud database service provider.

More Related Content

What's hot

Introduction to Database Management System
Introduction to Database Management SystemIntroduction to Database Management System
Introduction to Database Management System
Hitesh Mohapatra
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented database
Kanike Krishna
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
ArangoDB Database
 

What's hot (20)

Introduction to Database Management System
Introduction to Database Management SystemIntroduction to Database Management System
Introduction to Database Management System
 
Introduction to Mobile Business Intelligence
Introduction to Mobile Business IntelligenceIntroduction to Mobile Business Intelligence
Introduction to Mobile Business Intelligence
 
Overview of Storage and Indexing ...
Overview of Storage and Indexing                                             ...Overview of Storage and Indexing                                             ...
Overview of Storage and Indexing ...
 
Comparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented DatabaseComparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented Database
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookup
 
database and database types
database and database typesdatabase and database types
database and database types
 
Rdbms
RdbmsRdbms
Rdbms
 
File organization and indexing
File organization and indexingFile organization and indexing
File organization and indexing
 
Dbms database models
Dbms database modelsDbms database models
Dbms database models
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented database
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
 
Databases
DatabasesDatabases
Databases
 
Role of a DBA
Role of a DBARole of a DBA
Role of a DBA
 
Presentation on Database management system
Presentation on Database management systemPresentation on Database management system
Presentation on Database management system
 
Mis assignment (database)
Mis assignment (database)Mis assignment (database)
Mis assignment (database)
 
Sql Server Basics
Sql Server BasicsSql Server Basics
Sql Server Basics
 
Managing data resources
Managing  data resourcesManaging  data resources
Managing data resources
 
Introduction to Database Management System
Introduction to Database Management SystemIntroduction to Database Management System
Introduction to Database Management System
 

Similar to Database Management Systems (Mcom Ecommerce)

Data base management system
Data base management systemData base management system
Data base management system
Navneet Jingar
 
Uses of dbms
Uses of dbmsUses of dbms
Uses of dbms
MISY
 
Chapter 5 data processing
Chapter 5 data processingChapter 5 data processing
Chapter 5 data processing
UMaine
 

Similar to Database Management Systems (Mcom Ecommerce) (20)

Database Management Systems
Database Management SystemsDatabase Management Systems
Database Management Systems
 
Database and Database Management (DBM): Health Informatics
Database and Database Management (DBM): Health InformaticsDatabase and Database Management (DBM): Health Informatics
Database and Database Management (DBM): Health Informatics
 
Lecture#5
Lecture#5Lecture#5
Lecture#5
 
jose rizal
jose rizaljose rizal
jose rizal
 
Dbms
DbmsDbms
Dbms
 
Data base management system
Data base management systemData base management system
Data base management system
 
Database management system
Database management systemDatabase management system
Database management system
 
Uses of dbms
Uses of dbmsUses of dbms
Uses of dbms
 
DBMS PART 1.docx
DBMS PART 1.docxDBMS PART 1.docx
DBMS PART 1.docx
 
Ch-1-Introduction-to-Database.pdf
Ch-1-Introduction-to-Database.pdfCh-1-Introduction-to-Database.pdf
Ch-1-Introduction-to-Database.pdf
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
Relational database management systems
Relational database management systemsRelational database management systems
Relational database management systems
 
Dbms unit i
Dbms unit iDbms unit i
Dbms unit i
 
database introductoin optimization1-app6891.pdf
database introductoin optimization1-app6891.pdfdatabase introductoin optimization1-app6891.pdf
database introductoin optimization1-app6891.pdf
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
Components and Advantages of DBMS
Components and Advantages of DBMSComponents and Advantages of DBMS
Components and Advantages of DBMS
 
Chapter 05 pertemuan 7- donpas - manajemen data
Chapter 05 pertemuan 7- donpas - manajemen dataChapter 05 pertemuan 7- donpas - manajemen data
Chapter 05 pertemuan 7- donpas - manajemen data
 
Chapter 5 data processing
Chapter 5 data processingChapter 5 data processing
Chapter 5 data processing
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
 
Dbms notes
Dbms notesDbms notes
Dbms notes
 

Recently uploaded

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 

Recently uploaded (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 

Database Management Systems (Mcom Ecommerce)

  • 1. Meaning A database is an organized collection of data. The data are typically organized to model aspects of reality in a way that supports processes requiring information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies. Database management systems (DBMSs) are specially designed software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general -purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, SAP and IBM DB2. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Database management systems are often classified according to the database that they support; the most popular database systems since the 1980s have all supported the relational model as represented by the SQL language. Systematically organized or structured repository of indexed information (usually as a group of linked data files) that allows easy retrieval, updating, analysis, and output of data. Stored usually in a computer, this data could be in the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information. Most computer applications (including software, spreadsheets, word-processors) are databases at their core. See also flat database and relational database. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full -text, numeric, and images. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full -text, numeric, and images. In computing, databases are sometimes classified according to their organizational approach. The most prevalen t approach is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses. Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs and inventories, and customer profiles. Typically, a database manager provides users the capabilities of controlling read/write access, specifying report generation, and analyzing usage. Databases and database managers are prevalent in large main frame systems, but are also present in smaller distributed workstation and mid-range systems such as the AS/400 and on personal computers. SQL(Structured Query Language) is a standard language for making interactive queries from and updating a database such as IBM's DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. Features of a DBMS The prime purpose of a relational database management system is to maintain data integrity. This means all the rules and relationships between data are consistent at all times. 
But a good DBMS will have other features as well. These include: A command language that allows you to create, delete and alter the database (data description language or DDL) A way of documenting all the internal structures that makes up the database (data dictionary) A language to support the manipulation and processing of the data (data manipulation language) Support the ability to view the database from different viewpoints according to the requirements of the user Provide some level of security and access control to the data. The simplest RDBMS may be designed with a single user in mind e.g. the database is 'locked' until that person has finished with it. Such a RDBMS will only cost a few hundred pounds at most and will have only a basic capability. On the other hand an enterprise level DBMS can support a huge number of simultaneous users with thousands of internal tables and complex 'roll back' capabilities should things go wrong. Obviously this kind of system will cost thousands along with a need to have professional database administrators looking after it and database specialists to create complex queries for management and staff. 1
  • 2. 1. Controlling Data Redundancy: In non-database systems (traditional computer file processing), each application program has its own files. In this case, the duplicated copies of the same data are created at many places. In DBMS, all the data of an organization is integrated into a single database. The data is recorded at only one place in the database and it is not duplicated. For example, the dean's faculty file and the faculty payroll file contain several items that are identical. When they are converted into database, the data is integrated into a single database so that multiple copies of the same data are reduced to-single copy. In DBMS, the data redundancy can be controlled or reduced but is not removed completely. Sometimes, it is necessary to create duplicate copies of the same data items in order to relate tables with each other. By controlling the data redundancy, you can save storage space. Similarly, it is useful for retrieving data from database using queries. 2. Data Consistency: By controlling the data redundancy, the data consistency is obtained. If a data item appears only once, any update to its value has to be performed only once and the updated value (new value of item) is immediately available to all users. If the DBMS has reduced redundancy to a minimum level, the database system enforces consistency. It means that when a data item appears more than once in the database and is updated, the DBMS automatically updates each occurrence of a data item in the database. 3. Data Sharing: In DBMS, data can be shared by authorized users of the organization. The DBA manages the data and gives rights to users to access the data. Many users can be authorized to access the same set of information simultaneously. The remote users can also share same data. Similarly, the data of same database can be shared between different application programs. 4. Data Integration: In DBMS, data in database is stored in tables. A single database contains multiple tables and relationships can be created between tables (or associated data entities). This makes easy to retrieve and update data. 5. Integrity Constraints: Integrity constraints or consistency rules can be applied to database so that the correct data can be entered into database. The constraints may be applied to data item within a single record or they may be applied to relationships between records. Examples: The examples of integrity constraints are: (i) 'Issue Date' in a library system cannot be later than the corresponding 'Return Date' of a book. (ii) Maximum obtained marks in a subject cannot exceed 100. (iii) Registration number of BCS and MCS students must start with 'BCS' and 'MCS' respectively etc. There are also some standard constraints that are intrinsic in most of the DBMSs. These are; 2 Constraint Name Description PRIMARY KEY Designates a column or combination of columns as Primary Key and therefore, values of columns cannot be repeated or left blank. FOREIGN KEY Relates one table with another table. UNIQUE Specifies that values of a column or combination of columns cannot be repeated. NOT NULL Specifies that a column cannot contain empty values. CHECK Specifies a condition which each row of a table must satisfy. Most of the DBMSs provide the facility for applying the integrity constraints. The database designer (or DBA) identifies integrity constraints during database design. The application programmer can also identify integrity constraints in the program code during developing the application program. 
The integrity constraints are automatically checked at the time of data entry or when the record is updated. If the data entry operator (end-user) violates an integrity constraint,
  • 3. the data is not inserted or updated into the database and a message is displayed by the system. For example, when you draw amount from the bank through ATM card, then your account balance is compared with the amount you are drawing. If the amount in your account balance is less than the amount you want to draw, then a message is displayed on the screen to inform you about your account balance. 6. Data Security: Data security is the protection of the database from unauthorized users. Only the authorized persons are allowed to access the database. Some of the users may be allowed to access only a part of database i.e., the data that is related to them or related to their department. Mostly, the DBA or head of a department can access all the data in the database. Some users may be permitted only to retrieve data, whereas others are allowed to retrieve as well as to update data. The database access is controlled by the DBA. He creates the accounts of users and gives rights to access the database. Typically, users or group of users are given usernames protected by passwords. Most of the DBMSs provide the security sub-system, which the DBA uses to create accounts of users and to specify account restrictions. The user enters his/her account number (or username) and password to access the data from database. For example, if you have an account of e-mail in the "hotmail.com" (a popular website), then you have to give your correct username and password to access your account of e-mail. Similarly, when you insert your ATM card into the Auto Teller Machine (ATM) in a bank, the machine reads your ID number printed on the card and then asks you to enter your pin code (or password). In this way, you can access your account. 7. Data Atomicity: A transaction in commercial databases is referred to as atomic unit of work. For example, when you purchase something from a point of sale (POS) terminal, a number of tasks are performed such as; Company stock is updated. Amount is added in company's account. Sales person's commission increases etc. All these tasks collectively are called an atomic unit of work or transaction. These tasks must be completed in all; otherwise partially completed tasks are rolled back. Thus through DBMS, it is ensured that only consistent data exists within the database. 8. Database Access Language: Most of the DBMSs provide SQL as standard database access language. It is used to access data from multiple tables of a database. 9. Development of Application: The cost and time for developing new applications is also reduced. The DBMS provides tools that can be used to develop application programs. For example, some wizards are available to generate Forms and Reports. Stored procedures (stored on server side) also reduce the size of application programs. 10. Creating Forms: Form is very important object of DBMS. You can create Forms very easily and quickly in DBMS, Once a Form is created, it can be used many times and it can be modified very easily. The created Forms are also saved along with database and behave like a software component. A Form provides very easy way (user-friendly interface) to enter data into database, edit data, and display data from database. The non-technical users can also perform various operations on databases through Forms without going into the technical details of a database. 11. Report Writers: Most of the DBMSs provide the report writer tools used to create reports. The users can create reports very easily and quickly. 
Once a report is created, it can be used many times and it can be modified very easily. The created re ports are also saved along with database and behave like a software component. 12. Control Over Concurrency: In a computer file-based system, if two users are allowed to access data simultaneously, it is possible that they will interfere with each other. For example, if both users attempt to perform update operation on the same record, then one may overwrite the values recorded by the other. Most DBMSs have sub-systems to control the concurrency so that transactions are always recorded" with accuracy. 3
  • 4. 13. Backup and Recovery Procedures: In a computer file-based system, the user creates the backup of data regularly to protect the valuable data from damaging due to failures to the computer system or application program. It is a time consuming method, if volume of data is large. Most of the DBMSs provide the 'backup and recovery' sub-systems that automatically create the backup of data and restore data if required. For example, if the computer system fails in the middle (or end) of an update operation of the program, the recovery sub-system is responsible for making sure that the database is restored to the state it was in before the program started executing. 14. Data Independence: The separation of data structure of database from the application program that is used to access data from database is called data independence. In DBMS, database and application programs are separated from each other. The DBMS sits in between them. You can easily change the structure of database without modifying the application program. For example you can modify the size or data type of a data items (fields of a database table). On the other hand, in computer file-based system, the structure of data items are built into the individual application programs. Thus the data is dependent on the data file and vice versa. 15. Advanced Capabilities: DBMS also provides advance capabilities for online access and reporting of data through Internet. Today, most of the database systems are online. The database technology is used in conjunction with Internet technology to access data on the web servers. Data Base Management Systems Architecture Data Base Management Systems (DBMS) are very relevant in today’s world where information matters. Most business operations of large companies are dependent on their databases in some way or the other. Many companies use their data analysis methods to leverage the data in their databases and provide better service to customers and compete with their business rivals. Databases are collections of data that has been organized in a certain way. The term DBMS is a commonly used to refer to computer program that can help you store, change and retrieve the data in your database. Most DBMS software products use SQL as the main query language – the language that lets you interact with and extract results from your database quickly. SQL is the language used to query popular database systems like Oracle, SQL Server and MySQL. Learning SQL and DBMS can help you become a database administrator. DBMS Architecture DBMS architecture is the way in which the data in a database is viewed (or represented to) by users. It helps you represent your data in an understandable way to the users, by hiding the complex bits that deal with the working of the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data. We’re going to take a look at the ANSI-SPARC DBMS standard model. ANSI is the acronym for American National Standards Institute. It sets standards for American goods so that they can be used anywhere in the world without compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use SQL as the main query language. The ANSI has also standardized a three level DBMS architecture model followed by most database systems, and it’s known as the abstract ANSI-SPARC design standard. The ANSI-SPARC Database Architecture is set up into three tiers. Let’s take a closer look at them. 
Database Management Systems Architecture
Database management systems (DBMS) are very relevant in today's world, where information matters. Most business operations of large companies depend on their databases in some way or other. Many companies use data analysis methods to leverage the data in their databases, provide better service to customers, and compete with their business rivals. Databases are collections of data that have been organized in a certain way. The term DBMS commonly refers to a computer program that can help you store, change and retrieve the data in your database. Most DBMS products use SQL as the main query language: the language that lets you interact with and extract results from your database quickly. SQL is the language used to query popular database systems like Oracle, SQL Server and MySQL. Learning SQL and DBMS concepts can help you become a database administrator.

DBMS Architecture
DBMS architecture is the way in which the data in a database is viewed by (or represented to) users. It helps you represent your data in an understandable way to the users by hiding the complex bits that deal with the working of the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data. We're going to take a look at the ANSI-SPARC standard DBMS model. ANSI is the acronym for American National Standards Institute; it sets standards for American goods so that they can be used anywhere in the world without compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use SQL as the main query language. ANSI has also standardized a three-level DBMS architecture model followed by most database systems, known as the abstract ANSI-SPARC design standard. The ANSI-SPARC database architecture is set up in three tiers. Let's take a closer look at them.

The Internal Level (Physical Representation of Data): The internal level is the lowest level in a three-tiered database architecture. This level deals with how the stored data is represented on the system: exactly how the data is stored and organized for access. It is the most technical of the three levels. However, the internal-level view is still abstract: even though it shows how the data is stored physically, it does not show how the database software operates on it. So how exactly is data stored at this level? There are several considerations to be made when storing data, including choosing the right space allocation techniques, data compression techniques (if necessary), security and encryption, and the access paths the software can take to retrieve the data. Most DBMS products make sure that data access is optimized and that data uses minimum storage space; the operating system you are running is actually in charge of managing the physical storage space.
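The following minimal sketch shows the kind of internal-level decisions discussed above as they surface in MySQL DDL; the table and options are hypothetical, and ROW_FORMAT=COMPRESSED assumes an InnoDB configuration that permits it:

    CREATE TABLE sales_archive (
      id      INT PRIMARY KEY,
      sold_on DATE,
      amount  DECIMAL(10,2),
      INDEX idx_sold_on (sold_on)          -- an access path for date lookups
    ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED; -- storage engine and compression choices

Users querying the table never see these choices; they belong entirely to the internal level.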
The Conceptual Level (Holistic Representation of Data): The conceptual level describes how the database is structured logically. It tells you about the relationships between the data members of your database, exactly what data is stored in it, and what a user will need in order to use the database. This level does not concern itself with how the logical structure will actually be implemented; it is effectively an overview of your database. The conceptual level acts as a buffer between the internal level and the external level: it helps hide the complexity of the database and how the data is physically stored in it. The database administrator has to be conversant with this layer, because most of his or her operations are carried out on it, and only a database administrator is allowed to modify or structure this level. It provides a global view of the database, as well as of the hardware and software necessary for running it, all important information for a database admin.

The External Level (User Representation of Data): This is the uppermost level in the database architecture. It implements the concept of abstraction as much as possible. This level is also known as the view level, because it deals with how a user views your database. The external level is what allows a user to access a customized version of the data in your database; multiple users can work on a database at the same time because of it. The external level also hides the workings of the database from your users, and it maintains the security of the database by giving users access only to the data they need at a particular time; any data that is not needed will not be displayed.

The three "schemas" (internal, conceptual and external) show how the database is internally and externally structured, and so this type of database architecture is also known as the "three-schema" architecture.
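As a small illustration of the external level, a view can give a class of users a customized, restricted window onto the data; a hedged sketch, assuming hypothetical employee and department tables:

    -- Receptionists see names and extensions, but never salaries,
    -- and the underlying join is hidden from them.
    CREATE VIEW staff_directory AS
    SELECT e.name, e.extension, d.dept_name
    FROM employee e
    JOIN department d ON d.dept_id = e.dept_id;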
Functional Dependency
A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which is the same as stating "B is functionally dependent upon A." For example, in a table listing employee characteristics including Social Security Number (SSN) and name, name is functionally dependent upon SSN (SSN -> name), because an employee's name can be uniquely determined from their SSN. The reverse statement (name -> SSN) is not true, because more than one employee can have the same name but different SSNs.

Definition: functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X -> Y, which specifies that Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute; each value of X is associated with precisely one Y value. A functional dependency in a database serves as a constraint between two sets of attributes. Defining functional dependencies is an important part of relational database design and contributes to normalization.

What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant (unwanted) data (for example, storing the same data in more than one table) and ensuring dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Techopedia defines normalization as the process of reorganizing data in a database so that it meets two basic requirements: (1) there is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to take up as little disk space as possible, resulting in increased performance. Normalization is also known as data normalization.
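Returning to the functional dependency example above (SSN -> name), such a constraint can be checked directly in SQL; a hedged sketch, assuming a hypothetical employee table:

    -- If SSN -> name holds, this returns no rows; any row returned is an
    -- SSN associated with more than one name, i.e. a violation of the FD.
    SELECT ssn
    FROM employee
    GROUP BY ssn
    HAVING COUNT(DISTINCT name) > 1;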
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF, along with the occasional 4NF; fifth normal form is very rarely seen and won't be discussed in this article. Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally it becomes necessary to stray from them to meet practical business requirements; however, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF): First normal form sets the very basic rules for an organized database: eliminate duplicative columns from the same table, and create separate tables for each group of related data, identifying each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF): Second normal form further addresses the concept of removing duplicative data: meet all the requirements of the first normal form; remove subsets of data that apply to multiple rows of a table and place them in separate tables; and create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF): Third normal form goes one large step further: meet all the requirements of the second normal form, and remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF): The Boyce-Codd normal form, also referred to as the "third and a half (3.5) normal form", adds one more requirement: meet all the requirements of the third normal form, and ensure that every determinant is a candidate key.

Fourth Normal Form (4NF): Finally, fourth normal form has one additional requirement: meet all the requirements of the third normal form; a relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
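To make the guidelines concrete, the hedged sketch below (hypothetical schema) removes the duplication that arises when a department name is repeated on every employee row; after the split, each non-key column depends only on its own table's key, in the spirit of 2NF/3NF:

    CREATE TABLE department (
      dept_id   INT PRIMARY KEY,
      dept_name VARCHAR(60)            -- stored once per department
    );

    CREATE TABLE employee (
      emp_id  INT PRIMARY KEY,
      name    VARCHAR(60),
      dept_id INT,                     -- relationship via a foreign key
      FOREIGN KEY (dept_id) REFERENCES department (dept_id)
    );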
Data Models

E-R Model: An entity-relationship (E-R) model is a graphical representation of entities and their relationships to each other, typically used in computing in regard to the organization of data within databases or information systems. An entity is a piece of data: an object or concept about which data is stored. A relationship is how the data is shared between entities. There are three types of relationships between entities:

1. One-to-One: One instance of an entity (A) is associated with one other instance of another entity (B). For example, in a database of employees, each employee name (A) is associated with only one Social Security Number (B).

2. One-to-Many: One instance of an entity (A) is associated with zero, one or many instances of another entity (B), but for one instance of entity B there is only one instance of entity A. For example, for a company with all employees working in one building, the building name (A) is associated with many different employees (B), but those employees all share the same singular association with entity A.

3. Many-to-Many: One instance of an entity (A) is associated with one, zero or many instances of another entity (B), and one instance of entity B is associated with one, zero or many instances of entity A. For example, for a company in which all of its employees work on multiple projects, each instance of an employee (A) is associated with many instances of a project (B), and at the same time, each instance of a project (B) has multiple employees (A) associated with it.

Relational Model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples, grouped into relations; a database organized in terms of the relational model is a relational database. The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries. Most relational databases use the SQL data definition and query language; these systems implement what can be regarded as an engineering approximation to the relational model. A table in an SQL database schema corresponds to a predicate variable; the contents of a table correspond to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases deviate from the relational model in many details, and Codd fiercely argued against deviations that compromise the original principles.

[Image: Diagram of an example database according to the relational model]
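The declarative style described above can be seen in any SQL query: the statement says what rows are wanted, and the DBMS decides how to retrieve them. A minimal sketch, assuming hypothetical customers and orders tables:

    SELECT c.name, o.order_date, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id  -- rows related through a key
    WHERE o.total > 100;                            -- no access paths specified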
In the relational model, related records are linked together with a "key".

Network Model
The network model replaces the hierarchical tree with a graph, thus allowing more general connections among the nodes. The main difference between the network model and the hierarchical model is its ability to handle many-to-many (N:N) relations; in other words, it allows a record to have more than one parent. Suppose an employee works for two departments: the strict hierarchical arrangement is not possible here, and the tree becomes a more generalized graph, a network. The network model evolved specifically to handle non-hierarchical relationships; data can belong to more than one parent, and there are lateral connections as well as top-down connections. A network structure thus allows 1:1 (one-to-one), 1:M (one-to-many), and M:M (many-to-many) relationships among entities. In network database terminology, a relationship is a set. Each set is made up of at least two types of records: an owner record (equivalent to a parent in the hierarchical model) and a member record (similar to a child record in the hierarchical model). The Customer-Loan database, which we discussed earlier for the hierarchical model, can also be represented in the network model. In the network representation, the information about the joint loan L1 appears a single time, whereas in the hierarchical model it appears twice; the network model thus reduces redundancy and is better in this respect than the hierarchical model.

Hierarchical Model
The hierarchical data model is a way of organizing a database with multiple one-to-many relationships. The structure is based on the rule that one parent can have many children, but each child is allowed only one parent. This structure allows information to be repeated through the parent-child relations. The model was created by IBM and implemented mainly in their Information Management System (IMS), a precursor to modern DBMSs. A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value; the entity type of a record defines which fields the record contains. A record in the hierarchical database model corresponds to a row (or tuple) in the relational database model, and an entity type corresponds to a table (or relation). The hierarchical database model mandates that each child record has only one parent, whereas each parent record can have one or more child records.
In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the first database model, created by IBM in the 1960s.

Distributed Database
A distributed database is a database that is under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common CPU. It may be stored on multiple computers located in the same physical location, or dispersed over a network of interconnected computers. Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of databases improves database performance at end-user worksites. To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database; once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time-consuming depending on the size and number of the distributed databases, and it can also require a lot of time and computer resources. Duplication, on the other hand, is not as complicated: it basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed only to the master database, ensuring that local data will not be overwritten. Both processes can keep the data current in all distributed locations. Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. Their implementation can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

Object-Oriented Database
An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases, which are table-oriented; object-relational databases are a hybrid of both approaches. Object databases have been considered since the early 1980s. Object-oriented database management systems (OODBMSs) combine database capabilities with object-oriented programming language capabilities. OODBMSs allow object-oriented programmers to develop the product, store objects, and replicate or modify existing objects to make new objects within the OODBMS.
Because the database is integrated with the programming language, the programmer can maintain consistency within one environment, in that both the OODBMS and the programming language use the same model of representation. Relational DBMS projects, by way of contrast, maintain a clearer division between the database model and the application. As the usage of web-based technology increases with the implementation of intranets and extranets, companies have a vested interest in OODBMSs to display their complex data. Using a DBMS that has been specifically designed to store data as objects gives an advantage to companies that are geared towards multimedia presentation or organizations that utilize computer-aided design (CAD). Some object-oriented databases are designed to work well with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk; others have their own programming languages. OODBMSs use exactly the same model as object-oriented programming languages.

Spatial Database
A spatial database is a database that is optimized to store and query data that represents objects defined in a geometric space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons; some handle more complex structures such as 3D objects, topological coverages, linear networks, and TINs. While typical databases are designed to manage various numeric and character types of data, additional functionality needs to be added for databases to process spatial data types efficiently; these types are typically called geometry or feature types. The Open Geospatial Consortium created the Simple Features specification, which sets standards for adding spatial functionality to database systems.

Multimedia Database
A multimedia database (MMDB) is a collection of related multimedia data. Multimedia data includes one or more primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations), animation sequences, audio and video. A multimedia database management system (MMDBMS) is a framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for multimedia data types and facilities for the creation, storage, access, querying and control of a multimedia database.

Crash Recovery System
Though we are living in a highly technologically advanced era, where hundreds of satellites monitor the earth and at every second billions of people are connected through information technology, failure is expected but not always acceptable. A DBMS is a highly complex system with hundreds of transactions being executed every second. The availability of a DBMS depends on its complex architecture and underlying hardware and system software. If it fails or crashes amid transactions being executed, it is expected that the system will follow some sort of algorithm or technique to recover from crashes or failures.

Failure Classification
To see where a problem has occurred, we generalize failures into the following categories:

TRANSACTION FAILURE: When a transaction fails to execute, or reaches a point after which it cannot be completed successfully, it has to abort. This is called transaction failure, and only a few transactions or processes are affected.
Reasons for transaction failure include: logical errors, where a transaction cannot complete because of a code error or some internal error condition; and system errors, where the database system itself terminates an active transaction because the DBMS is unable to execute it or has to stop it because of some system condition (for example, in the case of deadlock or resource unavailability, the system aborts an active transaction).
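The practical consequence of an abort is that the database looks as if the transaction never ran. A minimal sketch of this atomicity, assuming a hypothetical accounts table and standard MySQL transaction syntax:

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 500 WHERE id = 'A';
    UPDATE accounts SET balance = balance + 500 WHERE id = 'B';
    COMMIT;  -- both updates become visible together; a ROLLBACK here
             -- (or a failure before COMMIT) would undo both updates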
SYSTEM CRASH: There are problems external to the system that may cause it to stop abruptly and crash, for example an interruption in the power supply or a failure of the underlying hardware or software (including operating system errors).

DISK FAILURE: In the early days of technology evolution, it was a common problem for hard disk drives or storage drives to fail frequently. Disk failures include the formation of bad sectors, inaccessibility of the disk, a disk head crash, or any other failure that destroys all or part of the disk storage.

Storage Structure
The storage structure can be divided into various categories. Volatile storage: as the name suggests, this storage does not survive system crashes; it is mostly placed very close to the CPU, often embedded on the chipset itself (for example, main memory and cache memory). It is fast but can store only a small amount of information. Non-volatile storage: these memories are made to survive system crashes. They are huge in data storage capacity but slower to access; examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.

Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify data items. Transactions are made of various operations, which are atomic in nature; but according to the ACID properties of a DBMS, the atomicity of the transaction as a whole must be maintained: either all its operations are executed or none. When a DBMS recovers from a crash, it should maintain the following: it should check the states of all transactions that were being executed; a transaction may be in the middle of some operation, and the DBMS must ensure the atomicity of the transaction in this case; it should check whether the transaction can be completed now or needs to be rolled back; and no transaction should be allowed to leave the DBMS in an inconsistent state. There are two types of techniques that can help a DBMS recover while maintaining the atomicity of transactions: maintaining logs of each transaction and writing them onto stable storage before actually modifying the database; and maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated later.

Log-Based Recovery
The log is a sequence of records which records the actions performed by a transaction. It is important that the log is written prior to the actual modification and stored on a stable, failsafe storage medium. Log-based recovery works as follows: the log file is kept on stable storage media. When a transaction enters the system and starts execution, it writes a log record about it: <Tn, Start>. When the transaction modifies an item X, it writes a log record <Tn, X, V1, V2>, which reads: Tn has changed the value of X from V1 to V2. When the transaction finishes, it logs <Tn, commit>.
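As a hedged illustration of the notation above, the comments below show the log records a DBMS might write for a simple transaction; the table, values and exact record layout are hypothetical:

    -- Possible write-ahead log records for this transaction:
    --   <T1, Start>
    --   <T1, balance(A), 500, 400>   -- item X = balance of A, V1 = 500, V2 = 400
    --   <T1, Commit>
    START TRANSACTION;
    UPDATE accounts SET balance = 400 WHERE id = 'A';
    COMMIT;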
The database can be modified using two approaches. Deferred database modification: all logs are written to stable storage, and the database is updated only when the transaction commits. Immediate database modification: each log record is followed by an actual database modification; that is, the database is modified immediately after every operation.

Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At recovery time it would be hard for the recovery system to backtrack through all the logs and then start recovering; to ease this situation, most modern DBMSs use the concept of 'checkpoints'.

CHECKPOINT: Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the system, and as time passes the log file may become too big to be handled at all. A checkpoint is a mechanism whereby all the previous logs are removed from the system and stored permanently on the storage disk. The checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

RECOVERY: When a system with concurrent transactions crashes and recovers, it behaves in the following manner:

[Image: Recovery with concurrent transactions]

The recovery system reads the logs backwards from the end to the last checkpoint and maintains two lists, an undo-list and a redo-list. If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list. If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the transaction in the undo-list. All transactions in the undo-list are then undone and their logs removed. For all transactions in the redo-list, their previous logs are removed, the operations are redone, and the logs are saved again.

Database Security / Authorization
Database security concerns the use of a broad range of information security controls to protect databases (potentially including the data, the database applications or stored functions, the database systems, the database servers and the associated network links) against compromises of their confidentiality, integrity and availability. It involves various types or categories of controls, such as technical, procedural/administrative and physical. Database security is a specialist topic within the broader realms of computer security, information security and risk management. Security risks to database systems include, for example:

• unauthorized or unintended activity or misuse by authorized database users, database administrators, or network/systems managers, or by unauthorized users or hackers (e.g. inappropriate access to sensitive data, metadata or functions within databases, or inappropriate changes to the database programs, structures or security configurations);
• malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary data, deletion of or damage to the data or programs, interruption or denial of authorized access to the database, attacks on other systems and the unanticipated failure of database services;
• overloads, performance constraints and capacity issues resulting in the inability of authorized users to use databases as intended;
• physical damage to database servers caused by computer room fires or floods, overheating, lightning, accidental liquid spills, static discharge, electronic breakdowns/equipment failures and obsolescence;
• design flaws and programming bugs in databases and the associated programs and systems, creating various security vulnerabilities (e.g. unauthorized privilege escalation), data loss/corruption, performance degradation etc.;
• data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system administration processes, sabotage/criminal damage etc.

Many layers and types of information security control are appropriate to databases, including: access control, auditing, authentication, encryption, integrity controls, backups, and application security. A hedged sketch of basic access control in SQL follows this discussion.

Database Security Applying Statistical Methods
Traditionally, databases have been secured against hackers largely through network security measures such as firewalls and network-based intrusion detection systems. While network security controls remain valuable in this regard, securing the database systems themselves, and the programs/functions and data within them, has arguably become more critical as networks are increasingly opened to wider access, in particular access from the Internet. Furthermore, system, program, function and data access controls, along with the associated user identification, authentication and rights management functions, have always been important to limit, and in some cases log, the activities of authorized users and administrators. In other words, these are complementary approaches to database security, working from both the outside in and the inside out, as it were. Many organizations develop their own "baseline" security standards and designs detailing basic security control measures for their database systems. These may reflect general information security requirements or obligations imposed by corporate information security policies and applicable laws and regulations (e.g. concerning privacy, financial management and reporting systems), along with generally accepted good database security practices (such as appropriate hardening of the underlying systems) and perhaps security recommendations from the relevant database system and software vendors. The security designs for specific database systems typically specify further security administration and management functions (such as administration and reporting of user access rights, log management and analysis, database replication/synchronization and backups) along with various business-driven information security controls within the database programs and functions (e.g. data entry validation and audit trails). Furthermore, various security-related activities (manual controls) are normally incorporated into the procedures, guidelines etc. relating to the design, development, configuration, use, management and maintenance of databases.
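As promised above, basic authentication and least-privilege access control can be expressed directly in SQL; a hedged example, assuming MySQL account syntax and a hypothetical payroll schema:

    -- Authentication: a named account with a password
    CREATE USER 'report_user'@'localhost' IDENTIFIED BY 'S3cret!pass';

    -- Access control: read-only access to a single view, nothing else
    GRANT SELECT ON payroll.summary_view TO 'report_user'@'localhost';

    -- Revocation when the access is no longer appropriate
    REVOKE SELECT ON payroll.summary_view FROM 'report_user'@'localhost';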
Data Warehouse Architecture
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture than to discuss the specifics of any one system. In general, all data warehouse systems have the following layers:

• Data Source Layer
• Data Extraction Layer
• Staging Area
• ETL Layer
• Data Storage Layer
• Data Logic Layer
• Data Presentation Layer
• Metadata Layer
• System Operations Layer
Data Source Layer: This represents the different data sources that feed data into the data warehouse. The data source can be of any format: plain text files, relational databases, other types of databases, Excel files, etc. can all act as a data source. Many different types of data can be a data source, including operational data such as sales data, HR data, product data, inventory data, marketing data and systems data, and third-party data such as census data, demographics data, or survey data. All these data sources together form the Data Source Layer.

Data Extraction Layer: Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but there is unlikely to be any major data transformation.

Staging Area: This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration.

ETL Layer: This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this layer (a small sketch of such a step appears after these layer descriptions).

Data Storage Layer: This is where the transformed and cleansed data sits. Based on scope and functionality, three types of entities can be found here: the data warehouse, the data mart, and the operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.

Data Logic Layer: This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but they do affect what a report looks like.

Data Presentation Layer: This refers to the information that reaches the users. This can be in the form of a tabular or graphical report in a browser, an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.

Metadata Layer: This is where information about the data stored in the data warehouse system is kept. A logical data model would be an example of something in the metadata layer. A metadata tool is often used to manage metadata.

System Operations Layer: This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.
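As referenced in the ETL Layer description above, here is a minimal sketch of a single transform-and-load step, assuming hypothetical staging and dw schemas:

    -- Move rows from the staging area into a warehouse fact table,
    -- cleansing as we go.
    INSERT INTO dw.sales_fact (sale_date, product_id, amount)
    SELECT s.sale_date,
           s.product_id,
           COALESCE(s.amount, 0)       -- cleanse: default missing amounts
    FROM staging.raw_sales s
    WHERE s.sale_date IS NOT NULL;     -- drop rows that cannot be dated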
Evolution of Data Warehousing
In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional information systems technology was simply too cumbersome to provide relevant data efficiently and quickly. Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or less to 'execute' the business rather than 'run' the business. From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic reports for management. The key here is the word 'strategic', as most executives were less concerned with day-to-day operations than with a more overall look at the model and business functions.

As with all technology, over the course of the latter half of the 20th century we saw increased numbers and types of databases. Many large businesses found themselves with data scattered across multiple platforms and variations of technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data warehousing is to take data from multiple platforms/technologies (as varied as spreadsheets, DB2 databases, IDMS records, and VSAM files) and place it in a common location that uses a common querying tool. In this way operational databases could be held on whatever system was most efficient for the operational business, while the reporting / strategic information could be held in a common location using a common language. Data warehouses take this a step farther by giving the data itself commonality, by defining what each term means and keeping it standard. (An example of this would be gender, which can be referred to in many ways but should be standardized on a data warehouse with one common way of referring to each sex.) All of this was designed to make decision support more readily available without affecting day-to-day operations. One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but rather a location for data that is 'interesting', i.e. data that will assist decision makers in making strategic decisions relative to the organization's overall mission.

Benefits of Data Warehousing
The successful implementation of a data warehouse can bring major benefits to an organization, including:

• Potential high returns on investment: Implementation of data warehousing by an organization requires a huge investment, typically from Rs 10 lakh to Rs 50 lakh. However, a study by the International Data Corporation (IDC) in 1996 reported that average three-year returns on investment (ROI) in data warehousing reached 401%.

• Competitive advantage: The huge returns on investment for those companies that have successfully implemented a data warehouse are evidence of the enormous competitive advantage that accompanies this technology. The competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.

• Increased productivity of corporate decision-makers: Data warehousing improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate, and consistent analysis.

• More cost-effective decision-making: Data warehousing helps to reduce the overall cost of the product by reducing the number of channels.

• Better enterprise intelligence: It helps to provide better enterprise intelligence.

• Enhanced customer service: It is used to enhance customer service.
Problems of Data Warehousing
The problems associated with developing and managing a data warehouse are as follows:

Underestimation of resources for data loading: We sometimes underestimate the time required to extract, clean, and load the data into the warehouse. It may take a significant proportion of the total development time, although tools exist that reduce the time and effort spent on this process.

Hidden problems with source systems: Hidden problems associated with the source systems feeding the data warehouse may be identified only after years of being undetected.
For example, when entering the details of a new property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when the data is available and applicable.

Required data not captured: In some cases the required data is not captured by the source systems, even though it may be very important for the data warehouse's purposes. For example, the date of registration for a property may not be used in the source system, but may be very important for analysis purposes.

Increased end-user demands: After satisfying some end-user queries, requests for support from staff may increase rather than decrease. This is caused by increasing awareness among users of the capabilities and value of the data warehouse. Another reason is that once a data warehouse is online, the number of users and queries often increases, together with requests for answers to more and more complex queries.

Data homogenization: Data warehousing involves reconciling the data formats of different data sources, which can result in the loss of some important value in the data.

High demand for resources: The data warehouse requires large amounts of data and storage.

Data ownership: Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data owned by one department has to be loaded into the data warehouse for decision-making purposes, but sometimes this results in reluctance on the part of that department, because it may hesitate to share the data with others.

High maintenance: Data warehouses are high-maintenance systems. Any reorganization of the business processes and the source systems may affect the data warehouse, resulting in high maintenance costs.

Long-duration projects: The building of a warehouse can take up to three years, which is why some organizations are reluctant to invest in a data warehouse. Sometimes only the historical data of a particular department is captured, resulting in data marts; data marts support only the requirements of a particular department and limit the functionality to that department or area only.

Complexity of integration: The most important area for the management of a data warehouse is its integration capabilities. An organization must spend a significant amount of time determining how well the various data warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are a number of tools for every operation of the data warehouse.

Data Mining
Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies, as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing. Grocery stores are well-known users of data mining techniques. Many supermarkets offer free loyalty cards to customers that give them access to reduced prices not available to non-members. The cards make it easy for stores to track who is buying what, when they are buying it, and at what price. The stores can then use this data, after analyzing it, for multiple purposes, such as offering customers coupons targeted to their buying habits and deciding when to put items on sale and when to sell them at full price.
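A hedged sketch of the kind of query behind such loyalty-card analysis, assuming hypothetical purchases and products tables:

    -- Which products are bought most often, as a starting point for
    -- targeted coupons and sale timing.
    SELECT p.product_name, COUNT(*) AS times_bought
    FROM purchases pu
    JOIN products p ON p.product_id = pu.product_id
    GROUP BY p.product_name
    ORDER BY times_bought DESC
    LIMIT 10;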
Data mining can be a cause for concern when only selected information, which is not representative of the overall sample group, is used to prove a certain hypothesis.

Data Mining Process
The Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases, intended as a cyclical process, as shown in the following figure:
[Image: Cross-Industry Standard Process for Data Mining (CRISP-DM)]

Business understanding: In the business understanding phase, it is first required to understand the business objectives clearly and find out what the business's needs are. Next, we assess the current situation by finding out about the resources, assumptions, constraints and other important factors that should be considered. Then, from the business objectives and the current situation, we create data mining goals that achieve the business objectives within the current situation. Finally, a good data mining plan has to be established to achieve both the business and data mining goals; the plan should be as detailed as possible.

Data understanding: The data understanding phase starts with initial data collection, gathered from the available data sources, to help us get familiar with the data. Some important activities, including data loading and data integration, must be performed to make the data collection successful. Next, the "gross" or "surface" properties of the acquired data need to be examined carefully and reported. Then the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, data quality must be examined by answering important questions such as "Is the acquired data complete?" and "Are there any missing values in the acquired data?" (a sketch of such a check appears after the description of the modeling phase below).

Data preparation: Data preparation typically consumes about 90% of the time of the project. The outcome of the data preparation phase is the final data set. Once the available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. The data exploration task may be carried out at greater depth during this phase to notice patterns based on business understanding.

Modeling: First, modeling techniques have to be selected for the prepared dataset. Next, a test scenario must be generated to validate the quality and validity of the model. Then one or more models are created by running the modeling tool on the prepared dataset. Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.
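The data-quality questions raised in the data understanding phase can often be answered with simple profiling queries; a hedged sketch, assuming a hypothetical customers table and MySQL's treatment of booleans as 0/1:

    -- How complete is the data? Count rows and missing values per column.
    SELECT COUNT(*)                AS total_rows,
           SUM(email IS NULL)      AS missing_email,
           SUM(birth_date IS NULL) AS missing_birth_date
    FROM customers;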
Evaluation: In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised in this phase due to new patterns discovered in the model results or other factors. Gaining business understanding is an iterative process in data mining. The go or no-go decision must be made in this step before moving to the deployment phase.

Deployment: The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. In the deployment phase, plans for deployment, maintenance and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarize the project experiences and review the project to see what needs to be improved, capturing the lessons learned.

Data Mining Techniques

Association: Association (or relation) is probably the best known, most familiar and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time they buy strawberries they might also want to buy cream.

Classification: You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes to identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might place it into a particular class by comparing its attributes with your known definitions. You can apply the same principles to customers, for example by classifying them by age and social group.

Clustering: By examining one or more attributes or classes, you can group individual pieces of data together to form a structured opinion. At a simple level, clustering is using one or more attributes as your basis for identifying a cluster of correlating results. Clustering is useful for identifying different information because it correlates with other examples, so you can see where the similarities and ranges agree. Clustering can work both ways: you can assume that there is a cluster at a certain point and then use your identification criteria to see if you are correct. For example, a sample of sales data might compare the age of the customer to the size of the sale; it is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home) have more disposable income.

Prediction: Prediction is a wide topic and runs from predicting the failure of components or machinery, to identifying fraud, and even to the prediction of company profits. Used in combination with the other data mining techniques, prediction involves analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a prediction about an event.
Using credit card authorization as an example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matching to identify whether a transaction is fraudulent. Making a match between the purchase of flights to the US and transactions in the US, for instance, suggests that the transaction is likely to be valid.

Sequential patterns: Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences of similar events. For example, with customer data you can identify that customers buy a particular collection of products together at different times of the year. In a shopping basket application, you can use this information to automatically suggest that certain items be added to a basket based on their frequency and past purchasing history; a hedged sketch of such a query follows.
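A hedged sketch of such a frequency analysis, assuming a hypothetical purchases table with customer_id, product_id and purchased_at columns:

    -- Pairs of products the same customer bought in sequence, ranked by
    -- how often the pair occurs: a simple form of sequential pattern.
    SELECT a.product_id AS first_item,
           b.product_id AS next_item,
           COUNT(*)     AS freq
    FROM purchases a
    JOIN purchases b
      ON b.customer_id = a.customer_id
     AND b.purchased_at > a.purchased_at
    GROUP BY a.product_id, b.product_id
    ORDER BY freq DESC;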
Decision trees: Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as part of the selection criteria or to support the use and selection of specific data within the overall structure. Within the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a further question, helping to classify or identify the data so that it can be categorized, or so that a prediction can be made based on each answer.

[Image: Decision tree]

Combinations: In practice, it is very rare that you would use one of these techniques exclusively. Classification and clustering are similar techniques, and by using clustering to identify nearest neighbors you can further refine your classifications. Often, decision trees are used to help build and identify classifications that can be tracked for a longer period to identify sequences and patterns.

Long-term (memory) processing: Within all of the core methods, there is often reason to record and learn from the information. In some techniques it is entirely obvious; for example, with sequential patterns and predictive learning you look back at data from multiple sources and instances of information to build a pattern. In others, the process might be more explicit. Decision trees are rarely built once and never forgotten: as new information, events, and data points are identified, it might be necessary to build more branches, or even entirely new trees, to cope with the additional information. Some of this process can be automated; for example, building a predictive model for identifying credit card fraud is about building probabilities that you can use for the current transaction, and then updating that model with each new (approved) transaction. This information is then recorded so that the decision can be made quickly the next time.

E-commerce & Web Application Security Issues

1. Introduction
E-commerce is defined as the buying and selling of products or services over electronic systems such as the Internet and, to a lesser extent, other computer networks. It is generally regarded as the sales and commercial function of eBusiness. There has been a massive increase in the level of trade conducted electronically since the widespread penetration of the Internet. A wide variety of commerce is conducted via eCommerce, including electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems. US online retail sales reached $175 billion in 2007 and were projected to grow to $335 billion by 2012 (Mulpuru, 2008). This massive increase in the uptake of eCommerce has led to a new generation of associated security threats, but any eCommerce system must meet four integral requirements:
a) privacy: information exchanged must be kept from unauthorized parties;
b) integrity: the exchanged information must not be altered or tampered with;
c) authentication: both sender and recipient must prove their identities to each other; and
d) non-repudiation: proof is required that the exchanged information was indeed received (Holcombe, 2007).

These basic maxims of eCommerce are fundamental to the conduct of secure business online. Beyond them, eCommerce providers must also protect against a number of different external security threats, most notably Denial of Service (DoS), where an attempt is made to make a computer resource unavailable to its intended users through a variety of mechanisms discussed below. The financial services sector still bears the brunt of e-crime, accounting for 72% of all attacks; but the sector that experienced the greatest increase in the number of attacks was eCommerce, where attacks rose by 15% from 2006 to 2007 (Symantec, 2007).

2. Privacy
Privacy has become a major concern for consumers with the rise of identity theft and impersonation, and any concern for consumers must be treated as a major concern for eCommerce providers. According to Consumer Reports Money Adviser (Perrotta, 2008), the US Attorney General has announced multiple indictments relating to a massive international security breach involving nine major retailers and more than 40 million credit- and debit-card numbers; US attorneys think that this may be the largest hacking and identity-theft case ever prosecuted by the Justice Department. Both EU and US legislation, at both the federal and state levels, mandates that certain organizations inform customers about information uses and disclosures. Such disclosures are typically accomplished through privacy policies, both online and offline (Vail et al., 2008). In a study by Lauer and Deng (2008), a model is presented linking privacy policy, through trustworthiness, to online trust, and then to customers' loyalty and their willingness to provide truthful information. The model was tested using a sample of 269 responses. The findings suggested that consumers' trust in a company is closely linked with their perception of the company's respect for customer privacy (Lauer and Deng, 2007). Trust in turn is linked to increased customer loyalty, which can be manifested through increased purchases, openness to trying new products, and willingness to participate in programs that use additional personal information. Privacy now forms an integral part of any e-commerce strategy, and investment in privacy protection has been shown to increase consumers' spend, trust and loyalty. The converse can be shown to be true when things go wrong. In March 2008, the Irish online jobs board jobs.ie was compromised by criminals and users' personal data (in the form of CVs) was taken (Ryan, 2008). Looking at the real-time responses of users to this event on the popular Irish forum Boards.ie, we can see that privacy is of major concern to users, and in the event of their privacy being compromised users become very agitated, with an overall negative effect on trust in e-commerce. User comments in the forum included: "I'm well p*ssed off about them keeping my CV on the sly"; "I am just angry that this could have happened and to so many people"; "Mine was taken too. How do I terminate my acc with jobs.ie"; "Grr, so annoyed, feel I should report it to the Gardai now" (Boards.ie, 2008).
3. Integrity, Authentication & Non-Repudiation
In any e-commerce system, the factors of data integrity, customer and client authentication, and non-repudiation are critical to the success of any online business. Data integrity is the assurance that data transmitted is consistent and correct, that is, that it has not been tampered with or altered in any way during transmission. Authentication is a means by which both parties in an online transaction can be confident that they are who they say they are, and non-repudiation is the idea that no party can dispute that an actual event online took place. Proof of data integrity is typically the easiest of these factors to accomplish. A data hash or checksum, such as MD5 or CRC, is usually sufficient to establish that the likelihood of data being undetectably changed is extremely low (Schlaeger and Pernul, 2005). Notwithstanding these security measures, it is still possible to compromise data in transit through techniques such as phishing or man-in-the-middle attacks (Desmedt, 2005). These flaws have led to the need for the development of strong verification and security measures such as digital signatures and public key infrastructures (PKI).
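As a minimal illustration of the checksum idea above, MySQL exposes hash functions that can fingerprint a value, so that digests computed before and after transmission can be compared; the message text is hypothetical:

    SELECT MD5('order #1234, total 99.50')        AS md5_digest,
           SHA2('order #1234, total 99.50', 256)  AS sha256_digest;

Note that a plain hash only detects accidental alteration; an attacker who can change the data can recompute the hash, which is why the digital signatures discussed next matter.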
One of the key developments in e-commerce security, and one that has led to the widespread growth of e-commerce, is the introduction of digital signatures as a means of verifying data integrity and authentication. In 1995, Utah became the first jurisdiction in the world to enact an electronic signature law. An electronic signature may be defined as "any letters, characters, or symbols manifested by electronic or similar means and executed or adopted by a party with the intent to authenticate a writing" (Blythe, 2006). In order for a digital signature to attain the same legal status as an ink-on-paper signature, asymmetric key cryptography must have been employed in its production (Blythe, 2006). Such a system employs double keys: one key is used by the sender to encrypt the message, and a different, albeit mathematically related, key is used by the recipient to decrypt it (Antoniou et al., 2008). This is a very good system for electronic transactions, since two stranger parties, perhaps living far apart, can confirm each other's identity and thereby reduce the likelihood of fraud in the transaction. Non-repudiation techniques prevent the sender of a message from subsequently denying that they sent it; digital signatures using public-key cryptography and hash functions are the generally accepted means of providing non-repudiation of communications.

4. Technical Attacks
Technical attacks are among the most challenging types of security compromise an e-commerce provider must face. Perpetrators of technical attacks, and in particular Denial-of-Service attacks, typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social networking sites.

Denial of Service Attacks
Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet today. A major difficulty in thwarting these attacks is tracing the source of the attack, as attackers often use incorrect or spoofed IP source addresses to disguise the true origin (Kim and Kim, 2006). The United States Computer Emergency Readiness Team defines symptoms of denial-of-service attacks to include (McDowell, 2007):
• unusually slow network performance
• unavailability of a particular web site
• inability to access any web site
• a dramatic increase in the number of spam emails received

DoS attacks can be executed in a number of different ways, including:

ICMP Flood (Smurf Attack): Perpetrators send large numbers of IP packets with the source address faked to appear to be the address of the victim. The network's bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.

Teardrop Attack: A teardrop attack involves sending mangled IP fragments with overlapping, over-sized payloads to the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the systems as a result.

Phlashing: Also known as a permanent denial-of-service (PDoS) attack, this is an attack that damages a system so badly that it requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management interfaces of the victim's hardware, be it routers, printers, or other networking hardware. These flaws leave the door open for an attacker to remotely 'update' the device firmware to a modified, corrupt or defective firmware image, thereby bricking the device and making it permanently unusable for its original purpose.

Distributed Denial-of-Service Attacks: Distributed Denial of Service (DDoS) attacks are one of the greatest security fears for IT managers.
4. Technical Attacks
Technical attacks are among the most challenging types of security compromise an e-commerce provider must face. Perpetrators of technical attacks, and in particular denial-of-service attacks, typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social networking sites.
Denial of Service Attacks
Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet today. A major difficulty in thwarting these attacks is tracing the source, as attackers often use incorrect or spoofed IP source addresses to disguise the true origin of the attack (Kim and Kim, 2006). The United States Computer Emergency Readiness Team defines the symptoms of denial-of-service attacks to include (McDowell, 2007):
• Unusually slow network performance
• Unavailability of a particular web site
• Inability to access any web site
• A dramatic increase in the number of spam emails received
DoS attacks can be executed in a number of different ways, including:
ICMP Flood (Smurf Attack) – the perpetrator sends large numbers of IP packets with the source address faked to appear to be the address of the victim. The network's bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.
Teardrop Attack – a Teardrop attack involves sending mangled IP fragments with overlapping, over-sized payloads to the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the machine as a result.
Phlashing – also known as a permanent denial-of-service (PDoS), this is an attack that damages a system so badly that it requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management interfaces of the victim's hardware, be it routers, printers or other networking hardware. These flaws let an attacker remotely 'update' the device firmware to a modified, corrupt or defective image, thereby bricking the device and making it permanently unusable for its original purpose.
Distributed Denial-of-Service Attacks – Distributed Denial of Service (DDoS) attacks are one of the greatest security fears for IT managers. In a matter of minutes, thousands of vulnerable computers can flood the victim website, choking out legitimate traffic (Tariq et al., 2006). A DDoS attack occurs when multiple compromised systems flood the bandwidth or resources of a targeted system, usually one or more web servers. The most famous DDoS attacks occurred in February 2000, when websites including Yahoo, Buy.com, eBay, Amazon and CNN were attacked and left unreachable for several hours each (Todd, 2000).
Brute Force Attacks – a brute force attack is a method of defeating a cryptographic scheme by trying a large number of possibilities, for example a large number of the possible keys in a key space, in order to decrypt a message. Brute force attacks, although perceived to be low-tech in nature, are not a thing of the past. In May 2007 the internet infrastructure in Estonia was crippled by multiple sustained brute force attacks against government and commercial institutions in the country (Sausner, 2008). The attacks followed the relocation of a Soviet World War II memorial in Tallinn in late April, which made news around the world.
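As a toy illustration of the brute force idea, the PHP sketch below recovers a 4-digit PIN from an intercepted hash simply by trying all 10,000 possibilities. Real key spaces are astronomically larger, which is exactly why key length matters:

<?php
// Toy brute force attack: exhaustively try every 4-digit PIN until one
// hashes to the intercepted value. The target PIN here is hypothetical.
$targetHash = hash('sha256', '4821');   // pretend this hash was intercepted

for ($pin = 0; $pin <= 9999; $pin++) {
    $guess = str_pad((string)$pin, 4, '0', STR_PAD_LEFT);
    if (hash_equals($targetHash, hash('sha256', $guess))) {
        echo "PIN found: $guess\n";
        break;
    }
}

A modern machine exhausts this space in well under a second; the defence is to enlarge the search space (longer keys, longer passwords) and to rate-limit or slow down each guess.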
5. Non-Technical Attacks
Phishing Attacks
Phishing is the criminally fraudulent process of attempting to acquire sensitive information such as usernames, passwords and credit card details by masquerading as a trustworthy entity in an electronic communication. Phishing scams are generally carried out by emailing the victim a 'fraudulent' email purporting to come from a legitimate organization and requesting sensitive information. When the victim follows the link embedded in the email, they are brought to an elaborate and sophisticated duplicate of the legitimate organization's website. Phishing attacks generally target bank customers, online auction sites (such as eBay), online retailers (such as Amazon) and service providers (such as PayPal). According to Community Banker (Swann, 2008), cybercriminals have more recently become more sophisticated in the timing of their attacks, for example posing as charities in times of natural disaster.
Social Engineering
Social engineering is the art of manipulating people into performing actions or divulging confidential information. Social engineering techniques include pretexting (where the fraudster creates an invented scenario to get the victim to divulge information), interactive voice response (IVR) or phone phishing (where the fraudster gets the victim to divulge sensitive information over the phone) and baiting with Trojan horses (where the fraudster 'baits' the victim into loading malware onto a system). Social engineering has become a serious threat to e-commerce security because it is difficult to detect and to combat: it involves 'human' factors which cannot be patched the way hardware or software can, although staff training and education can go some way towards thwarting the attack (Hasle et al., 2005).
6. Conclusions
In conclusion, the e-commerce industry faces a challenging future in terms of the security risks it must avert. With increasing technical knowledge, and its widespread availability on the internet, criminals are becoming more and more sophisticated in the deceptions and attacks they can perform. Novel attack strategies and vulnerabilities only really become known once a perpetrator has uncovered and exploited them. That said, there are multiple security strategies which any e-commerce provider can adopt to reduce the risk of attack and compromise significantly. Awareness of the risks and the implementation of multi-layered security protocols, detailed and open privacy policies, and strong authentication and encryption measures will go a long way towards reassuring the consumer and ensuring that the risk of compromise is kept minimal.
What is MySQL?
• MySQL is a database system used on the web
• MySQL is a database system that runs on a server
• MySQL is ideal for both small and large applications
• MySQL is very fast, reliable, and easy to use
• MySQL supports standard SQL
• MySQL compiles on a number of platforms
• MySQL is free to download and use
• MySQL is developed, distributed, and supported by Oracle Corporation
The data in MySQL is stored in tables. A table is a collection of related data, and it consists of columns and rows. Databases are useful when storing information categorically. A company may have a database with the following tables (a short sketch of creating one of them follows the list):
• Employees
• Products
• Customers
• Orders
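As a brief sketch of rows and columns in practice, the following PHP snippet creates a hypothetical Customers table like the one listed above and inserts a single row. The connection details and column layout are placeholder assumptions, not a prescribed schema:

<?php
// Create one of the example tables and add a row. Connection parameters
// ('localhost', 'user', 'password', 'company') are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'company');
if ($db->connect_error) {
    die('Connect failed: ' . $db->connect_error);
}

// Columns define the kinds of data stored; each row is one customer.
$db->query('CREATE TABLE IF NOT EXISTS Customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100)
)');

// A prepared statement keeps user-supplied values out of the SQL text.
$stmt = $db->prepare('INSERT INTO Customers (name, email) VALUES (?, ?)');
$stmt->bind_param('ss', $name, $email);
$name  = 'Jane Doe';
$email = 'jane@example.com';
$stmt->execute();
$db->close();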
What is PHP?
• PHP is an acronym for "PHP: Hypertext Preprocessor"
• PHP is a widely-used, open source scripting language
• PHP scripts are executed on the server
• PHP costs nothing; it is free to download and use
What is a PHP File?
• PHP files can contain text, HTML, CSS, JavaScript, and PHP code
• PHP code is executed on the server, and the result is returned to the browser as plain HTML
• PHP files have the extension ".php"
What Can PHP Do?
• PHP can generate dynamic page content
• PHP can create, open, read, write, delete, and close files on the server
• PHP can collect form data
• PHP can send and receive cookies
• PHP can add, delete, and modify data in your database
• PHP can restrict user access to some pages on your website
• PHP can encrypt data
With PHP you are not limited to outputting HTML. You can output images, PDF files, and even Flash movies. You can also output any text, such as XHTML and XML.
Connecting to and Disconnecting from the Server
To connect to the server, you will usually need to provide a MySQL user name when you invoke mysql and, most likely, a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host name. Contact your administrator to find out what connection parameters you should use (that is, what host, user name, and password to use). Once you know the proper parameters, you should be able to connect like this:
shell> mysql -h host -u user -p
Enter password: ********
Here host and user represent the host name where your MySQL server is running and the user name of your MySQL account. Substitute appropriate values for your setup. The ******** represents your password; enter it when mysql displays the Enter password: prompt. If that works, you should see some introductory information followed by a mysql> prompt:
shell> mysql -h host -u user -p
Enter password: ********
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 25338 to server version: 5.0.96-standard
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>
The mysql> prompt tells you that mysql is ready for you to enter commands. If you are logging in on the same machine that MySQL is running on, you can omit the host and simply use the following:
shell> mysql -u user -p
If, when you attempt to log in, you get an error message such as ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2), it means that the MySQL server daemon (Unix) or service (Windows) is not running. Consult the administrator appropriate to your operating system.
Some MySQL installations permit users to connect as the anonymous (unnamed) user to the server running on the local host. If this is the case on your machine, you should be able to connect to that server by invoking mysql without any options:
shell> mysql
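The same connection parameters apply when connecting from PHP rather than from the shell. A minimal sketch using the mysqli extension, with host, user and password as placeholders for your own values:

<?php
// Connect to MySQL from PHP with the same host/user/password parameters
// used at the command line. All three values are placeholders.
$db = new mysqli('host', 'user', 'password');
if ($db->connect_error) {
    // Comparable in spirit to the CLI "Can't connect" error described above.
    die('Connect failed: ' . $db->connect_error);
}
echo 'Connected, server version: ' . $db->server_info . "\n";
$db->close();   // the PHP equivalent of QUIT at the mysql> prompt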
After you have connected successfully, you can disconnect at any time by typing QUIT (or \q) at the mysql> prompt:
mysql> QUIT
Bye
On Unix, you can also disconnect by pressing Control+D. Most examples in the following sections assume that you are connected to the server; they indicate this by the mysql> prompt.
Data type
In computer science and computer programming, a data type or simply type is a classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored. In MySQL there are three main types: text, number, and date/time types. Refer to the MySQL book (Aplus).
The Java programming language, by contrast, is statically typed, which means that all variables must be declared before they can be used. All programs involve storing and manipulating data. Fortunately, the computer only knows about a few types of data: numbers, true/false values, characters (a, b, c, 1, 2, 3, etc.), lists of data, and complex "structures" of data, which build up new data types by combining the other data types.
Creating and using a database, and getting information about databases and tables – refer to the MySQL book.
Batch mode
To run your SQL batch file from the command line in Windows, enter the following:
mysql < c:\commands.sql
Don't forget to enclose the file path in quotes if there are any spaces.
Running the Batch Job as a Scheduled Task in Windows
Batch jobs can be automated even further by running them as a scheduled task. In Windows, batch files are used to execute DOS commands. We can schedule our batch job by placing the command entered earlier in a file such as "runsql.bat". This file will contain only one line:
mysql < c:\commands.sql
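For illustration, a hypothetical commands.sql might contain nothing more than a few ordinary SQL statements, each terminated by a semicolon, which mysql executes in order. The database and table names below simply echo the hypothetical company tables listed earlier:

USE company;
SELECT COUNT(*) FROM Orders;
UPDATE Products SET price = price * 1.10;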
To schedule the batch job:
1. Open Scheduled Tasks: click Start, click All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks.
2. Double-click Add Scheduled Task to start the Scheduled Task Wizard, and then click Next in the first dialog box.
3. The next dialog box displays a list of programs that are installed on your computer, either as part of the Windows operating system or as a result of software installation. Click Browse, select your batch file, and then click Open.
4. Type a name for the task, and then choose when and how often you want the task to run, from one of the following options:
• Daily
• Weekly
• Monthly
• One time only
• When my computer starts (before a user logs on)
• When I log on (only after the current user logs on)
5. Click Next, specify the information about the day and time to run the task, and then click Next.
6. Optional: enter the name and password of the user who is associated with this task. Make sure that you choose a user with sufficient permissions to run the program. By default, the wizard selects the name of the user who is currently logged on.
If at a later time you would like to suspend this task, you can open it via the Scheduled Tasks dialog and deselect the Enabled checkbox on the "Task" tab.
Similarly, you can remove the task by deleting it like any file; in fact, the task is saved as a .job file in the WINNT\Tasks folder.
MySQL in the Cloud
A cloud database is a database accessible to clients from the cloud and delivered to users on demand via the Internet from a cloud database provider's servers. Also referred to as Database-as-a-Service (DBaaS), cloud databases can use cloud computing to achieve optimized scaling, high availability, multi-tenancy and effective resource allocation. While a cloud database can be a traditional database such as a MySQL or SQL Server database that has been adapted for cloud use, a native cloud database such as Xeround's MySQL cloud database tends to be better equipped to make optimal use of cloud resources and to guarantee scalability as well as availability and stability.
Cloud databases can offer significant advantages over their traditional counterparts, including increased accessibility, automatic failover and fast automated recovery from failures, automated on-the-go scaling, minimal investment in and maintenance of in-house hardware, and potentially better performance. At the same time, cloud databases have their share of potential drawbacks, including security and privacy issues as well as the potential loss of, or inability to access, critical data in the event of a disaster or the bankruptcy of the cloud database service provider.