Meaning 
A database is an organized collection of data. The data are typically organized to model aspects of reality in a way that 
supports processes requiring information. For example, modeling the availability of rooms in hotels in a way that 
supports finding a hotel with vacancies. 
Database management systems (DBMSs) are specially designed software applications that interact with the user, other 
applications, and the database itself to capture and analyze data. A general-purpose DBMS is a software system 
designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs 
include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, SAP and IBM DB2. A database is not generally portable across 
different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a 
single application to work with more than one DBMS. Database management systems are often classified according to 
the database that they support; the most popular database systems since the 1980s have all supported the relational 
model as represented by the SQL language. 
A database is a systematically organized or structured repository of indexed information (usually as a group of linked data files) that 
allows easy retrieval, updating, analysis, and output of data. Stored usually in a computer, this data could be in 
the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information. 
Most computer applications (including software, spreadsheets, word-processors) are databases at their core. See 
also flat database and relational database. 
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In 
one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images. 
In computing, databases are sometimes classified according to their organizational approach. The most prevalent 
approach is the relational database, a tabular database in which data is defined so that it can be reorganized and 
accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among 
different points in a network. An object-oriented programming database is one that is congruent with the data defined 
in object classes and subclasses. 
Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs 
and inventories, and customer profiles. Typically, a database manager provides users the capabilities of controlling 
read/write access, specifying report generation, and analyzing usage. Databases and database managers are prevalent in 
large mainframe systems, but are also present in smaller distributed workstation and mid-range systems such as the 
AS/400 and on personal computers. SQL (Structured Query Language) is a standard language for making interactive 
queries from and updating a database such as IBM's DB2, Microsoft's SQL Server, and database products 
from Oracle, Sybase, and Computer Associates. 
Features of a DBMS 
The prime purpose of a relational database management system is to maintain data integrity. This means all the rules 
and relationships between data are consistent at all times. But a good DBMS will have other features as well. 
These include: 
• A command language that allows you to create, delete and alter the database (data description language or DDL) 
• A way of documenting all the internal structures that make up the database (data dictionary) 
• A language to support the manipulation and processing of the data (data manipulation language) 
• Support for the ability to view the database from different viewpoints according to the requirements of the user 
• Some level of security and access control to the data 
The simplest RDBMS may be designed with a single user in mind, e.g. the database is 'locked' until that person has 
finished with it. Such an RDBMS will only cost a few hundred pounds at most and will have only a basic capability. On 
the other hand, an enterprise-level DBMS can support a huge number of simultaneous users with thousands of internal 
tables and complex 'roll back' capabilities should things go wrong. 
Obviously this kind of system will cost thousands, along with a need to have professional database administrators 
looking after it and database specialists to create complex queries for management and staff. 
1. Controlling Data Redundancy: 
In non-database systems (traditional computer file processing), each application program has its own files. In this case, 
the duplicated copies of the same data are created at many places. In DBMS, all the data of an organization is integrated 
into a single database. The data is recorded at only one place in the database and it is not duplicated. For example, the 
dean's faculty file and the faculty payroll file contain several items that are identical. When they are converted into 
a database, the data is integrated into a single database so that multiple copies of the same data are reduced to a single 
copy. In DBMS, data redundancy can be controlled or reduced but is not removed completely. Sometimes, it is 
necessary to create duplicate copies of the same data items in order to relate tables with each other. By controlling 
the data redundancy, you can save storage space. Similarly, it is useful for retrieving data from database using queries. 
2. Data Consistency: 
By controlling the data redundancy, the data consistency is obtained. If a data item appears only once, any update to its 
value has to be performed only once and the updated value (new value of item) is immediately available to all users. 
If the DBMS has reduced redundancy to a minimum level, the database system enforces consistency. It means that when 
a data item appears more than once in the database and is updated, the DBMS automatically updates each occurrence 
of a data item in the database. 
3. Data Sharing: 
In DBMS, data can be shared by authorized users of the organization. The DBA manages the data and gives rights to 
users to access the data. Many users can be authorized to access the same set of information simultaneously. Remote 
users can also share the same data. Similarly, the data of the same database can be shared between different 
application programs. 
4. Data Integration: 
In DBMS, data in database is stored in tables. A single database contains multiple tables and relationships can be created 
between tables (or associated data entities). This makes it easy to retrieve and update data. 
5. Integrity Constraints: 
Integrity constraints or consistency rules can be applied to a database so that only correct data can be entered into the 
database. The constraints may be applied to data items within a single record or they may be applied to relationships 
between records. 
Examples: 
The examples of integrity constraints are: 
(i) 'Issue Date' in a library system cannot be later than the corresponding 'Return Date' of a book. 
(ii) Maximum obtained marks in a subject cannot exceed 100. 
(iii) Registration number of BCS and MCS students must start with 'BCS' and 'MCS' respectively etc. 
There are also some standard constraints that are intrinsic in most of the DBMSs. These are: 
PRIMARY KEY: Designates a column or combination of columns as the primary key; therefore, values of those columns 
cannot be repeated or left blank. 
FOREIGN KEY: Relates one table with another table. 
UNIQUE: Specifies that values of a column or combination of columns cannot be repeated. 
NOT NULL: Specifies that a column cannot contain empty values. 
CHECK: Specifies a condition which each row of a table must satisfy. 
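For example, all five of these constraints might be declared in SQL roughly as follows (the table and column names here are only illustrative, and the exact syntax varies slightly between DBMSs): 

    CREATE TABLE department ( 
        dept_id INT PRIMARY KEY       -- PRIMARY KEY: values cannot repeat or be blank 
    ); 

    CREATE TABLE student ( 
        reg_no  VARCHAR(10) PRIMARY KEY, 
        email   VARCHAR(100) UNIQUE,                  -- UNIQUE: no repeated values 
        name    VARCHAR(50) NOT NULL,                 -- NOT NULL: no empty values 
        marks   INT CHECK (marks BETWEEN 0 AND 100),  -- CHECK: each row must satisfy the condition 
        dept_id INT REFERENCES department(dept_id)    -- FOREIGN KEY: relates student to department 
    ); 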
Most of the DBMSs provide the facility for applying the integrity constraints. The database designer (or DBA) identifies 
integrity constraints during database design. The application programmer can also identify integrity constraints in the 
program code while developing the application program. The integrity constraints are automatically checked at the 
time of data entry or when the record is updated. If the data entry operator (end-user) violates an integrity constraint, 
the data is not inserted or updated into the database and a message is displayed by the system. For example, when you 
withdraw an amount from the bank through an ATM card, your account balance is compared with the amount you are 
withdrawing. If your account balance is less than the amount you want to withdraw, a message is displayed on the 
screen to inform you about your account balance. 
6. Data Security: 
Data security is the protection of the database from unauthorized users. Only the authorized persons are allowed to 
access the database. Some of the users may be allowed to access only a part of database i.e., the data that is related to 
them or related to their department. Mostly, the DBA or head of a department can access all the data in the database. 
Some users may be permitted only to retrieve data, whereas others are allowed to retrieve as well as to update data. 
The database access is controlled by the DBA. He creates the accounts of users and gives rights to access the database. 
Typically, users or group of users are given usernames protected by passwords. 
Most of the DBMSs provide the security sub-system, which the DBA uses to create accounts of users and to specify 
account restrictions. The user enters his/her account number (or username) and password to access the data from 
database. For example, if you have an e-mail account on "hotmail.com" (a popular website), then you have to give 
your correct username and password to access your e-mail account. Similarly, when you insert your ATM card into the 
Automated Teller Machine (ATM) in a bank, the machine reads your ID number printed on the card and then asks you 
to enter your PIN code (or password). In this way, you can access your account. 
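As a sketch of how this looks in SQL (the user and table names are hypothetical, and user-creation syntax differs between DBMSs): 

    -- The DBA creates an account and grants it limited rights 
    CREATE USER clerk IDENTIFIED BY 'secret';     -- user-creation syntax varies by DBMS 
    GRANT SELECT ON accounts TO clerk;            -- clerk may only retrieve data 
    GRANT SELECT, UPDATE ON accounts TO officer;  -- officer may retrieve and update 
    REVOKE UPDATE ON accounts FROM officer;       -- rights can later be withdrawn 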
7. Data Atomicity: 
A transaction in commercial databases is referred to as an atomic unit of work. For example, when you purchase 
something from a point of sale (POS) terminal, a number of tasks are performed, such as: 
• Company stock is updated. 
• Amount is added to the company's account. 
• Salesperson's commission increases. 
All these tasks collectively are called an atomic unit of work or transaction. These tasks must all be completed; 
otherwise, partially completed tasks are rolled back. Thus, through the DBMS, it is ensured that only consistent data 
exists within the database. 
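The POS example might be written as a single SQL transaction along these lines (table names, columns and values are illustrative): 

    BEGIN TRANSACTION; 
    UPDATE stock    SET quantity   = quantity - 1    WHERE item_id = 101;  -- company stock is updated 
    UPDATE accounts SET balance    = balance + 500   WHERE acct_id = 1;    -- amount is added to company's account 
    UPDATE staff    SET commission = commission + 25 WHERE emp_id  = 7;    -- salesperson's commission increases 
    COMMIT;  -- all three changes become permanent together 
    -- if any statement fails before COMMIT, the DBMS rolls back the partial work 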
8. Database Access Language: 
Most of the DBMSs provide SQL as standard database access language. It is used to access data from multiple tables of a 
database. 
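For instance, a single SQL query can combine data from two related tables (hypothetical names): 

    SELECT s.name, d.dept_name 
    FROM   student s 
    JOIN   department d ON d.dept_id = s.dept_id 
    WHERE  d.dept_name = 'Computer Science'; 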
9. Development of Application: 
The cost and time for developing new applications is also reduced. The DBMS provides tools that can be used to develop 
application programs. For example, some wizards are available to generate Forms and Reports. Stored procedures 
(stored on server side) also reduce the size of application programs. 
10. Creating Forms: 
A Form is a very important object of a DBMS. You can create Forms very easily and quickly in a DBMS. Once a Form is 
created, it can be used many times and it can be modified very easily. The created Forms are also saved along with the 
database and 
behave like a software component. A Form provides very easy way (user-friendly interface) to enter data into database, 
edit data, and display data from database. The non-technical users can also perform various operations on databases 
through Forms without going into the technical details of a database. 
11. Report Writers: 
Most of the DBMSs provide the report writer tools used to create reports. The users can create reports very easily and 
quickly. Once a report is created, it can be used many times and it can be modified very easily. The created reports are 
also saved along with database and behave like a software component. 
12. Control Over Concurrency: 
In a computer file-based system, if two users are allowed to access data simultaneously, it is possible that they will 
interfere with each other. For example, if both users attempt to perform update operation on the same record, then one 
may overwrite the values recorded by the other. Most DBMSs have sub-systems to control the concurrency so that 
transactions are always recorded with accuracy. 
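One common mechanism, sketched below with hypothetical names, is row-level locking: many DBMSs (e.g. Oracle, MySQL, PostgreSQL) support SELECT ... FOR UPDATE, which locks the selected row so a concurrent transaction must wait instead of overwriting it: 

    BEGIN TRANSACTION; 
    SELECT balance FROM accounts WHERE acct_id = 1 FOR UPDATE;  -- row is locked for this transaction 
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1; 
    COMMIT;  -- the lock is released and waiting transactions proceed 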
13. Backup and Recovery Procedures: 
In a computer file-based system, the user creates backups of data regularly to protect the valuable data from damage 
due to failures of the computer system or application program. This is a time-consuming method if the volume of 
data is large. Most of the DBMSs provide the 'backup and recovery' sub-systems that automatically create the backup of 
data and restore data if required. For example, if the computer system fails in the middle (or end) of an update 
operation of the program, the recovery sub-system is responsible for making sure that the database is restored to the 
state it was in before the program started executing. 
14. Data Independence: 
The separation of data structure of database from the application program that is used to access data from database is 
called data independence. In DBMS, database and application programs are separated from each other. The DBMS sits 
in between them. You can easily change the structure of database without modifying the application program. For 
example, you can modify the size or data type of data items (fields of a database table). On the other hand, in a 
computer file-based system, the structure of the data items is built into the individual application programs. Thus the 
application programs are dependent on the structure of the data files and vice versa. 
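As a small illustration (hypothetical table, PostgreSQL-style syntax; some other DBMSs use MODIFY instead of ALTER COLUMN): 

    -- Widen a column without touching any application program 
    ALTER TABLE student ALTER COLUMN name TYPE VARCHAR(100); 
    -- applications that run SELECT name FROM student continue to work unchanged 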
15. Advanced Capabilities: 
A DBMS also provides advanced capabilities for online access and reporting of data through the Internet. Today, most of the 
database systems are online. The database technology is used in conjunction with Internet technology to access data on 
the web servers. 
Data Base Management Systems Architecture 
Data Base Management Systems (DBMS) are very relevant in today’s world where information matters. Most business 
operations of large companies are dependent on their databases in some way or the other. Many companies use their 
data analysis methods to leverage the data in their databases and provide better service to customers and compete 
with their business rivals. Databases are collections of data that have been organized in a certain way. The term DBMS 
is commonly used to refer to a computer program that can help you store, change and retrieve the data in your 
database. Most DBMS software products use SQL as the main query language – the language that lets you interact 
with and extract results from your database quickly. SQL is the language used to query popular database systems like 
Oracle, SQL Server and MySQL. Learning SQL and DBMS can help you become a database administrator. 
DBMS Architecture 
DBMS architecture is the way in which the data in a database is viewed by (or represented to) users. It helps you 
represent your data in an understandable way to the users, by hiding the complex bits that deal with the working of 
the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data. 
We’re going to take a look at the ANSI-SPARC DBMS standard model. ANSI is the acronym for American National 
Standards Institute. It sets standards for American goods so that they can be used anywhere in the world without 
compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use 
SQL as the main query language. The ANSI has also standardized a three level DBMS architecture model followed by 
most database systems, and it’s known as the abstract ANSI-SPARC design standard. 
The ANSI-SPARC Database Architecture is set up into three tiers. Let’s take a closer look at them. 
The Internal Level (Physical Representation of Data): The internal level is the lowest level in a three-tiered database. 
This level deals with how the stored data on the database is represented to the user. This level shows exactly how the 
data is stored and organized for access on your system. This is the most technical of the three levels. However, the 
internal level view is still abstract: even if it shows how the data is stored physically, it will not show how the 
database software operates on it. So how exactly is data stored on this level? There are several considerations to be 
made when storing data. Some of them include figuring out the right space allocation techniques, data compression 
techniques (if necessary), security and encryption and the access paths the software can take to retrieve the data. 
Most DBMS software products make sure that data access is optimized and that data uses minimum storage space. 
The OS you’re running is actually in charge of managing the physical storage space. 
4
The Conceptual Level (Holistic Representation of Data): The conceptual level tells you how the database is structured 
logically. This level tells you about the relationship between the data members of your database, exactly 
what data is stored in it and what a user will need to use the database. This level does not concern itself with how this 
logical structure will actually be implemented. It’s actually an overview of your database. The conceptual level acts as 
a sort of a buffer between the internal level and the external level. It helps hide the complexity of the database and 
hides how the data is physically stored in it. The database administrator will have to be conversant with this layer, 
because most of his operations are carried out on it. Only a database administrator is allowed to modify or structure 
this level. It provides a global view of the database, as well as the hardware and software necessary for running it – all 
important info for a database admin. 
The External Level (User Representation of Data): This is the uppermost level in the database. It implements the 
concept of abstraction as much as possible. This level is also known as the view level because it deals with how a user 
views your database. The external level is what allows a user to access a customized version of the data in your 
database. Multiple users can work on a database at the same time because of it. The external level also hides the 
working of the database from your users. It maintains the security of the database by giving users access only to the 
data which they need at a particular time. Any data that is not needed will not be displayed. Three “schemas” 
(internal, conceptual and external) show how the database is internally and externally structured, and so this type of 
database architecture is also known as the "three-schema" architecture. 
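In SQL terms, the external level is commonly realized with views; the sketch below (hypothetical tables and users) gives payroll staff the salary column while the front desk sees only names and extensions: 

    CREATE VIEW payroll_view AS 
        SELECT emp_id, name, salary FROM employee; 
    CREATE VIEW directory_view AS 
        SELECT name, extension FROM employee; 
    GRANT SELECT ON directory_view TO front_desk;  -- the base table itself stays hidden 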
Functional dependency 
A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be 
written A -> B which would be the same as stating "B is functionally dependent upon A." 
Examples: In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that 
name is functionally dependent upon SSN (or SSN -> name) because an employee's name can be uniquely determined 
from their SSN. However, the reverse statement (name -> SSN) is not true because more than one employee can have 
the same name but different SSNs. 
Definition - What does Functional Dependency mean? 
Functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a 
relation with attributes X and Y, a functional dependency between the attributes is represented as X->Y, which specifies 
Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute. Each value of X is associated 
precisely with one Y value. Functional dependency in a database serves as a constraint between two sets of attributes. 
Defining functional dependency is an important part of relational database design and contributes to aspects of 
normalization. 
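In SQL, declaring the determinant as the primary key is one way this constraint is enforced (a minimal sketch of the SSN example above): 

    CREATE TABLE employee ( 
        ssn  CHAR(9) PRIMARY KEY,  -- determinant: each SSN appears exactly once 
        name VARCHAR(50)           -- dependent: precisely one name per SSN 
    ); 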
What is Normalization? 
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization 
process: eliminating redundant (unwanted) data (for example, storing the same data in more than one table) and 
ensuring dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce 
the amount of space a database consumes and ensure that data is logically stored. 
Techopedia - Normalization is the process of reorganizing data in a database so that it meets two basic requirements: (1) 
There is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related 
data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to 
take up as little disk space as possible, resulting in increased performance. Normalization is also known as data 
normalization. 
The Normal Forms 
The database community has developed a series of guidelines for ensuring that databases are normalized. These are 
referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal 
form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, 
and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article. 
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines 
only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when 
variations take place, it's extremely important to evaluate any possible ramifications they could have on your system 
and account for possible inconsistencies. That said, let's explore the normal forms. 
First Normal Form (1NF) 
First normal form (1NF) sets the very basic rules for an organized database: Eliminate duplicative columns from the 
same table. Create separate tables for each group of related data and identify each row with a unique column or set of 
columns (the primary key). 
Second Normal Form (2NF) 
Second normal form (2NF) further addresses the concept of removing duplicative data: Meet all the requirements of the 
first normal form. Remove subsets of data that apply to multiple rows of a table and place them in separate tables. 
Create relationships between these new tables and their predecessors through the use of foreign keys. 
Third Normal Form (3NF) 
Third normal form (3NF) goes one large step further: Meet all the requirements of the second normal form. 
Remove columns that are not dependent upon the primary key. 
Boyce-Codd Normal Form (BCNF or 3.5NF) 
The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form", adds one more requirement: 
Meet all the requirements of the third normal form. Every determinant must be a candidate key. 
Fourth Normal Form (4NF) 
Finally, fourth normal form (4NF) has one additional requirement: Meet all the requirements of the third normal form. 
A relation is in 4NF if it has no multi-valued dependencies. Remember, these normalization guidelines are cumulative. 
For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database. 
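As a hedged sketch of what normalization looks like in practice (hypothetical tables): an unnormalized orders table that repeats customer details on every row can be decomposed so that customer data lives in its own table, linked back by a foreign key: 

    -- Before: orders(order_id, order_date, customer_id, customer_name, customer_city) 
    -- customer_name and customer_city depend only on customer_id, not on order_id 

    CREATE TABLE customer ( 
        customer_id   INT PRIMARY KEY, 
        customer_name VARCHAR(50), 
        customer_city VARCHAR(50) 
    ); 

    CREATE TABLE orders ( 
        order_id    INT PRIMARY KEY, 
        order_date  DATE, 
        customer_id INT REFERENCES customer(customer_id) 
    ); 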
Data Models 
E-R Model is a graphical representation of entities and their relationships to each other, typically used in computing in 
regard to the organization of data within databases or information systems. An entity is a piece of data: an object or 
concept about which data is stored. 
A relationship is how the data is shared between entities. There are three types of relationships between entities: 
1. One-to-One 
One instance of an entity (A) is associated with one other instance of another entity (B). For example, in a database of 
employees, each employee name (A) is associated with only one social security number (B). 
2. One-to-Many 
One instance of an entity (A) is associated with zero, one or many instances of another entity (B), but for one instance of 
entity B there is only one instance of entity A. For example, for a company with all employees working in one building, 
the building name (A) is associated with many different employees (B), but those employees all share the same singular 
association with entity A. 
3. Many-to-Many 
One instance of an entity (A) is associated with one, zero or many instances of another entity (B), and one instance of 
entity B is associated with one, zero or many instances of entity A. For example, for a company in which all of its 
6
employees work on multiple projects, each instance of an employee (A) is associated with many instances of a project 
(B), and at the same time, each instance of a project (B) has multiple employees (A) associated with it. 
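These three relationship types map onto SQL tables in a natural way; the sketch below (hypothetical names) shows a one-to-many relationship as a simple foreign key and a many-to-many relationship as a third "junction" table: 

    -- One-to-many: each employee row points to exactly one building 
    CREATE TABLE building ( 
        building_id INT PRIMARY KEY, 
        name        VARCHAR(50) 
    ); 
    CREATE TABLE employee ( 
        emp_id      INT PRIMARY KEY, 
        name        VARCHAR(50), 
        building_id INT REFERENCES building(building_id) 
    ); 

    -- Many-to-many: employees and projects are linked through a junction table 
    CREATE TABLE project ( 
        project_id INT PRIMARY KEY, 
        title      VARCHAR(50) 
    ); 
    CREATE TABLE works_on ( 
        emp_id     INT REFERENCES employee(emp_id), 
        project_id INT REFERENCES project(project_id), 
        PRIMARY KEY (emp_id, project_id)  -- each employee/project pairing is recorded once 
    ); 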
Relational Model 
The relational model for database management is a database model based on first-order predicate logic, first formulated 
and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples, 
grouped into relations. A database organized in terms of the relational model is a relational database. 
The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly 
state what information the database contains and what information they want from it, and let the database 
management system software take care of describing data structures for storing the data and retrieval procedures for 
answering queries. 
Most relational databases use the SQL data definition and query language; these systems implement what can be 
regarded as an engineering approximation to the relational model. A table in an SQL database schema corresponds to a 
predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond 
to predicates. However, SQL databases deviate from the relational model in many details, and Codd fiercely argued 
against deviations that compromise the original principles. 
[Image: Diagram of an example database according to the relational model] 
[Image: In the relational model, related records are linked together with a "key"] 
Network model 
The Network model replaces the hierarchical tree with a graph, thus allowing more general connections among the 
nodes. The main difference of the network model from the hierarchical model is its ability to handle many-to-many 
(N:N) relations. In other words, it allows a record to have more than one parent. Suppose an employee works for two 
departments. The strict hierarchical arrangement is not possible here and the tree becomes a more generalized graph - 
a network. The network model evolved specifically to handle non-hierarchical relationships. In this model, data can 
belong to more than one parent. Note that there are lateral connections as well as top-down connections. A network 
structure thus allows 1:1 (one-to-one), 1:M (one-to-many), and M:M (many-to-many) relationships among entities. In 
network database terminology, a relationship is a set. Each set is made up of at least two types of records: an owner 
record (equivalent to the parent in the hierarchical model) and a member record (similar to the child record in the 
hierarchical model). The Customer-Loan database, which we discussed earlier for the hierarchical model, can now be 
represented in the network model. In this representation, the information about the joint loan L1 appears a single 
time, whereas in the hierarchical model it appears twice. Thus, the network model reduces redundancy and is better 
compared to the hierarchical model. 
Hierarchical Model 
The Hierarchical Data Model is a way of organizing a database with multiple one-to-many relationships. The structure is 
based on the rule that one parent can have many children but children are allowed only one parent. This structure 
allows information to be repeated through the parent-child relations. The model was created by IBM and was 
implemented mainly in their Information Management System (IMS), a precursor to modern DBMSs. 
A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored 
as records which are connected to one another through links. A record is a collection of fields, with each field containing 
only one value. The entity type of a record defines which fields the record contains. 
A record in the hierarchical database model corresponds to a row (or tuple) in the relational database model and an 
entity type corresponds to a table (or relation). The hierarchical database model mandates that each child record has 
only one parent, whereas each parent record can have one or more child records. In order to retrieve data from a
hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the 
first database model, created by IBM in the 1960s. 
Distributed database 
A distributed database is a database that is under the control of a central database management system (DBMS) in 
which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the 
same physical location, or may be dispersed over a network of interconnected computers. Collections of data (e.g. in a 
database) can be distributed across multiple physical locations. A distributed database can reside on network servers on 
the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of 
databases improves database performance at end-user worksites. 
To ensure that the distributed databases are up to date and current, there are two processes: replication and 
duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the 
changes have been identified, the replication process makes all the databases look the same. The replication process can 
be very complex and time consuming depending on the size and number of the distributed databases. This process can 
also require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It basically 
identifies one database as a master and then duplicates that database. The duplication process is normally done at a set 
time after hours. This is to ensure that each distributed location has the same data. In the duplication process, changes 
to the master database only are allowed. This is to ensure that local data will not be overwritten. Both of the processes 
can keep the data current in all distributed locations. 
Besides distributed database replication and fragmentation, there are many other distributed database design 
technologies. For example, local autonomy, synchronous and asynchronous distributed database technologies. These 
technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of 
the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, 
consistency and integrity. 
Object oriented database 
An object database (also object-oriented database management system) is a database management system in which 
information is represented in the form of objects as used in object-oriented programming. Object databases are 
different from relational databases which are table-oriented. Object-relational databases are a hybrid of both 
approaches. Object databases have been considered since the early 1980s. 
Object-oriented database management systems (OODBMSs) combine database capabilities with object-oriented 
programming language capabilities. OODBMSs allow object-oriented programmers to develop products, store them 
as objects, and replicate or modify existing objects to make new objects within the OODBMS. Because the database is 
integrated with the programming language, the programmer can maintain consistency within one environment, in that 
both the OODBMS and the programming language will use the same model of representation. Relational DBMS projects, 
by way of contrast, maintain a clearer division between the database model and the application. 
As the usage of web-based technology increases with the implementation of Intranets and extranets, companies have a 
vested interest in OODBMSs to display their complex data. Using a DBMS that has been specifically designed to store 
data as objects gives an advantage to those companies that are geared towards multimedia presentation or 
organizations that utilize computer-aided design (CAD). Some object-oriented databases are designed to work well 
with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#, Visual Basic 
.NET, C++, Objective-C and Smalltalk; others have their own programming languages. OODBMSs use exactly the same 
model as object-oriented programming languages. 
Spatial database 
A spatial database is a database that is optimized to store and query data that represents objects defined in a geometric 
space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons. Some 
spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and TINs. 
While typical databases are designed to manage various numeric and character types of data, additional functionality 
needs to be added for databases to process spatial data types efficiently. These additional types are typically called 
geometry or feature types. 
The Open Geospatial Consortium created the Simple Features specification and sets standards for adding spatial 
functionality to database systems. 
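A spatial query might look like the following sketch, using OGC Simple Features-style functions as implemented by spatial extensions such as PostGIS (names and types vary by product): 

    CREATE TABLE city ( 
        name     VARCHAR(50), 
        boundary GEOMETRY(POLYGON)  -- a spatial 'geometry' column 
    ); 

    -- find every city whose boundary contains a given point 
    SELECT name 
    FROM   city 
    WHERE  ST_Contains(boundary, ST_GeomFromText('POINT(77.59 12.97)')); 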
Multimedia database 
A Multimedia database (MMDB) is a collection of related multimedia data. The multimedia data include one or more 
primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations), 
animation sequences, audio and video. 
A Multimedia Database Management System (MMDBMS) is a framework that manages different types of data 
potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for 
multimedia data types, and facilities for the creation, storage, access, querying and control of a multimedia database. 
Crash Recovery System 
Though we are living in a highly technologically advanced era where hundreds of satellites monitor the earth and at 
every second billions of people are connected through information technology, failure is expected but not always 
acceptable. 
A DBMS is a highly complex system with hundreds of transactions being executed every second. The availability of a 
DBMS depends on its complex architecture and underlying hardware and system software. If it fails or crashes amid 
transactions being executed, it is expected that the system would follow some sort of algorithm or techniques to 
recover from crashes or failures. 
Failure Classification 
To see where the problem has occurred we generalize the failure into various categories, as follows: 
TRANSACTION FAILURE 
When a transaction fails to execute or reaches a point after which it cannot be completed successfully, it has to abort. 
This is called transaction failure, where only a few transactions or processes are affected. 
Reasons for transaction failure could be: 
Logical errors: where a transaction cannot complete because of some code error or internal error condition. 
System errors: where the database system itself terminates an active transaction because the DBMS is not able to 
execute it or has to stop because of some system condition. For example, in case of deadlock or resource unavailability, 
the system aborts an active transaction. 
SYSTEM CRASH 
There are problems external to the system that may cause the system to stop abruptly and crash. For example, 
interruption in the power supply, or failure of underlying hardware or software. Examples may include operating 
system errors. 
DISK FAILURE: 
In the early days of technology evolution, it was a common problem where hard disk drives or storage drives used to 
fail frequently. Disk failures include formation of bad sectors, unreachability of the disk, disk head crash or any other 
failure which destroys all or part of disk storage. 
Storage Structure 
We have already described the storage system. In brief, the storage structure can be divided into various categories: 
Volatile storage: As the name suggests, this storage does not survive system crashes and is mostly placed very close to 
the CPU by embedding it onto the chipset itself, for example main memory and cache memory. It is fast but can store 
only a small amount of information. 
Non-volatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but 
slower in accessibility. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed 
up) RAM. 
Recovery and Atomicity 
When a system crashes, it may have several transactions being executed and various files opened for them to modify 
data items. As we know, transactions are made of various operations which are atomic in nature. But according to the 
ACID properties of a DBMS, atomicity of transactions as a whole must be maintained, that is, either all operations are 
executed or none. 
When a DBMS recovers from a crash, it should maintain the following: 
It should check the states of all transactions which were being executed. 
A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this 
case. 
It should check whether the transaction can be completed now or needs to be rolled back. 
No transaction should be allowed to leave the DBMS in an inconsistent state. 
There are two types of techniques which can help a DBMS in recovering as well as maintaining the atomicity of a 
transaction: 
• Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the 
database. 
• Maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated 
later. 
Log-Based Recovery 
A log is a sequence of records which maintains a record of the actions performed by a transaction. It is important that 
the logs are written prior to the actual modification and stored on a stable storage medium, which is failsafe. 
Log-based recovery works as follows: 
The log file is kept on a stable storage medium. 
When a transaction enters the system and starts execution, it writes a log record about it: 
<Tn, Start> 
When the transaction modifies an item X, it writes a log record as follows: 
<Tn, X, V1, V2> 
This states that Tn has changed the value of X from V1 to V2. 
When the transaction finishes, it logs: 
<Tn, commit> 
Database can be modified using two approaches: 
Deferred database modification: All logs are written on to the stable storage and the database is updated when the 
transaction commits. 
Immediate database modification: Each log follows an actual database modification. That is, the database is modified 
immediately after every operation. 
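What the log enables can be seen in an ordinary SQL transaction (illustrative names): an explicit ROLLBACK, or a crash before COMMIT, causes the DBMS to restore the old value V1 recorded in the log: 

    BEGIN TRANSACTION; 
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;  -- log: <Tn, balance, V1, V2> 
    ROLLBACK;  -- the DBMS uses the logged old value V1 to undo the change 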
Recovery with concurrent transactions 
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it 
would become hard for the recovery system to backtrack all the logs and then start recovering. To ease this situation, 
most modern DBMSs use the concept of 'checkpoints'. 
CHECKPOINT 
Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the 
system. As time passes, the log file may become too big to be handled at all. Checkpoint is a mechanism where all the 
previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point 
before which the DBMS was in a consistent state and all the transactions were committed. 
RECOVERY 
When a system with concurrent transactions crashes and recovers, it behaves in the following manner: 
[Image: Recovery with concurrent transactions] 
The recovery system reads the logs backwards from the end to the last Checkpoint. 
It maintains two lists, undo-list and redo-list. 
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in redo-list. 
If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in undo-list. 
All transactions in the undo-list are then undone and their logs are removed. For all transactions in the redo-list, their 
previous logs are removed, the transactions are redone, and their logs are saved again. 
Database security / Authorization concerns the use of a broad range of information security controls to 
protect databases (potentially including the data, the database applications or stored functions, the database systems, 
the database servers and the associated network links) against compromises of their confidentiality, integrity and 
availability. It involves various types or categories of controls, such as technical, procedural/administrative and 
physical. Database security is a specialist topic within the broader realms of computer security, information 
security and risk management. Security risks to database systems include, for example: 
Unauthorized or unintended activity or misuse by authorized database users, database administrators, or 
network/systems managers, or by unauthorized users or hackers (e.g. inappropriate access to sensitive data, metadata 
or functions within databases, or inappropriate changes to the database programs, structures or security 
configurations); 
Malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary data, 
deletion of or damage to the data or programs, interruption or denial of authorized access to the database, attacks on 
other systems and the unanticipated failure of database services; 
Overloads, performance constraints and capacity issues resulting in the inability of authorized users to use databases as 
intended; 
Physical damage to database servers caused by computer room fires or floods, overheating, lightning, accidental liquid 
spills, static discharge, electronic breakdowns/equipment failures and obsolescence; 
Design flaws and programming bugs in databases and the associated programs and systems, creating various security 
vulnerabilities (e.g. unauthorized privilege escalation), data loss/corruption, performance degradation etc.; 
Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system 
administration processes, sabotage/criminal damage etc. 
Many layers and types of information security control are appropriate to databases, including: 
Access control 
Auditing 
Authentication 
Encryption 
Integrity controls 
Backups 
Application security 
Database Security applying Statistical Method 
Traditionally databases have been largely secured against hackers through network security measures such as firewalls, 
and network-based intrusion detection systems. While network security controls remain valuable in this regard, securing 
the database systems themselves, and the programs/functions and data within them, has arguably become more critical 
as networks are increasingly opened to wider access, in particular access from the Internet. Furthermore, system, 
program, function and data access controls, along with the associated user identification, authentication and rights 
management functions, have always been important to limit and in some cases log the activities of authorized users and 
administrators. In other words, these are complementary approaches to database security, working from both the 
outside-in and the inside-out as it were. 
Many organizations develop their own "baseline" security standards and designs detailing basic security control 
measures for their database systems. These may reflect general information security requirements or obligations 
imposed by corporate information security policies and applicable laws and regulations (e.g. concerning privacy, 
financial management and reporting systems), along with generally accepted good database security practices (such as 
appropriate hardening of the underlying systems) and perhaps security recommendations from the relevant database 
system and software vendors. The security designs for specific database systems typically specify further security 
administration and management functions (such as administration and reporting of user access rights, log management 
and analysis, database replication/synchronization and backups) along with various business-driven information security 
controls within the database programs and functions (e.g. data entry validation and audit trails). Furthermore, various 
security-related activities (manual controls) are normally incorporated into the procedures, guidelines etc. relating to 
the design, development, configuration, use, management and maintenance of databases. 
Data Warehouse Architecture 
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while 
some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of 
data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture 
rather than discussing the specifics of any one system. In general, all data warehouse systems have the following layers: 
• Data Source Layer 
• Data Extraction Layer 
• Staging Area 
• ETL Layer 
• Data Storage Layer 
• Data Logic Layer 
• Data Presentation Layer 
• Metadata Layer 
• System Operations Layer 
Data Source Layer 
This represents the different data sources that feed data into the data warehouse. The data source can be of any format 
-- plain text file, relational database, other types of database, Excel file, etc., can all act as a data source. 
Many different types of data can be a data source: 
• Internal data -- such as sales data, HR data, product data, inventory data, marketing data, systems data. 
• Third-party data, such as census data, demographics data, or survey data. 
All these data sources together form the Data Source Layer. 
Data Extraction Layer 
Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, 
but there is unlikely to be any major data transformation. 
Staging Area 
This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common 
area makes it easier for subsequent data processing / integration. 
ETL Layer 
This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an 
analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming 
phase in a data warehousing project, and an ETL tool is often used in this layer. 
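A minimal, hypothetical example of an ETL-style transformation in SQL, moving cleansed data from a staging table into an analytical fact table: 

    INSERT INTO sales_fact (sale_date, product_id, region, amount) 
    SELECT CAST(s.sale_date AS DATE),  -- normalize dates 
           s.product_id, 
           UPPER(TRIM(s.region)),      -- cleanse inconsistent region codes 
           s.quantity * s.unit_price   -- derive an analytical measure 
    FROM   staging_sales s 
    WHERE  s.quantity > 0;             -- drop invalid rows 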
Data Storage Layer 
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found 
here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the 
three, two of the three, or all three types. 
Data Logic Layer 
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation 
rules, but do affect what the report looks like. 
Data Presentation Layer 
This refers to the information that reaches the users. This can be in a form of a tabular / graphical report in a browser, 
an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, 
among others. Usually an OLAP tool and/or a reporting tool is used in this layer. 
Metadata Layer 
This is where information about the data stored in the data warehouse system is stored. A logical data model would be 
an example of something that's in the metadata layer. A metadata tool is often used to manage metadata. 
System Operations Layer 
This layer includes information on how the data warehouse system operates, such as ETL job status, system 
performance, and user access history. 
Evolution of data warehousing 
In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional 
information systems technology was simply too cumbersome to provide relevant data efficiently and quickly. 
Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or
less to 'execute' the business rather than 'run' the business. 
From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic 
reports for management. The key here is the word 'strategic' as most executives were less concerned with the day to 
day operations than they were with a more overall look at the model and business functions. 
As with all technology, over the course of the latter half of the 20th century, we saw increased numbers and types of 
databases. Many large businesses found themselves with data scattered across multiple platforms and variations of 
technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data 
warehousing is to take data from multiple platforms/technologies (As varied as spreadsheets, DB2 databases, IDMS 
records, and VSAM files) and place them in a common location that uses a common querying tool. In this way 
operational databases could be held on whatever system was most efficient for the operational business, while the 
reporting / strategic information could be held in a common location using a common language. Data Warehouses take 
this even a step further by giving the data itself commonality by defining what each term means and keeping it standard. 
(An example of this would be gender which can be referred to in many ways, but should be standardized on a data 
warehouse with one common way of referring to each sex). 
All of this was designed to make decision support more readily available and without affecting day to day operations. 
One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but 
rather a location for data that is 'interesting'. Data that is interesting will assist decision makers in making strategic 
decisions relative to the organization's overall mission. 
Benefits of Data Warehousing 
The successful implementation of a data warehouse can bring major benefits to an organization, including: 
• Potential high returns on investment - Implementation of data warehousing by an organization requires a huge 
investment, typically from Rs 10 lakh to 50 lakh. However, a study by the International Data Corporation (IDC) in 1996 
reported that average three-year returns on investment (ROI) in data warehousing reached 401%. 
• Competitive advantage - The huge returns on investment for those companies that have successfully implemented a 
data warehouse is evidence of the enormous competitive advantage that accompanies this technology. The competitive 
advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and 
untapped information on, for example, customers, trends, and demands. 
• Increased productivity of corporate decision-makers - Data warehousing improves the productivity of corporate 
decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data 
from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming 
data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate, 
and consistent analysis. 
• More cost-effective decision-making - Data warehousing helps to reduce the overall cost of the product by reducing 
the number of channels. 
• Better enterprise intelligence - It helps to provide better enterprise intelligence. 
• Enhanced customer service - It is used to enhance customer service. 
Problems of Data Warehousing 
The problems associated with developing and managing a data warehouse are as follows: 
Underestimation of resources for data loading - Sometimes we underestimate the time required to extract, clean, and 
load the data into the warehouse. It may take a significant proportion of the total development time, although some 
tools exist which are used to reduce the time and effort spent on this process. 
Hidden problems with source systems - Sometimes hidden problems associated with the source systems feeding the 
data warehouse may be identified only after years of being undetected. For example, when entering the details of a 
new property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when 
the data is available and applicable. 
Required data not captured - In some cases the required data is not captured by the source systems, even though it 
may be very important for the data warehouse's purpose. For example, the date of registration for a property may not 
be used in the source system, but it may be very important for analysis purposes. 
Increased end-user demands - After satisfying some of the end-users' queries, requests for support from staff may increase 
rather than decrease. This is caused by an increasing awareness of the users on the capabilities and value of the data 
warehouse. Another reason for increasing demands is that once a data warehouse is online, it is often the case that the 
number of users and queries increase together with requests for answers to more and more complex queries. 
Data homogenization - The concept of a data warehouse deals with the similarity of data formats between different 
data sources. This may result in the loss of some important value of the data. 
High demand for resources - The data warehouse requires large amounts of data. 
Data ownership - Data warehousing may change the attitude of end-users toward the ownership of data. Sensitive data 
owned by one department has to be loaded into the data warehouse for decision-making purposes. But sometimes this 
results in reluctance from that department, because it may hesitate to share the data with others. 
High maintenance - Data warehouses are high-maintenance systems. Any reorganization of the business processes and 
the source systems may affect the data warehouse, and this results in high maintenance costs. 
Long-duration projects - The building of a warehouse can take up to three years, which is why some organizations are 
reluctant to invest in a data warehouse. Sometimes only the historical data of a particular department is captured in 
the data warehouse, resulting in data marts. Data marts support only the requirements of a particular department and 
limit the functionality to that department or area only. 
Complexity of integration - The most important area for the management of a data warehouse is the integration 
capabilities. An organization must spend a significant amount of time determining how well the various different data 
warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are 
a number of tools for every operation of the data warehouse. 
Data mining 
A process used by companies to turn raw data into useful information. By using software to look for patterns in large 
batches of data, businesses can learn more about their customers and develop more effective marketing strategies as 
well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as 
computer processing. Grocery stores are well-known users of data mining techniques. Many supermarkets offer free 
loyalty cards to customers that give them access to reduced prices not available to non-members. The cards make it 
easy for stores to track who is buying what, when they are buying it, and at what price. The stores can then use this 
data, after analyzing it, for multiple purposes, such as offering customers coupons that are targeted to their buying 
habits and deciding when to put items on sale and when to sell them at full price. 
Data mining can be a cause for concern when only selected information, which is not representative of the overall 
sample group, is used to prove a certain hypothesis. 
Data mining process 
Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases intended as a cyclical process, as shown in the following figure:
[Figure: Cross-Industry Standard Process for Data Mining (CRISP-DM)]
Business understanding 
In the business understanding phase, it is first necessary to understand the business objectives clearly and find out what the business's needs are. Next, we have to assess the current situation by finding out about the resources, assumptions, constraints and other important factors that should be considered. Then, from the business objectives and the current situation, we need to create data mining goals that achieve the business objectives within the current situation. Finally, a good data mining plan has to be established to achieve both the business and data mining goals. The plan should be as detailed as possible.
Data understanding 
The data understanding phase starts with initial data collection from the available data sources, which helps us get familiar with the data. Some important activities, including data loading and data integration, must be performed to make the data collection successful. Next, the “gross” or “surface” properties of the acquired data need to be examined carefully and reported. Then, the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, the data quality must be examined by answering important questions such as “Is the acquired data complete?” and “Are there any missing values in the acquired data?”
Data preparation 
Data preparation typically consumes about 90% of the project's time, and its outcome is the final data set. Once the available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. Deeper data exploration may also be carried out during this phase to notice patterns based on the business understanding.
Modeling 
First, modeling techniques have to be selected for the prepared dataset.
Next, test scenarios must be generated to validate the quality and validity of the model.
Then, one or more models are created by running the modeling tool on the prepared dataset.
Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.
Evaluation
In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised due to new patterns discovered in the model results or from other factors; gaining business understanding is an iterative process in data mining. The go or no-go decision must be made in this step before moving to the deployment phase.
Deployment 
The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. In the deployment phase, plans for deployment, maintenance and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarize the project experiences and review the project to see what needs to be improved, capturing the lessons learned.
Data mining techniques 
Association 
Association (or relation) is probably the best-known, most familiar and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For
example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy 
strawberries, and therefore suggest that the next time that they buy strawberries they might also want to buy cream. 
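To make this concrete, here is a minimal SQL sketch of the idea, assuming a hypothetical order_items table with order_id and product columns; it counts how often each pair of products appears in the same order:

-- Count how often two products appear in the same order.
-- Assumes a hypothetical order_items(order_id, product) table.
SELECT a.product AS product_a,
       b.product AS product_b,
       COUNT(*) AS times_bought_together
FROM order_items a
JOIN order_items b
  ON a.order_id = b.order_id
 AND a.product < b.product   -- avoid self-pairs and duplicate pairs
GROUP BY a.product, b.product
ORDER BY times_bought_together DESC
LIMIT 10;

Pairs such as (strawberries, cream) appearing near the top of the list suggest an association worth acting on.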
Classification 
You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes to 
identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by 
identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might assign it to a particular class by comparing its attributes with your known definitions. You can apply the same principles to customers,
for example by classifying them by age and social group. 
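As a rough illustration, such a rule-based classification can be expressed directly in SQL; the customers table and the age brackets below are hypothetical:

SELECT name,
       CASE
         WHEN age < 25 THEN 'young adult'
         WHEN age < 60 THEN 'adult'
         ELSE 'senior'
       END AS age_class
FROM customers;

A real classifier would learn the class boundaries from labeled examples rather than having them fixed by hand.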
Clustering 
By examining one or more attributes or classes, you can group individual pieces of data together to form a structured opinion. At a simple level, clustering uses one or more attributes as the basis for identifying a cluster of correlating results. Clustering is useful for identifying related information, because correlated examples can be compared to see where the similarities and ranges agree. Clustering can work both ways: you can assume that there is a cluster at a certain point and then use your identification criteria to see if you are correct. For example, a sample of sales data might compare the age of the customer to the size of the sale. It is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home) have more disposable income.
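A crude way to look for such clusters in plain SQL is to bin the data and compare the groups; this sketch assumes a hypothetical sales table with customer_age and amount columns:

-- Bin customers into ten-year age brackets and compare sale sizes.
SELECT FLOOR(customer_age / 10) * 10 AS age_bracket,
       COUNT(*) AS num_sales,
       AVG(amount) AS avg_sale
FROM sales
GROUP BY age_bracket
ORDER BY age_bracket;

Brackets with noticeably higher avg_sale values hint at the clusters described above; a true clustering algorithm (such as k-means) would discover such groups without fixed bin boundaries.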
Prediction 
Prediction is a wide topic and runs from predicting the failure of components or machinery, to identifying fraud and 
even the prediction of company profits. Used in combination with the other data mining techniques, prediction involves 
analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a 
prediction about an event. Using credit card authorization as an example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matching to identify whether a transaction is fraudulent. If a match is made between the purchase of flights to the US and transactions in the US, it is likely that the transaction is valid.
Sequential patterns 
Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences 
of similar events. For example, with customer data you can identify that customers buy a particular collection of 
products together at different times of the year. In a shopping basket application, you can use this information to 
automatically suggest that certain items be added to a basket based on their frequency and past purchasing history. 
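A simple starting point in SQL is to break purchases down by month and look for products whose counts peak at particular times of year; the orders and order_items tables here are hypothetical:

SELECT MONTH(o.order_date) AS sale_month,
       i.product,
       COUNT(*) AS purchases
FROM orders o
JOIN order_items i ON i.order_id = o.order_id
GROUP BY sale_month, i.product
ORDER BY sale_month, purchases DESC;

Products that cluster in particular months are candidates for seasonal suggestions in the shopping basket.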
Decision trees 
Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as 
a part of the selection criteria, or to support the use and selection of specific data within the overall structure. Within 
the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a 
further question to help classify or identify the data so that it can be categorized, or so that a prediction can be made 
based on each answer. 
[Figure: Decision tree]
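Once a tree has been designed, its question-and-answer structure can be encoded as nested CASE logic in SQL; the transactions table and the thresholds below are purely illustrative:

-- Each CASE is one question in the tree; each branch leads either
-- to a further question or to a final decision.
SELECT transaction_id,
       CASE
         WHEN amount > 1000 THEN
           CASE
             WHEN country <> home_country THEN 'flag for review'
             ELSE 'approve'
           END
         ELSE 'approve'
       END AS decision
FROM transactions;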
Combinations 
In practice, it's very rare that you would use one of these exclusively. Classification and clustering are similar techniques. 
By using clustering to identify nearest neighbors, you can further refine your classifications. Often, we use decision trees 
to help build and identify classifications that we can track for a longer period to identify sequences and patterns. 
Long-term (memory) processing 
Within all of the core methods, there is often reason to record and learn from the information. In some techniques, it is 
entirely obvious. For example, with sequential patterns and predictive learning you look back at data from multiple 
sources and instances of information to build a pattern. In other techniques, the process is more explicit. Decision trees are rarely built once and then left unchanged. As new information, events, and data points are identified, it might be
necessary to build more branches, or even entirely new trees, to cope with the additional information. You can 
automate some of this process. For example, building a predictive model for identifying credit card fraud is about 
building probabilities that you can use for the current transaction, and then updating that model with the new 
(approved) transaction. This information is then recorded so that the decision can be made quickly the next time. 
Ecommerce & Web application security Issues 
1. Introduction 
E-commerce is defined as the buying and selling of products or services over electronic systems such as the Internet and 
to a lesser extent, other computer networks. It is generally regarded as the sales and commercial function of eBusiness. 
There has been a massive increase in the level of trade conducted electronically since the widespread penetration of the 
Internet. A wide variety of commerce is conducted via eCommerce, including electronic funds transfer, supply chain 
management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory 
management systems, and automated data collection systems. US online retail sales reached $175 billion in 2007 and 
are projected to grow to $335 billion by 2012 (Mulpuru, 2008). 
This massive increase in the uptake of eCommerce has led to a new generation of associated security threats, but any 
eCommerce system must meet four integral requirements: 
a) privacy – information exchanged must be kept from unauthorized parties 
b) integrity – the exchanged information must not be altered or tampered with 
c) authentication – both sender and recipient must prove their identities to each other and 
d) non-repudiation – proof is required that the exchanged information was indeed received (Holcombe, 2007). 
These basic maxims of eCommerce are fundamental to the conduct of secure business online. Further to the
fundamental maxims of eCommerce above, eCommerce providers must also protect against a number of different 
external security threats, most notably Denial of Service (DoS). These are where an attempt is made to make a computer 
resource unavailable to its intended users through a variety of mechanisms discussed below. The financial services 
sector still bears the brunt of e-crime, accounting for 72% of all attacks. But the sector that experienced the greatest 
increase in the number of attacks was eCommerce. Attacks in this sector have risen by 15% from 2006 to 2007 
(Symantec, 2007). 
2. Privacy 
Privacy has become a major concern for consumers with the rise of identity theft and impersonation, and any concern 
for consumers must be treated as a major concern for eCommerce providers. According to Consumer Reports Money 
Adviser (Perrotta, 2008), the US Attorney General has announced multiple indictments relating to a massive 
international security breach involving nine major retailers and more than 40 million credit- and debit-card numbers. US 
attorneys think that this may be the largest hacking and identity-theft case ever prosecuted by the justice department. 
Both EU and US legislation at both the federal and state levels mandates certain organizations to inform customers 
about information uses and disclosures. Such disclosures are typically accomplished through privacy policies, both online 
and offline (Vail et al., 2008). 
In a study by Lauer and Deng (2008), a model is presented linking privacy policy, through trustworthiness, to online 
trust, and then to customers’ loyalty and their willingness to provide truthful information. The model was tested using a 
sample of 269 responses. The findings suggested that consumers’ trust in a company is closely linked with the
perception of the company’s respect for customer privacy (Lauer and Deng, 2007). Trust in turn is linked to increased 
customer loyalty that can be manifested through increased purchases, openness to trying new products, and willingness 
to participate in programs that use additional personal information. Privacy now forms an integral part of any e-commerce strategy, and investment in privacy protection has been shown to increase consumers’ spend, trustworthiness and loyalty.
The converse of this can be shown to be true when things go wrong. In March 2008, the Irish online jobs board, jobs.ie, 
was compromised by criminals and users’ personal data (in the form of CVs) was taken (Ryan, 2008). Looking at the
real-time responses of users to this event on the popular Irish forum, Boards.ie, we can see that privacy is of major 
concern to users and in the event of their privacy being compromised users become very agitated and there is an overall 
negative effect on trust in e-commerce. User comments in the forum included: “I’m well p*ssed off about them keeping 
my CV on the sly”; “I am just angry that this could have happened and to so many people”; “Mine was taken too. How 
do I terminate my acc with jobs.ie”; “Grr, so annoyed, feel I should report it to the Gardai now” (Boards.ie, 2008).
3. Integrity, Authentication & Non-Repudiation 
In any e-commerce system the factors of data integrity, customer and client authentication, and non-repudiation are
critical to the success of any online business. Data integrity is the assurance that data transmitted is consistent and 
correct, that is, it has not been tampered with or altered in any way during transmission. Authentication is a means by which
both parties in an online transaction can be confident that they are who they say they are and non-repudiation is the 
idea that no party can dispute that an actual event online took place. Proof of data integrity is typically the easiest of 
these factors to successfully accomplish. A data hash or checksum, such as MD5 or CRC, is usually sufficient to establish 
that the likelihood of data being undetectably changed is extremely low (Schlaeger and Pernul, 2005). Notwithstanding 
these security measures, it is still possible to compromise data in transit through techniques such as phishing or man-in-the- 
middle attacks (Desmedt, 2005). These flaws have led to the need for the development of strong verification and 
security measures such as digital signatures and public key infrastructures (PKI).
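MySQL itself exposes hash functions that illustrate the checksum idea. Here is a minimal sketch, assuming a hypothetical messages table; MD5 is used only because the text mentions it, and a stronger hash such as SHA-256 would be preferred for serious integrity protection:

-- The sender stores or transmits a checksum alongside the data.
SELECT id, body, MD5(body) AS body_checksum
FROM messages;
-- The recipient recomputes MD5(body); any mismatch with the
-- transmitted checksum indicates the data changed in transit.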
One of the key developments in e-commerce security and one which has led to the widespread growth of e-commerce is 
the introduction of digital signatures as a means of verification of data integrity and authentication. In 1995, Utah 
became the first jurisdiction in the world to enact an electronic signature law. An electronic signature may be defined as 
“any letters, characters, or symbols manifested by electronic or similar means and executed or adopted by a party with 
the intent to authenticate a writing” (Blythe, 2006). In order for a digital signature to attain the same legal status as an 
ink-on-paper signature, asymmetric key cryptology must have been employed in its production (Blythe, 2006). Such a 
system employs double keys; one key is used to encrypt the message by the sender, and a different, albeit 
mathematically related, key is used by the recipient to decrypt the message (Antoniou et al., 2008). This is a very good 
system for electronic transactions, since two stranger-parties, perhaps living far apart, can confirm each other’s identity 
and thereby reduce the likelihood of fraud in the transaction. Non-repudiation techniques prevent the sender of a 
message from subsequently denying that they sent the message. Digital Signatures using public-key cryptography and 
hash functions are the generally accepted means of providing non-repudiation of communications.
4. Technical Attacks 
Technical attacks are one of the most challenging types of security compromise an e-commerce provider must face.
Perpetrators of technical attacks, and in particular Denial-of-Service attacks, typically target sites or services hosted on 
high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social 
networking sites. 
Denial of Service Attacks 
Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal 
activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet 
today. A major difficulty in thwarting these attacks is to trace the source of the attack, as they often use incorrect or 
spoofed IP source addresses to disguise the true origin of the attack (Kim and Kim, 2006). 
The United States Computer Emergency Readiness Team defines symptoms of denial-of-service attacks to include
(McDowell, 2007): 
• Unusually slow network performance 
• Unavailability of a particular web site 
• Inability to access any web site 
• Dramatic increase in the number of spam emails received 
DoS attacks can be executed in a number of different ways including: 
ICMP Flood (Smurf Attack) – where perpetrators will send large numbers of IP packets with the source address faked to 
appear to be the address of the victim. The network’s bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.
Teardrop Attack – A Teardrop attack involves sending mangled IP fragments with overlapping, over-sized, payloads to 
the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the system as a result.
Phlashing – Also known as a Permanent denial-of-service (PDoS) is an attack that damages a system so badly that it 
requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management 
interfaces of the victim’s hardware, be it routers, printers, or other networking hardware. These flaws leave the door 
open for an attacker to remotely ‘update’ the device firmware to a modified, corrupt or defective firmware image, 
therefore bricking the device and making it permanently unusable for its original purpose. 
Distributed Denial-of-Service Attacks - Distributed Denial of Service (DDoS) attacks are one of the greatest security fears for IT managers. In a matter of minutes, thousands of vulnerable computers can flood the victim website, choking off
legitimate traffic (Tariq et al., 2006). A distributed denial of service attack (DDoS) occurs when multiple compromised 
systems flood the bandwidth or resources of a targeted system, usually one or more web servers. The most famous 
DDoS attacks occurred in February 2000 where websites including Yahoo, Buy.com, eBay, Amazon and CNN were 
attacked and left unreachable for several hours each (Todd, 2000). 
Brute Force Attacks – A brute force attack is a method of defeating a cryptographic scheme by trying a large number of 
possibilities; for example, a large number of the possible keys in a key space in order to decrypt a message. Brute Force 
Attacks, although perceived to be low-tech in nature, are not a thing of the past. In May 2007 the internet infrastructure in Estonia was crippled by multiple sustained brute force attacks against government and commercial institutions in the
country (Sausner, 2008). The attacks followed the relocation of a Soviet World War II memorial in Tallinn in late April, which made news around the world.
5. Non-Technical Attacks 
Phishing Attacks 
Phishing is the criminally fraudulent process of attempting to acquire sensitive information such as usernames, 
passwords and credit card details, by masquerading as a trustworthy entity in an electronic communication. Phishing 
scams generally are carried out by emailing the victim with a ‘fraudulent’ email from what purports to be a legitimate 
organization requesting sensitive information. When the victim follows the link embedded within the email they are 
brought to an elaborate and sophisticated duplicate of the legitimate organization's website. Phishing attacks generally target bank customers, online auction sites (such as eBay), online retailers (such as Amazon) and service providers (such as PayPal). According to Community Banker (Swann, 2008), in more recent times cybercriminals have become more sophisticated in the timing of their attacks, posing as charities in times of natural disaster.
Social Engineering 
Social engineering is the art of manipulating people into performing actions or divulging confidential information. Social 
engineering techniques include pretexting (where the fraudster creates an invented scenario to get the victim to divulge information), interactive voice response (IVR) or phone phishing (where the fraudster gets the victim to divulge sensitive information over the phone) and baiting with Trojan horses (where the fraudster ‘baits’ the victim into loading malware onto a system). Social engineering has become a serious threat to e-commerce security since it is difficult to detect and to combat, as it involves ‘human’ factors which cannot be patched the way hardware or software can, although staff training and education can somewhat thwart such attacks (Hasle et al., 2005).
6. Conclusions 
In conclusion, the e-commerce industry faces a challenging future in terms of the security risks it must avert. With
increasing technical knowledge, and its widespread availability on the internet, criminals are becoming more and more 
sophisticated in the deceptions and attacks they can perform. Novel attack strategies and vulnerabilities only really 
become known once a perpetrator has uncovered and exploited them. That said, there are multiple security strategies which any e-commerce provider can adopt to reduce the risk of attack and compromise significantly. Awareness of the risks and the implementation of multi-layered security protocols, detailed and open privacy policies, and strong authentication and encryption measures will go a long way to reassure the consumer and ensure that the risk of compromise is kept minimal.
What is MySQL? 
• MySQL is a database system used on the web
• MySQL is a database system that runs on a server
• MySQL is ideal for both small and large applications
• MySQL is very fast, reliable, and easy to use
• MySQL supports standard SQL
• MySQL compiles on a number of platforms
• MySQL is free to download and use
• MySQL is developed, distributed, and supported by Oracle Corporation
The data in MySQL is stored in tables. A table is a collection of related data, and it consists of columns and 
rows. Databases are useful when storing information categorically. A company may have a database with the 
following tables: 
• Employees
• Products
• Customers
• Orders
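A minimal sketch of two such related tables in SQL (the column names are illustrative, not prescriptive):

CREATE TABLE Customers (
  customer_id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100)
);

CREATE TABLE Orders (
  order_id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  order_date DATE,
  FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
);

Each row of Orders refers back to a row of Customers through the foreign key, which is what makes the tables related.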
What is PHP? 
• PHP is an acronym for "PHP: Hypertext Preprocessor"
• PHP is a widely-used, open source scripting language
• PHP scripts are executed on the server
• PHP costs nothing; it is free to download and use
What is a PHP File? 
• PHP files can contain text, HTML, CSS, JavaScript, and PHP code
• PHP code is executed on the server, and the result is returned to the browser as plain HTML
• PHP files have the extension ".php"
What Can PHP Do? 
• PHP can generate dynamic page content
• PHP can create, open, read, write, delete, and close files on the server
• PHP can collect form data
• PHP can send and receive cookies
• PHP can add, delete, and modify data in your database
• PHP can restrict user access to some pages on your website
• PHP can encrypt data
With PHP you are not limited to outputting HTML. You can output images, PDF files, and even Flash movies. You can also output any text, such as XHTML and XML.
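Behind a typical PHP page, the database work ultimately comes down to ordinary SQL statements like the following (the table and values are hypothetical):

INSERT INTO Customers (name, email) VALUES ('Asha', 'asha@example.com');  -- add
UPDATE Customers SET email = 'new@example.com' WHERE customer_id = 1;     -- modify
DELETE FROM Customers WHERE customer_id = 2;                              -- delete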
Connecting to and Disconnecting from the Server 
To connect to the server, you will usually need to provide a MySQL user name when you invoke MySQL and, most likely, 
a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host 
name. Contact your administrator to find out what connection parameters you should use to connect (that is, what host, 
user name, and password to use). Once you know the proper parameters, you should be able to connect like this: 
shell> mysql -h host -u user -p 
Enter password: ******** 
host and user represent the host name where your MySQL server is running and the user name of your MySQL account. 
Substitute appropriate values for your setup. The ******** represents your password; enter it when MySQL displays 
the Enter password: prompt. 
If that works, you should see some introductory information followed by a mysql> prompt: 
shell> mysql -h host -u user -p 
Enter password: ******** 
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 25338 to server version: 5.0.96-standard
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> 
The mysql> prompt tells you that mysql is ready for you to enter commands. 
If you are logging in on the same machine that MySQL is running on, you can omit the host, and simply use the 
following: 
shell> mysql -u user -p 
If, when you attempt to log in, you get an error message such as ERROR 2002 (HY000): Can't connect to local MySQL 
server through socket '/tmp/mysql.sock' (2), it means that the MySQL server daemon (Unix) or service (Windows) is not 
running. Consult the administrator appropriate to your operating system.
Some MySQL installations permit users to connect as the anonymous (unnamed) user to the server running on the local 
host. If this is the case on your machine, you should be able to connect to that server by invoking mysql without any 
options:
shell> mysql 
After you have connected successfully, you can disconnect any time by typing QUIT (or \q) at the mysql> prompt:
mysql> QUIT 
Bye 
On Unix, you can also disconnect by pressing Control+D. 
Most examples in the following sections assume that you are connected to the server. They indicate this by 
the mysql> prompt. 
Data type 
In computer science and computer programming, a data type or simply type is a classification identifying one of various 
types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can 
be done on values of that type; the meaning of the data; and the way values of that type can be stored. 
In MySQL there are three main type families: text, number, and date/time. Refer to the MySQL book (Aplus).
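A minimal sketch showing one column from each family, in a hypothetical table:

CREATE TABLE property (
  description TEXT,          -- text type
  price DECIMAL(10,2),       -- number type
  registered DATE            -- date/time type
);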
The Java programming language is statically-typed, which means that all variables must first be declared before they can 
be used. 
All programs involve storing and manipulating data. 
Fortunately, the computer only knows about a few types of data. These include numbers, true/false values, characters (a, b, c, 1, 2, 3, etc.), lists of data, and complex "structures" of data, which build up new data types by combining the other data types.
Creating and using a database, and getting information about databases and tables – refer to the MySQL book.
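The basic statements involved look like this (the database and table names are illustrative):

CREATE DATABASE company;   -- create a new database
USE company;               -- make it the current database
SHOW DATABASES;            -- list all databases on the server
SHOW TABLES;               -- list tables in the current database
DESCRIBE Customers;        -- show the columns of a table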
Batch mode - To run your SQL batch file from the command line, enter the following:
In Windows:
mysql < c:\commands.sql
Don't forget to enclose the file path in quotes if there are any spaces.
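For example, commands.sql might contain a few statements to be run unattended (the contents are illustrative):

USE company;
INSERT INTO Orders (customer_id, order_date) VALUES (1, CURDATE());
SELECT COUNT(*) FROM Orders;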
Running the Batch Job as a Scheduled Task 
In Windows 
Batch jobs can be even more automated by running them as a scheduled task. In Windows, batch files are used to execute DOS commands. We can schedule our batch job by placing the command we entered earlier in a file, such as "runsql.bat". This file will contain only one line:
mysql < c:\commands.sql
To schedule the batch job: 
1. Open Scheduled Tasks. 
• Click Start, click All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks.
2. Double-click Add Scheduled Task to start the Scheduled Task Wizard, and then click Next in the first 
dialog box. 
3. The next dialog box displays a list of programs that are installed on your computer, either as part of 
the Windows operating system, or as a result of software installation. Click Browse and select your SQL 
file, and then click Open. 
4. Type a name for the task, and then choose when and how often you want the task to run, from one of 
the following options: 
• Daily
• Weekly
• Monthly
• One time only
• When my computer starts (before a user logs on)
• When I log on (only after the current user logs on)
5. Click Next, specify the information about the day and time to run the task, and then click Next. 
6. OPTIONAL: Enter the name and password of the user who is associated with this task. Make sure that 
you choose a user with sufficient permissions to run the program. By default, the wizard selects the name 
of the user who is currently logged on. 
[Figure: Scheduled Tasks in Windows]
If at a later time you'd like to suspend this task, you can open it via the Scheduled Tasks dialog (pictured above) and deselect the Enabled checkbox on the “Task” tab:
[Figure: The “Task” tab containing the “Enabled” checkbox]
Similarly, you can remove the task by deleting it like any file. In fact, the task is saved as a .job file in the WINNT\Tasks folder.
MySQL in the Cloud
A cloud database is a database accessible to clients from the cloud and delivered to users on demand via the Internet from a cloud database provider's servers. Also referred to as Database-as-a-Service (DBaaS), cloud databases can use cloud computing to achieve optimized scaling, high availability, multi-tenancy and effective resource allocation.
While a cloud database can be a traditional database such as a MySQL or SQL Server database that has been adapted for cloud use, a native cloud database such as Xeround's MySQL Cloud database tends to be better equipped to optimally use cloud resources and to guarantee scalability as well as availability and stability.
Cloud databases can offer significant advantages over their traditional counterparts, including increased accessibility, 
automatic failover and fast automated recovery from failures, automated on-the-go scaling, minimal investment and 
maintenance of in-house hardware, and potentially better performance. At the same time, cloud databases have their 
share of potential drawbacks, including security and privacy issues as well as the potential loss of or inability to access 
critical data in the event of a disaster or bankruptcy of the cloud database service provider.

More Related Content

What's hot

Introduction to Database Management System
Introduction to Database Management SystemIntroduction to Database Management System
Introduction to Database Management System
Hitesh Mohapatra
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented database
Kanike Krishna
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
ArangoDB Database
 

What's hot (20)

Introduction to Database Management System
Introduction to Database Management SystemIntroduction to Database Management System
Introduction to Database Management System
 
Introduction to Mobile Business Intelligence
Introduction to Mobile Business IntelligenceIntroduction to Mobile Business Intelligence
Introduction to Mobile Business Intelligence
 
Overview of Storage and Indexing ...
Overview of Storage and Indexing                                             ...Overview of Storage and Indexing                                             ...
Overview of Storage and Indexing ...
 
Comparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented DatabaseComparison of Relational Database and Object Oriented Database
Comparison of Relational Database and Object Oriented Database
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookup
 
database and database types
database and database typesdatabase and database types
database and database types
 
Rdbms
RdbmsRdbms
Rdbms
 
File organization and indexing
File organization and indexingFile organization and indexing
File organization and indexing
 
Dbms database models
Dbms database modelsDbms database models
Dbms database models
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented database
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
 
Databases
DatabasesDatabases
Databases
 
Role of a DBA
Role of a DBARole of a DBA
Role of a DBA
 
Presentation on Database management system
Presentation on Database management systemPresentation on Database management system
Presentation on Database management system
 
Mis assignment (database)
Mis assignment (database)Mis assignment (database)
Mis assignment (database)
 
Sql Server Basics
Sql Server BasicsSql Server Basics
Sql Server Basics
 
Managing data resources
Managing  data resourcesManaging  data resources
Managing data resources
 
Introduction to Database Management System
Introduction to Database Management SystemIntroduction to Database Management System
Introduction to Database Management System
 

Similar to Database Management Systems (Mcom Ecommerce)

Data base management system
Data base management systemData base management system
Data base management system
Navneet Jingar
 
Uses of dbms
Uses of dbmsUses of dbms
Uses of dbms
MISY
 
Chapter 5 data processing
Chapter 5 data processingChapter 5 data processing
Chapter 5 data processing
UMaine
 

Similar to Database Management Systems (Mcom Ecommerce) (20)

Database Management Systems
Database Management SystemsDatabase Management Systems
Database Management Systems
 
Database and Database Management (DBM): Health Informatics
Database and Database Management (DBM): Health InformaticsDatabase and Database Management (DBM): Health Informatics
Database and Database Management (DBM): Health Informatics
 
Lecture#5
Lecture#5Lecture#5
Lecture#5
 
jose rizal
jose rizaljose rizal
jose rizal
 
Dbms
DbmsDbms
Dbms
 
Data base management system
Data base management systemData base management system
Data base management system
 
Database management system
Database management systemDatabase management system
Database management system
 
Uses of dbms
Uses of dbmsUses of dbms
Uses of dbms
 
DBMS PART 1.docx
DBMS PART 1.docxDBMS PART 1.docx
DBMS PART 1.docx
 
Ch-1-Introduction-to-Database.pdf
Ch-1-Introduction-to-Database.pdfCh-1-Introduction-to-Database.pdf
Ch-1-Introduction-to-Database.pdf
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
Relational database management systems
Relational database management systemsRelational database management systems
Relational database management systems
 
Dbms unit i
Dbms unit iDbms unit i
Dbms unit i
 
database introductoin optimization1-app6891.pdf
database introductoin optimization1-app6891.pdfdatabase introductoin optimization1-app6891.pdf
database introductoin optimization1-app6891.pdf
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
Components and Advantages of DBMS
Components and Advantages of DBMSComponents and Advantages of DBMS
Components and Advantages of DBMS
 
Chapter 05 pertemuan 7- donpas - manajemen data
Chapter 05 pertemuan 7- donpas - manajemen dataChapter 05 pertemuan 7- donpas - manajemen data
Chapter 05 pertemuan 7- donpas - manajemen data
 
Chapter 5 data processing
Chapter 5 data processingChapter 5 data processing
Chapter 5 data processing
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
 
Dbms notes
Dbms notesDbms notes
Dbms notes
 

Recently uploaded

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 

Recently uploaded (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 

Database Management Systems (Mcom Ecommerce)

  • 1. Meaning A database is an organized collection of data. The data are typically organized to model aspects of reality in a way that supports processes requiring information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies. Database management systems (DBMSs) are specially designed software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general -purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, SAP and IBM DB2. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Database management systems are often classified according to the database that they support; the most popular database systems since the 1980s have all supported the relational model as represented by the SQL language. Systematically organized or structured repository of indexed information (usually as a group of linked data files) that allows easy retrieval, updating, analysis, and output of data. Stored usually in a computer, this data could be in the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information. Most computer applications (including software, spreadsheets, word-processors) are databases at their core. See also flat database and relational database. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full -text, numeric, and images. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full -text, numeric, and images. In computing, databases are sometimes classified according to their organizational approach. The most prevalen t approach is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses. Computer databases typically contain aggregations of data records or files, such as sales transactions, product catalogs and inventories, and customer profiles. Typically, a database manager provides users the capabilities of controlling read/write access, specifying report generation, and analyzing usage. Databases and database managers are prevalent in large main frame systems, but are also present in smaller distributed workstation and mid-range systems such as the AS/400 and on personal computers. SQL(Structured Query Language) is a standard language for making interactive queries from and updating a database such as IBM's DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. Features of a DBMS The prime purpose of a relational database management system is to maintain data integrity. This means all the rules and relationships between data are consistent at all times. 
But a good DBMS will have other features as well. These include: A command language that allows you to create, delete and alter the database (data description language or DDL) A way of documenting all the internal structures that makes up the database (data dictionary) A language to support the manipulation and processing of the data (data manipulation language) Support the ability to view the database from different viewpoints according to the requirements of the user Provide some level of security and access control to the data. The simplest RDBMS may be designed with a single user in mind e.g. the database is 'locked' until that person has finished with it. Such a RDBMS will only cost a few hundred pounds at most and will have only a basic capability. On the other hand an enterprise level DBMS can support a huge number of simultaneous users with thousands of internal tables and complex 'roll back' capabilities should things go wrong. Obviously this kind of system will cost thousands along with a need to have professional database administrators looking after it and database specialists to create complex queries for management and staff. 1
  • 2. 1. Controlling Data Redundancy: In non-database systems (traditional computer file processing), each application program has its own files. In this case, the duplicated copies of the same data are created at many places. In DBMS, all the data of an organization is integrated into a single database. The data is recorded at only one place in the database and it is not duplicated. For example, the dean's faculty file and the faculty payroll file contain several items that are identical. When they are converted into database, the data is integrated into a single database so that multiple copies of the same data are reduced to-single copy. In DBMS, the data redundancy can be controlled or reduced but is not removed completely. Sometimes, it is necessary to create duplicate copies of the same data items in order to relate tables with each other. By controlling the data redundancy, you can save storage space. Similarly, it is useful for retrieving data from database using queries. 2. Data Consistency: By controlling the data redundancy, the data consistency is obtained. If a data item appears only once, any update to its value has to be performed only once and the updated value (new value of item) is immediately available to all users. If the DBMS has reduced redundancy to a minimum level, the database system enforces consistency. It means that when a data item appears more than once in the database and is updated, the DBMS automatically updates each occurrence of a data item in the database. 3. Data Sharing: In DBMS, data can be shared by authorized users of the organization. The DBA manages the data and gives rights to users to access the data. Many users can be authorized to access the same set of information simultaneously. The remote users can also share same data. Similarly, the data of same database can be shared between different application programs. 4. Data Integration: In DBMS, data in database is stored in tables. A single database contains multiple tables and relationships can be created between tables (or associated data entities). This makes easy to retrieve and update data. 5. Integrity Constraints: Integrity constraints or consistency rules can be applied to database so that the correct data can be entered into database. The constraints may be applied to data item within a single record or they may be applied to relationships between records. Examples: The examples of integrity constraints are: (i) 'Issue Date' in a library system cannot be later than the corresponding 'Return Date' of a book. (ii) Maximum obtained marks in a subject cannot exceed 100. (iii) Registration number of BCS and MCS students must start with 'BCS' and 'MCS' respectively etc. There are also some standard constraints that are intrinsic in most of the DBMSs. These are; 2 Constraint Name Description PRIMARY KEY Designates a column or combination of columns as Primary Key and therefore, values of columns cannot be repeated or left blank. FOREIGN KEY Relates one table with another table. UNIQUE Specifies that values of a column or combination of columns cannot be repeated. NOT NULL Specifies that a column cannot contain empty values. CHECK Specifies a condition which each row of a table must satisfy. Most of the DBMSs provide the facility for applying the integrity constraints. The database designer (or DBA) identifies integrity constraints during database design. The application programmer can also identify integrity constraints in the program code during developing the application program. 
The integrity constraints are automatically checked at the time of data entry or when the record is updated. If the data entry operator (end-user) violates an integrity constraint,
  • 3. the data is not inserted or updated into the database and a message is displayed by the system. For example, when you draw amount from the bank through ATM card, then your account balance is compared with the amount you are drawing. If the amount in your account balance is less than the amount you want to draw, then a message is displayed on the screen to inform you about your account balance. 6. Data Security: Data security is the protection of the database from unauthorized users. Only the authorized persons are allowed to access the database. Some of the users may be allowed to access only a part of database i.e., the data that is related to them or related to their department. Mostly, the DBA or head of a department can access all the data in the database. Some users may be permitted only to retrieve data, whereas others are allowed to retrieve as well as to update data. The database access is controlled by the DBA. He creates the accounts of users and gives rights to access the database. Typically, users or group of users are given usernames protected by passwords. Most of the DBMSs provide the security sub-system, which the DBA uses to create accounts of users and to specify account restrictions. The user enters his/her account number (or username) and password to access the data from database. For example, if you have an account of e-mail in the "hotmail.com" (a popular website), then you have to give your correct username and password to access your account of e-mail. Similarly, when you insert your ATM card into the Auto Teller Machine (ATM) in a bank, the machine reads your ID number printed on the card and then asks you to enter your pin code (or password). In this way, you can access your account. 7. Data Atomicity: A transaction in commercial databases is referred to as atomic unit of work. For example, when you purchase something from a point of sale (POS) terminal, a number of tasks are performed such as; Company stock is updated. Amount is added in company's account. Sales person's commission increases etc. All these tasks collectively are called an atomic unit of work or transaction. These tasks must be completed in all; otherwise partially completed tasks are rolled back. Thus through DBMS, it is ensured that only consistent data exists within the database. 8. Database Access Language: Most of the DBMSs provide SQL as standard database access language. It is used to access data from multiple tables of a database. 9. Development of Application: The cost and time for developing new applications is also reduced. The DBMS provides tools that can be used to develop application programs. For example, some wizards are available to generate Forms and Reports. Stored procedures (stored on server side) also reduce the size of application programs. 10. Creating Forms: Form is very important object of DBMS. You can create Forms very easily and quickly in DBMS, Once a Form is created, it can be used many times and it can be modified very easily. The created Forms are also saved along with database and behave like a software component. A Form provides very easy way (user-friendly interface) to enter data into database, edit data, and display data from database. The non-technical users can also perform various operations on databases through Forms without going into the technical details of a database. 11. Report Writers: Most of the DBMSs provide the report writer tools used to create reports. The users can create reports very easily and quickly. 
Once a report is created, it can be used many times and it can be modified very easily. The created re ports are also saved along with database and behave like a software component. 12. Control Over Concurrency: In a computer file-based system, if two users are allowed to access data simultaneously, it is possible that they will interfere with each other. For example, if both users attempt to perform update operation on the same record, then one may overwrite the values recorded by the other. Most DBMSs have sub-systems to control the concurrency so that transactions are always recorded" with accuracy. 3
  • 4. 13. Backup and Recovery Procedures: In a computer file-based system, the user creates the backup of data regularly to protect the valuable data from damaging due to failures to the computer system or application program. It is a time consuming method, if volume of data is large. Most of the DBMSs provide the 'backup and recovery' sub-systems that automatically create the backup of data and restore data if required. For example, if the computer system fails in the middle (or end) of an update operation of the program, the recovery sub-system is responsible for making sure that the database is restored to the state it was in before the program started executing. 14. Data Independence: The separation of data structure of database from the application program that is used to access data from database is called data independence. In DBMS, database and application programs are separated from each other. The DBMS sits in between them. You can easily change the structure of database without modifying the application program. For example you can modify the size or data type of a data items (fields of a database table). On the other hand, in computer file-based system, the structure of data items are built into the individual application programs. Thus the data is dependent on the data file and vice versa. 15. Advanced Capabilities: DBMS also provides advance capabilities for online access and reporting of data through Internet. Today, most of the database systems are online. The database technology is used in conjunction with Internet technology to access data on the web servers. Data Base Management Systems Architecture Data Base Management Systems (DBMS) are very relevant in today’s world where information matters. Most business operations of large companies are dependent on their databases in some way or the other. Many companies use their data analysis methods to leverage the data in their databases and provide better service to customers and compete with their business rivals. Databases are collections of data that has been organized in a certain way. The term DBMS is a commonly used to refer to computer program that can help you store, change and retrieve the data in your database. Most DBMS software products use SQL as the main query language – the language that lets you interact with and extract results from your database quickly. SQL is the language used to query popular database systems like Oracle, SQL Server and MySQL. Learning SQL and DBMS can help you become a database administrator. DBMS Architecture DBMS architecture is the way in which the data in a database is viewed (or represented to) by users. It helps you represent your data in an understandable way to the users, by hiding the complex bits that deal with the working of the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data. We’re going to take a look at the ANSI-SPARC DBMS standard model. ANSI is the acronym for American National Standards Institute. It sets standards for American goods so that they can be used anywhere in the world without compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use SQL as the main query language. The ANSI has also standardized a three level DBMS architecture model followed by most database systems, and it’s known as the abstract ANSI-SPARC design standard. The ANSI-SPARC Database Architecture is set up into three tiers. Let’s take a closer look at them. 
Database Management Systems Architecture
Database management systems (DBMS) are very relevant in today's world, where information matters. Most business operations of large companies depend on their databases in some way or other. Many companies use data analysis methods to leverage the data in their databases, provide better service to customers, and compete with their business rivals. Databases are collections of data that have been organized in a certain way. The term DBMS commonly refers to a computer program that can help you store, change and retrieve the data in your database. Most DBMS products use SQL as the main query language: the language that lets you interact with and extract results from your database quickly. SQL is the language used to query popular database systems like Oracle, SQL Server and MySQL. Learning SQL and DBMS concepts can help you become a database administrator.

DBMS Architecture
DBMS architecture is the way in which the data in a database is viewed by (or represented to) users. It helps you represent your data in an understandable way to the users by hiding the complex bits that deal with the working of the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data. We're going to take a look at the ANSI-SPARC standard DBMS model. ANSI is the acronym for American National Standards Institute; it sets standards for American goods so that they can be used anywhere in the world without compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use SQL as the main query language. ANSI has also standardized a three-level DBMS architecture model followed by most database systems, known as the abstract ANSI-SPARC design standard. The ANSI-SPARC database architecture is set up in three tiers. Let's take a closer look at them.

The Internal Level (Physical Representation of Data): The internal level is the lowest level in a three-tiered database architecture. This level deals with how the stored data is represented on the system: exactly how the data is stored and organized for access. It is the most technical of the three levels. However, the internal-level view is still abstract: even though it shows how the data is stored physically, it does not show how the database software operates on it. So how exactly is data stored at this level? There are several considerations to be made when storing data, including choosing the right space allocation techniques, data compression techniques (if necessary), security and encryption, and the access paths the software can take to retrieve the data. Most DBMS products make sure that data access is optimized and that data uses minimum storage space; the operating system you are running is actually in charge of managing the physical storage space.
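The following minimal sketch shows the kind of internal-level decisions discussed above as they surface in MySQL DDL; the table and options are hypothetical, and ROW_FORMAT=COMPRESSED assumes an InnoDB configuration that permits it:

    CREATE TABLE sales_archive (
      id      INT PRIMARY KEY,
      sold_on DATE,
      amount  DECIMAL(10,2),
      INDEX idx_sold_on (sold_on)          -- an access path for date lookups
    ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED; -- storage engine and compression choices

Users querying the table never see these choices; they belong entirely to the internal level.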
The Conceptual Level (Holistic Representation of Data): The conceptual level describes how the database is structured logically. It tells you about the relationships between the data members of your database, exactly what data is stored in it, and what a user will need in order to use the database. This level does not concern itself with how the logical structure will actually be implemented; it is effectively an overview of your database. The conceptual level acts as a buffer between the internal level and the external level: it helps hide the complexity of the database and how the data is physically stored in it. The database administrator has to be conversant with this layer, because most of his or her operations are carried out on it, and only a database administrator is allowed to modify or structure this level. It provides a global view of the database, as well as of the hardware and software necessary for running it, all important information for a database admin.

The External Level (User Representation of Data): This is the uppermost level in the database architecture. It implements the concept of abstraction as much as possible. This level is also known as the view level, because it deals with how a user views your database. The external level is what allows a user to access a customized version of the data in your database; multiple users can work on a database at the same time because of it. The external level also hides the workings of the database from your users, and it maintains the security of the database by giving users access only to the data they need at a particular time; any data that is not needed will not be displayed.

The three "schemas" (internal, conceptual and external) show how the database is internally and externally structured, and so this type of database architecture is also known as the "three-schema" architecture.
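As a small illustration of the external level, a view can give a class of users a customized, restricted window onto the data; a hedged sketch, assuming hypothetical employee and department tables:

    -- Receptionists see names and extensions, but never salaries,
    -- and the underlying join is hidden from them.
    CREATE VIEW staff_directory AS
    SELECT e.name, e.extension, d.dept_name
    FROM employee e
    JOIN department d ON d.dept_id = e.dept_id;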
Functional Dependency
A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which is the same as stating "B is functionally dependent upon A." For example, in a table listing employee characteristics including Social Security Number (SSN) and name, name is functionally dependent upon SSN (SSN -> name), because an employee's name can be uniquely determined from their SSN. The reverse statement (name -> SSN) is not true, because more than one employee can have the same name but different SSNs.

Definition: functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X -> Y, which specifies that Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute; each value of X is associated with precisely one Y value. A functional dependency in a database serves as a constraint between two sets of attributes. Defining functional dependencies is an important part of relational database design and contributes to normalization.

What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant (unwanted) data (for example, storing the same data in more than one table) and ensuring dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Techopedia defines normalization as the process of reorganizing data in a database so that it meets two basic requirements: (1) there is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to take up as little disk space as possible, resulting in increased performance. Normalization is also known as data normalization.
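Returning to the functional dependency example above (SSN -> name), such a constraint can be checked directly in SQL; a hedged sketch, assuming a hypothetical employee table:

    -- If SSN -> name holds, this returns no rows; any row returned is an
    -- SSN associated with more than one name, i.e. a violation of the FD.
    SELECT ssn
    FROM employee
    GROUP BY ssn
    HAVING COUNT(DISTINCT name) > 1;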
The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF, along with the occasional 4NF; fifth normal form is very rarely seen and won't be discussed in this article. Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally it becomes necessary to stray from them to meet practical business requirements; however, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF): First normal form sets the very basic rules for an organized database: eliminate duplicative columns from the same table, and create separate tables for each group of related data, identifying each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF): Second normal form further addresses the concept of removing duplicative data: meet all the requirements of the first normal form; remove subsets of data that apply to multiple rows of a table and place them in separate tables; and create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF): Third normal form goes one large step further: meet all the requirements of the second normal form, and remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF): The Boyce-Codd normal form, also referred to as the "third and a half (3.5) normal form", adds one more requirement: meet all the requirements of the third normal form, and ensure that every determinant is a candidate key.

Fourth Normal Form (4NF): Finally, fourth normal form has one additional requirement: meet all the requirements of the third normal form; a relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
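To make the guidelines concrete, the hedged sketch below (hypothetical schema) removes the duplication that arises when a department name is repeated on every employee row; after the split, each non-key column depends only on its own table's key, in the spirit of 2NF/3NF:

    CREATE TABLE department (
      dept_id   INT PRIMARY KEY,
      dept_name VARCHAR(60)            -- stored once per department
    );

    CREATE TABLE employee (
      emp_id  INT PRIMARY KEY,
      name    VARCHAR(60),
      dept_id INT,                     -- relationship via a foreign key
      FOREIGN KEY (dept_id) REFERENCES department (dept_id)
    );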
Data Models

E-R Model: An entity-relationship (E-R) model is a graphical representation of entities and their relationships to each other, typically used in computing in regard to the organization of data within databases or information systems. An entity is a piece of data: an object or concept about which data is stored. A relationship is how the data is shared between entities. There are three types of relationships between entities:

1. One-to-One: One instance of an entity (A) is associated with one other instance of another entity (B). For example, in a database of employees, each employee name (A) is associated with only one Social Security Number (B).

2. One-to-Many: One instance of an entity (A) is associated with zero, one or many instances of another entity (B), but for one instance of entity B there is only one instance of entity A. For example, for a company with all employees working in one building, the building name (A) is associated with many different employees (B), but those employees all share the same singular association with entity A.

3. Many-to-Many: One instance of an entity (A) is associated with one, zero or many instances of another entity (B), and one instance of entity B is associated with one, zero or many instances of entity A. For example, for a company in which all of its employees work on multiple projects, each instance of an employee (A) is associated with many instances of a project (B), and at the same time, each instance of a project (B) has multiple employees (A) associated with it.

Relational Model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples, grouped into relations; a database organized in terms of the relational model is a relational database. The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries. Most relational databases use the SQL data definition and query language; these systems implement what can be regarded as an engineering approximation to the relational model. A table in an SQL database schema corresponds to a predicate variable; the contents of a table correspond to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases deviate from the relational model in many details, and Codd fiercely argued against deviations that compromise the original principles.

[Image: Diagram of an example database according to the relational model]
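The declarative style described above can be seen in any SQL query: the statement says what rows are wanted, and the DBMS decides how to retrieve them. A minimal sketch, assuming hypothetical customers and orders tables:

    SELECT c.name, o.order_date, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id  -- rows related through a key
    WHERE o.total > 100;                            -- no access paths specified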
In the relational model, related records are linked together with a "key".

Network Model
The network model replaces the hierarchical tree with a graph, thus allowing more general connections among the nodes. The main difference between the network model and the hierarchical model is its ability to handle many-to-many (N:N) relations; in other words, it allows a record to have more than one parent. Suppose an employee works for two departments: the strict hierarchical arrangement is not possible here, and the tree becomes a more generalized graph, a network. The network model evolved specifically to handle non-hierarchical relationships; data can belong to more than one parent, and there are lateral connections as well as top-down connections. A network structure thus allows 1:1 (one-to-one), 1:M (one-to-many), and M:M (many-to-many) relationships among entities. In network database terminology, a relationship is a set. Each set is made up of at least two types of records: an owner record (equivalent to a parent in the hierarchical model) and a member record (similar to a child record in the hierarchical model). The Customer-Loan database, which we discussed earlier for the hierarchical model, can also be represented in the network model. In the network representation, the information about the joint loan L1 appears a single time, whereas in the hierarchical model it appears twice; the network model thus reduces redundancy and is better in this respect than the hierarchical model.

Hierarchical Model
The hierarchical data model is a way of organizing a database with multiple one-to-many relationships. The structure is based on the rule that one parent can have many children, but each child is allowed only one parent. This structure allows information to be repeated through the parent-child relations. The model was created by IBM and implemented mainly in their Information Management System (IMS), a precursor to modern DBMSs. A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value; the entity type of a record defines which fields the record contains. A record in the hierarchical database model corresponds to a row (or tuple) in the relational database model, and an entity type corresponds to a table (or relation). The hierarchical database model mandates that each child record has only one parent, whereas each parent record can have one or more child records.
In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the first database model, created by IBM in the 1960s.

Distributed Database
A distributed database is a database that is under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common CPU. It may be stored on multiple computers located in the same physical location, or dispersed over a network of interconnected computers. Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of databases improves database performance at end-user worksites. To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database; once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time-consuming depending on the size and number of the distributed databases, and it can also require a lot of time and computer resources. Duplication, on the other hand, is not as complicated: it basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, changes are allowed only to the master database, ensuring that local data will not be overwritten. Both processes can keep the data current in all distributed locations. Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. Their implementation can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

Object-Oriented Database
An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases, which are table-oriented; object-relational databases are a hybrid of both approaches. Object databases have been considered since the early 1980s. Object-oriented database management systems (OODBMSs) combine database capabilities with object-oriented programming language capabilities. OODBMSs allow object-oriented programmers to develop the product, store objects, and replicate or modify existing objects to make new objects within the OODBMS.
Because the database is integrated with the programming language, the programmer can maintain consistency within one environment, in that both the OODBMS and the programming language use the same model of representation. Relational DBMS projects, by way of contrast, maintain a clearer division between the database model and the application. As the usage of web-based technology increases with the implementation of intranets and extranets, companies have a vested interest in OODBMSs to display their complex data. Using a DBMS that has been specifically designed to store data as objects gives an advantage to companies that are geared towards multimedia presentation or organizations that utilize computer-aided design (CAD). Some object-oriented databases are designed to work well with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk; others have their own programming languages. OODBMSs use exactly the same model as object-oriented programming languages.

Spatial Database
A spatial database is a database that is optimized to store and query data that represents objects defined in a geometric space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons; some handle more complex structures such as 3D objects, topological coverages, linear networks, and TINs. While typical databases are designed to manage various numeric and character types of data, additional functionality needs to be added for databases to process spatial data types efficiently; these types are typically called geometry or feature types. The Open Geospatial Consortium created the Simple Features specification, which sets standards for adding spatial functionality to database systems.

Multimedia Database
A multimedia database (MMDB) is a collection of related multimedia data. Multimedia data includes one or more primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations), animation sequences, audio and video. A multimedia database management system (MMDBMS) is a framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for multimedia data types and facilities for the creation, storage, access, querying and control of a multimedia database.

Crash Recovery System
Though we are living in a highly technologically advanced era, where hundreds of satellites monitor the earth and at every second billions of people are connected through information technology, failure is expected but not always acceptable. A DBMS is a highly complex system with hundreds of transactions being executed every second. The availability of a DBMS depends on its complex architecture and underlying hardware and system software. If it fails or crashes amid transactions being executed, it is expected that the system will follow some sort of algorithm or technique to recover from crashes or failures.

Failure Classification
To see where a problem has occurred, we generalize failures into the following categories:

TRANSACTION FAILURE: When a transaction fails to execute, or reaches a point after which it cannot be completed successfully, it has to abort. This is called transaction failure, and only a few transactions or processes are affected.
Reasons for transaction failure include: logical errors, where a transaction cannot complete because of a code error or some internal error condition; and system errors, where the database system itself terminates an active transaction because the DBMS is unable to execute it or has to stop it because of some system condition (for example, in the case of deadlock or resource unavailability, the system aborts an active transaction).
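The practical consequence of an abort is that the database looks as if the transaction never ran. A minimal sketch of this atomicity, assuming a hypothetical accounts table and standard MySQL transaction syntax:

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 500 WHERE id = 'A';
    UPDATE accounts SET balance = balance + 500 WHERE id = 'B';
    COMMIT;  -- both updates become visible together; a ROLLBACK here
             -- (or a failure before COMMIT) would undo both updates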
SYSTEM CRASH: There are problems external to the system that may cause it to stop abruptly and crash, for example an interruption in the power supply or a failure of the underlying hardware or software (including operating system errors).

DISK FAILURE: In the early days of technology evolution, it was a common problem for hard disk drives or storage drives to fail frequently. Disk failures include the formation of bad sectors, inaccessibility of the disk, a disk head crash, or any other failure that destroys all or part of the disk storage.

Storage Structure
The storage structure can be divided into various categories. Volatile storage: as the name suggests, this storage does not survive system crashes; it is mostly placed very close to the CPU, often embedded on the chipset itself (for example, main memory and cache memory). It is fast but can store only a small amount of information. Non-volatile storage: these memories are made to survive system crashes. They are huge in data storage capacity but slower to access; examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.

Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify data items. Transactions are made of various operations, which are atomic in nature; but according to the ACID properties of a DBMS, the atomicity of the transaction as a whole must be maintained: either all its operations are executed or none. When a DBMS recovers from a crash, it should maintain the following: it should check the states of all transactions that were being executed; a transaction may be in the middle of some operation, and the DBMS must ensure the atomicity of the transaction in this case; it should check whether the transaction can be completed now or needs to be rolled back; and no transaction should be allowed to leave the DBMS in an inconsistent state. There are two types of techniques that can help a DBMS recover while maintaining the atomicity of transactions: maintaining logs of each transaction and writing them onto stable storage before actually modifying the database; and maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated later.

Log-Based Recovery
The log is a sequence of records which records the actions performed by a transaction. It is important that the log is written prior to the actual modification and stored on a stable, failsafe storage medium. Log-based recovery works as follows: the log file is kept on stable storage media. When a transaction enters the system and starts execution, it writes a log record about it: <Tn, Start>. When the transaction modifies an item X, it writes a log record <Tn, X, V1, V2>, which reads: Tn has changed the value of X from V1 to V2. When the transaction finishes, it logs <Tn, commit>.
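As a hedged illustration of the notation above, the comments below show the log records a DBMS might write for a simple transaction; the table, values and exact record layout are hypothetical:

    -- Possible write-ahead log records for this transaction:
    --   <T1, Start>
    --   <T1, balance(A), 500, 400>   -- item X = balance of A, V1 = 500, V2 = 400
    --   <T1, Commit>
    START TRANSACTION;
    UPDATE accounts SET balance = 400 WHERE id = 'A';
    COMMIT;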
The database can be modified using two approaches. Deferred database modification: all logs are written to stable storage, and the database is updated only when the transaction commits. Immediate database modification: each log record is followed by an actual database modification; that is, the database is modified immediately after every operation.

Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At recovery time it would be hard for the recovery system to backtrack through all the logs and then start recovering; to ease this situation, most modern DBMSs use the concept of 'checkpoints'.

CHECKPOINT: Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the system, and as time passes the log file may become too big to be handled at all. A checkpoint is a mechanism whereby all the previous logs are removed from the system and stored permanently on the storage disk. The checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

RECOVERY: When a system with concurrent transactions crashes and recovers, it behaves in the following manner:

[Image: Recovery with concurrent transactions]

The recovery system reads the logs backwards from the end to the last checkpoint and maintains two lists, an undo-list and a redo-list. If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list. If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the transaction in the undo-list. All transactions in the undo-list are then undone and their logs removed. For all transactions in the redo-list, their previous logs are removed, the operations are redone, and the logs are saved again.

Database Security / Authorization
Database security concerns the use of a broad range of information security controls to protect databases (potentially including the data, the database applications or stored functions, the database systems, the database servers and the associated network links) against compromises of their confidentiality, integrity and availability. It involves various types or categories of controls, such as technical, procedural/administrative and physical. Database security is a specialist topic within the broader realms of computer security, information security and risk management. Security risks to database systems include, for example:

• unauthorized or unintended activity or misuse by authorized database users, database administrators, or network/systems managers, or by unauthorized users or hackers (e.g. inappropriate access to sensitive data, metadata or functions within databases, or inappropriate changes to the database programs, structures or security configurations);
• malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary data, deletion of or damage to the data or programs, interruption or denial of authorized access to the database, attacks on other systems and the unanticipated failure of database services;
• overloads, performance constraints and capacity issues resulting in the inability of authorized users to use databases as intended;
• physical damage to database servers caused by computer room fires or floods, overheating, lightning, accidental liquid spills, static discharge, electronic breakdowns/equipment failures and obsolescence;
• design flaws and programming bugs in databases and the associated programs and systems, creating various security vulnerabilities (e.g. unauthorized privilege escalation), data loss/corruption, performance degradation etc.;
• data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system administration processes, sabotage/criminal damage etc.

Many layers and types of information security control are appropriate to databases, including: access control, auditing, authentication, encryption, integrity controls, backups, and application security. A hedged sketch of basic access control in SQL follows this discussion.

Database Security Applying Statistical Methods
Traditionally, databases have been secured against hackers largely through network security measures such as firewalls and network-based intrusion detection systems. While network security controls remain valuable in this regard, securing the database systems themselves, and the programs/functions and data within them, has arguably become more critical as networks are increasingly opened to wider access, in particular access from the Internet. Furthermore, system, program, function and data access controls, along with the associated user identification, authentication and rights management functions, have always been important to limit, and in some cases log, the activities of authorized users and administrators. In other words, these are complementary approaches to database security, working from both the outside in and the inside out, as it were. Many organizations develop their own "baseline" security standards and designs detailing basic security control measures for their database systems. These may reflect general information security requirements or obligations imposed by corporate information security policies and applicable laws and regulations (e.g. concerning privacy, financial management and reporting systems), along with generally accepted good database security practices (such as appropriate hardening of the underlying systems) and perhaps security recommendations from the relevant database system and software vendors. The security designs for specific database systems typically specify further security administration and management functions (such as administration and reporting of user access rights, log management and analysis, database replication/synchronization and backups) along with various business-driven information security controls within the database programs and functions (e.g. data entry validation and audit trails). Furthermore, various security-related activities (manual controls) are normally incorporated into the procedures, guidelines etc. relating to the design, development, configuration, use, management and maintenance of databases.
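As promised above, basic authentication and least-privilege access control can be expressed directly in SQL; a hedged example, assuming MySQL account syntax and a hypothetical payroll schema:

    -- Authentication: a named account with a password
    CREATE USER 'report_user'@'localhost' IDENTIFIED BY 'S3cret!pass';

    -- Access control: read-only access to a single view, nothing else
    GRANT SELECT ON payroll.summary_view TO 'report_user'@'localhost';

    -- Revocation when the access is no longer appropriate
    REVOKE SELECT ON payroll.summary_view FROM 'report_user'@'localhost';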
Data Warehouse Architecture
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture than to discuss the specifics of any one system. In general, all data warehouse systems have the following layers:

• Data Source Layer
• Data Extraction Layer
• Staging Area
• ETL Layer
• Data Storage Layer
• Data Logic Layer
• Data Presentation Layer
• Metadata Layer
• System Operations Layer
Data Source Layer: This represents the different data sources that feed data into the data warehouse. The data source can be of any format: plain text files, relational databases, other types of databases, Excel files, etc. can all act as a data source. Many different types of data can be a data source, including operational data such as sales data, HR data, product data, inventory data, marketing data and systems data, and third-party data such as census data, demographics data, or survey data. All these data sources together form the Data Source Layer.

Data Extraction Layer: Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but there is unlikely to be any major data transformation.

Staging Area: This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration.

ETL Layer: This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this layer (a small sketch of such a step appears after these layer descriptions).

Data Storage Layer: This is where the transformed and cleansed data sits. Based on scope and functionality, three types of entities can be found here: the data warehouse, the data mart, and the operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.

Data Logic Layer: This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but they do affect what a report looks like.

Data Presentation Layer: This refers to the information that reaches the users. This can be in the form of a tabular or graphical report in a browser, an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.

Metadata Layer: This is where information about the data stored in the data warehouse system is kept. A logical data model would be an example of something in the metadata layer. A metadata tool is often used to manage metadata.

System Operations Layer: This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.
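As referenced in the ETL Layer description above, here is a minimal sketch of a single transform-and-load step, assuming hypothetical staging and dw schemas:

    -- Move rows from the staging area into a warehouse fact table,
    -- cleansing as we go.
    INSERT INTO dw.sales_fact (sale_date, product_id, amount)
    SELECT s.sale_date,
           s.product_id,
           COALESCE(s.amount, 0)       -- cleanse: default missing amounts
    FROM staging.raw_sales s
    WHERE s.sale_date IS NOT NULL;     -- drop rows that cannot be dated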
Evolution of Data Warehousing
In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional information systems technology was simply too cumbersome to provide relevant data efficiently and quickly. Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or less to 'execute' the business rather than 'run' the business. From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic reports for management. The key here is the word 'strategic', as most executives were less concerned with day-to-day operations than with a more overall look at the model and business functions.

As with all technology, over the course of the latter half of the 20th century we saw increased numbers and types of databases. Many large businesses found themselves with data scattered across multiple platforms and variations of technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data warehousing is to take data from multiple platforms/technologies (as varied as spreadsheets, DB2 databases, IDMS records, and VSAM files) and place it in a common location that uses a common querying tool. In this way operational databases could be held on whatever system was most efficient for the operational business, while the reporting / strategic information could be held in a common location using a common language. Data warehouses take this a step farther by giving the data itself commonality, by defining what each term means and keeping it standard. (An example of this would be gender, which can be referred to in many ways but should be standardized on a data warehouse with one common way of referring to each sex.) All of this was designed to make decision support more readily available without affecting day-to-day operations. One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but rather a location for data that is 'interesting', i.e. data that will assist decision makers in making strategic decisions relative to the organization's overall mission.

Benefits of Data Warehousing
The successful implementation of a data warehouse can bring major benefits to an organization, including:

• Potential high returns on investment: Implementation of data warehousing by an organization requires a huge investment, typically from Rs 10 lakh to Rs 50 lakh. However, a study by the International Data Corporation (IDC) in 1996 reported that average three-year returns on investment (ROI) in data warehousing reached 401%.

• Competitive advantage: The huge returns on investment for those companies that have successfully implemented a data warehouse are evidence of the enormous competitive advantage that accompanies this technology. The competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.

• Increased productivity of corporate decision-makers: Data warehousing improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate, and consistent analysis.

• More cost-effective decision-making: Data warehousing helps to reduce the overall cost of the product by reducing the number of channels.

• Better enterprise intelligence: It helps to provide better enterprise intelligence.

• Enhanced customer service: It is used to enhance customer service.
Problems of Data Warehousing
The problems associated with developing and managing a data warehouse are as follows:

Underestimation of resources for data loading: We sometimes underestimate the time required to extract, clean, and load the data into the warehouse. It may take a significant proportion of the total development time, although tools exist that reduce the time and effort spent on this process.

Hidden problems with source systems: Hidden problems associated with the source systems feeding the data warehouse may be identified only after years of being undetected.
For example, when entering the details of a new property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when the data is available and applicable.

Required data not captured: In some cases the required data is not captured by the source systems, even though it may be very important for the data warehouse's purposes. For example, the date of registration for a property may not be used in the source system, but may be very important for analysis purposes.

Increased end-user demands: After satisfying some end-user queries, requests for support from staff may increase rather than decrease. This is caused by increasing awareness among users of the capabilities and value of the data warehouse. Another reason is that once a data warehouse is online, the number of users and queries often increases, together with requests for answers to more and more complex queries.

Data homogenization: Data warehousing involves reconciling the data formats of different data sources, which can result in the loss of some important value in the data.

High demand for resources: The data warehouse requires large amounts of data and storage.

Data ownership: Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data owned by one department has to be loaded into the data warehouse for decision-making purposes, but sometimes this results in reluctance on the part of that department, because it may hesitate to share the data with others.

High maintenance: Data warehouses are high-maintenance systems. Any reorganization of the business processes and the source systems may affect the data warehouse, resulting in high maintenance costs.

Long-duration projects: The building of a warehouse can take up to three years, which is why some organizations are reluctant to invest in a data warehouse. Sometimes only the historical data of a particular department is captured, resulting in data marts; data marts support only the requirements of a particular department and limit the functionality to that department or area only.

Complexity of integration: The most important area for the management of a data warehouse is its integration capabilities. An organization must spend a significant amount of time determining how well the various data warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are a number of tools for every operation of the data warehouse.

Data Mining
Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies, as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing. Grocery stores are well-known users of data mining techniques. Many supermarkets offer free loyalty cards to customers that give them access to reduced prices not available to non-members. The cards make it easy for stores to track who is buying what, when they are buying it, and at what price. The stores can then use this data, after analyzing it, for multiple purposes, such as offering customers coupons targeted to their buying habits and deciding when to put items on sale and when to sell them at full price.
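A hedged sketch of the kind of query behind such loyalty-card analysis, assuming hypothetical purchases and products tables:

    -- Which products are bought most often, as a starting point for
    -- targeted coupons and sale timing.
    SELECT p.product_name, COUNT(*) AS times_bought
    FROM purchases pu
    JOIN products p ON p.product_id = pu.product_id
    GROUP BY p.product_name
    ORDER BY times_bought DESC
    LIMIT 10;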
Data mining can be a cause for concern when only selected information, which is not representative of the overall sample group, is used to prove a certain hypothesis.

Data Mining Process
The Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases, intended as a cyclical process, as shown in the following figure:
[Image: Cross-Industry Standard Process for Data Mining (CRISP-DM)]

Business understanding: In the business understanding phase, it is first required to understand the business objectives clearly and find out what the business's needs are. Next, we assess the current situation by finding out about the resources, assumptions, constraints and other important factors that should be considered. Then, from the business objectives and the current situation, we create data mining goals that achieve the business objectives within the current situation. Finally, a good data mining plan has to be established to achieve both the business and data mining goals; the plan should be as detailed as possible.

Data understanding: The data understanding phase starts with initial data collection, gathered from the available data sources, to help us get familiar with the data. Some important activities, including data loading and data integration, must be performed to make the data collection successful. Next, the "gross" or "surface" properties of the acquired data need to be examined carefully and reported. Then the data needs to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, data quality must be examined by answering important questions such as "Is the acquired data complete?" and "Are there any missing values in the acquired data?" (a sketch of such a check appears after the description of the modeling phase below).

Data preparation: Data preparation typically consumes about 90% of the time of the project. The outcome of the data preparation phase is the final data set. Once the available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. The data exploration task may be carried out at greater depth during this phase to notice patterns based on business understanding.

Modeling: First, modeling techniques have to be selected for the prepared dataset. Next, a test scenario must be generated to validate the quality and validity of the model. Then one or more models are created by running the modeling tool on the prepared dataset. Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.
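The data-quality questions raised in the data understanding phase can often be answered with simple profiling queries; a hedged sketch, assuming a hypothetical customers table and MySQL's treatment of booleans as 0/1:

    -- How complete is the data? Count rows and missing values per column.
    SELECT COUNT(*)                AS total_rows,
           SUM(email IS NULL)      AS missing_email,
           SUM(birth_date IS NULL) AS missing_birth_date
    FROM customers;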
Evaluation: In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised in this phase due to new patterns discovered in the model results or other factors. Gaining business understanding is an iterative process in data mining. The go or no-go decision must be made in this step before moving to the deployment phase.

Deployment: The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. In the deployment phase, plans for deployment, maintenance and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarize the project experiences and review the project to see what needs to be improved, capturing the lessons learned.

Data Mining Techniques

Association: Association (or relation) is probably the best known, most familiar and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time they buy strawberries they might also want to buy cream.

Classification: You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes to identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might place it into a particular class by comparing its attributes with your known definitions. You can apply the same principles to customers, for example by classifying them by age and social group.

Clustering: By examining one or more attributes or classes, you can group individual pieces of data together to form a structured opinion. At a simple level, clustering is using one or more attributes as your basis for identifying a cluster of correlating results. Clustering is useful for identifying different information because it correlates with other examples, so you can see where the similarities and ranges agree. Clustering can work both ways: you can assume that there is a cluster at a certain point and then use your identification criteria to see if you are correct. For example, a sample of sales data might compare the age of the customer to the size of the sale; it is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home) have more disposable income.

Prediction: Prediction is a wide topic and runs from predicting the failure of components or machinery, to identifying fraud, and even to the prediction of company profits. Used in combination with the other data mining techniques, prediction involves analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a prediction about an event.
Using credit card authorization as an example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matching to identify whether a transaction is fraudulent. Making a match between the purchase of flights to the US and transactions in the US, for instance, suggests that the transaction is likely to be valid.

Sequential patterns: Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences of similar events. For example, with customer data you can identify that customers buy a particular collection of products together at different times of the year. In a shopping basket application, you can use this information to automatically suggest that certain items be added to a basket based on their frequency and past purchasing history; a hedged sketch of such a query follows.
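A hedged sketch of such a frequency analysis, assuming a hypothetical purchases table with customer_id, product_id and purchased_at columns:

    -- Pairs of products the same customer bought in sequence, ranked by
    -- how often the pair occurs: a simple form of sequential pattern.
    SELECT a.product_id AS first_item,
           b.product_id AS next_item,
           COUNT(*)     AS freq
    FROM purchases a
    JOIN purchases b
      ON b.customer_id = a.customer_id
     AND b.purchased_at > a.purchased_at
    GROUP BY a.product_id, b.product_id
    ORDER BY freq DESC;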
Decision trees: Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as part of the selection criteria or to support the use and selection of specific data within the overall structure. Within the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a further question, helping to classify or identify the data so that it can be categorized, or so that a prediction can be made based on each answer.

[Image: Decision tree]

Combinations: In practice, it is very rare that you would use one of these techniques exclusively. Classification and clustering are similar techniques, and by using clustering to identify nearest neighbors you can further refine your classifications. Often, decision trees are used to help build and identify classifications that can be tracked for a longer period to identify sequences and patterns.

Long-term (memory) processing: Within all of the core methods, there is often reason to record and learn from the information. In some techniques it is entirely obvious; for example, with sequential patterns and predictive learning you look back at data from multiple sources and instances of information to build a pattern. In others, the process might be more explicit. Decision trees are rarely built once and never forgotten: as new information, events, and data points are identified, it might be necessary to build more branches, or even entirely new trees, to cope with the additional information. Some of this process can be automated; for example, building a predictive model for identifying credit card fraud is about building probabilities that you can use for the current transaction, and then updating that model with each new (approved) transaction. This information is then recorded so that the decision can be made quickly the next time.

E-commerce & Web Application Security Issues

1. Introduction
E-commerce is defined as the buying and selling of products or services over electronic systems such as the Internet and, to a lesser extent, other computer networks. It is generally regarded as the sales and commercial function of eBusiness. There has been a massive increase in the level of trade conducted electronically since the widespread penetration of the Internet. A wide variety of commerce is conducted via eCommerce, including electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems. US online retail sales reached $175 billion in 2007 and were projected to grow to $335 billion by 2012 (Mulpuru, 2008). This massive increase in the uptake of eCommerce has led to a new generation of associated security threats, but any eCommerce system must meet four integral requirements:
a) privacy: information exchanged must be kept from unauthorized parties;
b) integrity: the exchanged information must not be altered or tampered with;
c) authentication: both sender and recipient must prove their identities to each other; and
d) non-repudiation: proof is required that the exchanged information was indeed received (Holcombe, 2007).

These basic maxims of eCommerce are fundamental to the conduct of secure business online. Beyond them, eCommerce providers must also protect against a number of different external security threats, most notably Denial of Service (DoS), where an attempt is made to make a computer resource unavailable to its intended users through a variety of mechanisms discussed below. The financial services sector still bears the brunt of e-crime, accounting for 72% of all attacks; but the sector that experienced the greatest increase in the number of attacks was eCommerce, where attacks rose by 15% from 2006 to 2007 (Symantec, 2007).

2. Privacy
Privacy has become a major concern for consumers with the rise of identity theft and impersonation, and any concern for consumers must be treated as a major concern for eCommerce providers. According to Consumer Reports Money Adviser (Perrotta, 2008), the US Attorney General has announced multiple indictments relating to a massive international security breach involving nine major retailers and more than 40 million credit- and debit-card numbers; US attorneys think that this may be the largest hacking and identity-theft case ever prosecuted by the Justice Department. Both EU and US legislation, at both the federal and state levels, mandates that certain organizations inform customers about information uses and disclosures. Such disclosures are typically accomplished through privacy policies, both online and offline (Vail et al., 2008). In a study by Lauer and Deng (2008), a model is presented linking privacy policy, through trustworthiness, to online trust, and then to customers' loyalty and their willingness to provide truthful information. The model was tested using a sample of 269 responses. The findings suggested that consumers' trust in a company is closely linked with their perception of the company's respect for customer privacy (Lauer and Deng, 2007). Trust in turn is linked to increased customer loyalty, which can be manifested through increased purchases, openness to trying new products, and willingness to participate in programs that use additional personal information. Privacy now forms an integral part of any e-commerce strategy, and investment in privacy protection has been shown to increase consumers' spend, trust and loyalty. The converse can be shown to be true when things go wrong. In March 2008, the Irish online jobs board jobs.ie was compromised by criminals and users' personal data (in the form of CVs) was taken (Ryan, 2008). Looking at the real-time responses of users to this event on the popular Irish forum Boards.ie, we can see that privacy is of major concern to users, and in the event of their privacy being compromised users become very agitated, with an overall negative effect on trust in e-commerce. User comments in the forum included: "I'm well p*ssed off about them keeping my CV on the sly"; "I am just angry that this could have happened and to so many people"; "Mine was taken too. How do I terminate my acc with jobs.ie"; "Grr, so annoyed, feel I should report it to the Gardai now" (Boards.ie, 2008).
3. Integrity, Authentication & Non-Repudiation
In any e-commerce system, the factors of data integrity, customer and client authentication, and non-repudiation are critical to the success of any online business. Data integrity is the assurance that data transmitted is consistent and correct, that is, that it has not been tampered with or altered in any way during transmission. Authentication is a means by which both parties in an online transaction can be confident that they are who they say they are, and non-repudiation is the idea that no party can dispute that an actual event online took place. Proof of data integrity is typically the easiest of these factors to accomplish. A data hash or checksum, such as MD5 or CRC, is usually sufficient to establish that the likelihood of data being undetectably changed is extremely low (Schlaeger and Pernul, 2005). Notwithstanding these security measures, it is still possible to compromise data in transit through techniques such as phishing or man-in-the-middle attacks (Desmedt, 2005). These flaws have led to the need for the development of strong verification and security measures such as digital signatures and public key infrastructures (PKI).
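As a minimal illustration of the checksum idea above, MySQL exposes hash functions that can fingerprint a value, so that digests computed before and after transmission can be compared; the message text is hypothetical:

    SELECT MD5('order #1234, total 99.50')        AS md5_digest,
           SHA2('order #1234, total 99.50', 256)  AS sha256_digest;

Note that a plain hash only detects accidental alteration; an attacker who can change the data can recompute the hash, which is why the digital signatures discussed next matter.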
One of the key developments in e-commerce security, and one that has led to the widespread growth of e-commerce, is the introduction of digital signatures as a means of verifying data integrity and authentication. In 1995, Utah became the first jurisdiction in the world to enact an electronic signature law. An electronic signature may be defined as "any letters, characters, or symbols manifested by electronic or similar means and executed or adopted by a party with the intent to authenticate a writing" (Blythe, 2006). In order for a digital signature to attain the same legal status as an ink-on-paper signature, asymmetric key cryptography must have been employed in its production (Blythe, 2006). Such a system employs double keys: one key is used by the sender to encrypt the message, and a different, albeit mathematically related, key is used by the recipient to decrypt it (Antoniou et al., 2008). This is a very good system for electronic transactions, since two stranger parties, perhaps living far apart, can confirm each other's identity and thereby reduce the likelihood of fraud in the transaction. Non-repudiation techniques prevent the sender of a message from subsequently denying that they sent it; digital signatures using public-key cryptography and hash functions are the generally accepted means of providing non-repudiation of communications.

4. Technical Attacks
Technical attacks are among the most challenging types of security compromise an e-commerce provider must face. Perpetrators of technical attacks, and in particular Denial-of-Service attacks, typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social networking sites.

Denial of Service Attacks
Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet today. A major difficulty in thwarting these attacks is tracing the source of the attack, as attackers often use incorrect or spoofed IP source addresses to disguise the true origin (Kim and Kim, 2006). The United States Computer Emergency Readiness Team defines symptoms of denial-of-service attacks to include (McDowell, 2007):
• unusually slow network performance
• unavailability of a particular web site
• inability to access any web site
• a dramatic increase in the number of spam emails received

DoS attacks can be executed in a number of different ways, including:

ICMP Flood (Smurf Attack): Perpetrators send large numbers of IP packets with the source address faked to appear to be the address of the victim. The network's bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.

Teardrop Attack: A teardrop attack involves sending mangled IP fragments with overlapping, over-sized payloads to the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the systems as a result.

Phlashing: Also known as a permanent denial-of-service (PDoS) attack, this is an attack that damages a system so badly that it requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management interfaces of the victim's hardware, be it routers, printers, or other networking hardware. These flaws leave the door open for an attacker to remotely 'update' the device firmware to a modified, corrupt or defective firmware image, thereby bricking the device and making it permanently unusable for its original purpose.

Distributed Denial-of-Service Attacks: Distributed Denial of Service (DDoS) attacks are one of the greatest security fears for IT managers.
4. Technical Attacks
Technical attacks are among the most challenging types of security compromise an e-commerce provider must face. Perpetrators of technical attacks, and in particular denial-of-service attacks, typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, large online retailers and popular social networking sites.
Denial of Service Attacks
Denial of Service (DoS) attacks consist of overwhelming a server, a network or a website in order to paralyze its normal activity (Lejeune, 2002). Defending against DoS attacks is one of the most challenging security problems on the Internet today. A major difficulty in thwarting these attacks is tracing the source, as attackers often use incorrect or spoofed IP source addresses to disguise the true origin of the attack (Kim and Kim, 2006). The United States Computer Emergency Readiness Team defines the symptoms of denial-of-service attacks to include (McDowell, 2007):
• Unusually slow network performance
• Unavailability of a particular web site
• Inability to access any web site
• A dramatic increase in the number of spam emails received
DoS attacks can be executed in a number of different ways, including:
ICMP Flood (Smurf Attack) – the perpetrator sends large numbers of IP packets with the source address faked to appear to be the address of the victim. The network's bandwidth is quickly used up, preventing legitimate packets from getting through to their destination.
Teardrop Attack – a Teardrop attack involves sending mangled IP fragments with overlapping, over-sized payloads to the target machine. A bug in the TCP/IP fragmentation re-assembly code of various operating systems causes the fragments to be improperly handled, crashing the machine as a result.
Phlashing – also known as a permanent denial-of-service (PDoS), this is an attack that damages a system so badly that it requires replacement or reinstallation of hardware. Perpetrators exploit security flaws in the remote management interfaces of the victim's hardware, be it routers, printers or other networking hardware. These flaws let an attacker remotely 'update' the device firmware to a modified, corrupt or defective image, thereby bricking the device and making it permanently unusable for its original purpose.
Distributed Denial-of-Service Attacks – Distributed Denial of Service (DDoS) attacks are one of the greatest security fears for IT managers. In a matter of minutes, thousands of vulnerable computers can flood the victim website, choking out legitimate traffic (Tariq et al., 2006). A DDoS attack occurs when multiple compromised systems flood the bandwidth or resources of a targeted system, usually one or more web servers. The most famous DDoS attacks occurred in February 2000, when websites including Yahoo, Buy.com, eBay, Amazon and CNN were attacked and left unreachable for several hours each (Todd, 2000).
Brute Force Attacks – a brute force attack is a method of defeating a cryptographic scheme by trying a large number of possibilities, for example a large number of the possible keys in a key space, in order to decrypt a message. Brute force attacks, although perceived to be low-tech in nature, are not a thing of the past. In May 2007 the internet infrastructure in Estonia was crippled by multiple sustained brute force attacks against government and commercial institutions in the country (Sausner, 2008). The attacks followed the relocation of a Soviet World War II memorial in Tallinn in late April, which made news around the world.
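As a toy illustration of the brute force idea, the PHP sketch below recovers a 4-digit PIN from an intercepted hash simply by trying all 10,000 possibilities. Real key spaces are astronomically larger, which is exactly why key length matters:

<?php
// Toy brute force attack: exhaustively try every 4-digit PIN until one
// hashes to the intercepted value. The target PIN here is hypothetical.
$targetHash = hash('sha256', '4821');   // pretend this hash was intercepted

for ($pin = 0; $pin <= 9999; $pin++) {
    $guess = str_pad((string)$pin, 4, '0', STR_PAD_LEFT);
    if (hash_equals($targetHash, hash('sha256', $guess))) {
        echo "PIN found: $guess\n";
        break;
    }
}

A modern machine exhausts this space in well under a second; the defence is to enlarge the search space (longer keys, longer passwords) and to rate-limit or slow down each guess.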
5. Non-Technical Attacks
Phishing Attacks
Phishing is the criminally fraudulent process of attempting to acquire sensitive information such as usernames, passwords and credit card details by masquerading as a trustworthy entity in an electronic communication. Phishing scams are generally carried out by emailing the victim a 'fraudulent' email purporting to come from a legitimate organization and requesting sensitive information. When the victim follows the link embedded in the email, they are brought to an elaborate and sophisticated duplicate of the legitimate organization's website. Phishing attacks generally target bank customers, online auction sites (such as eBay), online retailers (such as Amazon) and service providers (such as PayPal). According to Community Banker (Swann, 2008), cybercriminals have more recently become more sophisticated in the timing of their attacks, for example posing as charities in times of natural disaster.
Social Engineering
Social engineering is the art of manipulating people into performing actions or divulging confidential information. Social engineering techniques include pretexting (where the fraudster creates an invented scenario to get the victim to divulge information), interactive voice response (IVR) or phone phishing (where the fraudster gets the victim to divulge sensitive information over the phone) and baiting with Trojan horses (where the fraudster 'baits' the victim into loading malware onto a system). Social engineering has become a serious threat to e-commerce security because it is difficult to detect and to combat: it involves 'human' factors which cannot be patched the way hardware or software can, although staff training and education can go some way towards thwarting the attack (Hasle et al., 2005).
6. Conclusions
In conclusion, the e-commerce industry faces a challenging future in terms of the security risks it must avert. With increasing technical knowledge, and its widespread availability on the internet, criminals are becoming more and more sophisticated in the deceptions and attacks they can perform. Novel attack strategies and vulnerabilities only really become known once a perpetrator has uncovered and exploited them. That said, there are multiple security strategies which any e-commerce provider can adopt to reduce the risk of attack and compromise significantly. Awareness of the risks and the implementation of multi-layered security protocols, detailed and open privacy policies, and strong authentication and encryption measures will go a long way towards reassuring the consumer and ensuring that the risk of compromise is kept minimal.
What is MySQL?
• MySQL is a database system used on the web
• MySQL is a database system that runs on a server
• MySQL is ideal for both small and large applications
• MySQL is very fast, reliable, and easy to use
• MySQL supports standard SQL
• MySQL compiles on a number of platforms
• MySQL is free to download and use
• MySQL is developed, distributed, and supported by Oracle Corporation
The data in MySQL is stored in tables. A table is a collection of related data, and it consists of columns and rows. Databases are useful when storing information categorically. A company may have a database with the following tables (a short sketch of creating one of them follows the list):
• Employees
• Products
• Customers
• Orders
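As a brief sketch of rows and columns in practice, the following PHP snippet creates a hypothetical Customers table like the one listed above and inserts a single row. The connection details and column layout are placeholder assumptions, not a prescribed schema:

<?php
// Create one of the example tables and add a row. Connection parameters
// ('localhost', 'user', 'password', 'company') are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'company');
if ($db->connect_error) {
    die('Connect failed: ' . $db->connect_error);
}

// Columns define the kinds of data stored; each row is one customer.
$db->query('CREATE TABLE IF NOT EXISTS Customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100)
)');

// A prepared statement keeps user-supplied values out of the SQL text.
$stmt = $db->prepare('INSERT INTO Customers (name, email) VALUES (?, ?)');
$stmt->bind_param('ss', $name, $email);
$name  = 'Jane Doe';
$email = 'jane@example.com';
$stmt->execute();
$db->close();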
What is PHP?
• PHP is an acronym for "PHP: Hypertext Preprocessor"
• PHP is a widely-used, open source scripting language
• PHP scripts are executed on the server
• PHP costs nothing; it is free to download and use
What is a PHP File?
• PHP files can contain text, HTML, CSS, JavaScript, and PHP code
• PHP code is executed on the server, and the result is returned to the browser as plain HTML
• PHP files have the extension ".php"
What Can PHP Do?
• PHP can generate dynamic page content
• PHP can create, open, read, write, delete, and close files on the server
• PHP can collect form data
• PHP can send and receive cookies
• PHP can add, delete, and modify data in your database
• PHP can restrict user access to some pages on your website
• PHP can encrypt data
With PHP you are not limited to outputting HTML. You can output images, PDF files, and even Flash movies. You can also output any text, such as XHTML and XML.
Connecting to and Disconnecting from the Server
To connect to the server, you will usually need to provide a MySQL user name when you invoke mysql and, most likely, a password. If the server runs on a machine other than the one where you log in, you will also need to specify a host name. Contact your administrator to find out what connection parameters you should use (that is, what host, user name, and password to use). Once you know the proper parameters, you should be able to connect like this:
shell> mysql -h host -u user -p
Enter password: ********
Here host and user represent the host name where your MySQL server is running and the user name of your MySQL account. Substitute appropriate values for your setup. The ******** represents your password; enter it when mysql displays the Enter password: prompt. If that works, you should see some introductory information followed by a mysql> prompt:
shell> mysql -h host -u user -p
Enter password: ********
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 25338 to server version: 5.0.96-standard
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>
The mysql> prompt tells you that mysql is ready for you to enter commands. If you are logging in on the same machine that MySQL is running on, you can omit the host and simply use the following:
shell> mysql -u user -p
If, when you attempt to log in, you get an error message such as ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2), it means that the MySQL server daemon (Unix) or service (Windows) is not running. Consult the administrator appropriate to your operating system.
Some MySQL installations permit users to connect as the anonymous (unnamed) user to the server running on the local host. If this is the case on your machine, you should be able to connect to that server by invoking mysql without any options:
shell> mysql
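The same connection parameters apply when connecting from PHP rather than from the shell. A minimal sketch using the mysqli extension, with host, user and password as placeholders for your own values:

<?php
// Connect to MySQL from PHP with the same host/user/password parameters
// used at the command line. All three values are placeholders.
$db = new mysqli('host', 'user', 'password');
if ($db->connect_error) {
    // Comparable in spirit to the CLI "Can't connect" error described above.
    die('Connect failed: ' . $db->connect_error);
}
echo 'Connected, server version: ' . $db->server_info . "\n";
$db->close();   // the PHP equivalent of QUIT at the mysql> prompt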
After you have connected successfully, you can disconnect at any time by typing QUIT (or \q) at the mysql> prompt:
mysql> QUIT
Bye
On Unix, you can also disconnect by pressing Control+D. Most examples in the following sections assume that you are connected to the server; they indicate this by the mysql> prompt.
Data type
In computer science and computer programming, a data type or simply type is a classification identifying one of various types of data, such as real, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored. In MySQL there are three main types: text, number, and date/time types. Refer to the MySQL book (Aplus).
The Java programming language, by contrast, is statically typed, which means that all variables must be declared before they can be used. All programs involve storing and manipulating data. Fortunately, the computer only knows about a few types of data: numbers, true/false values, characters (a, b, c, 1, 2, 3, etc.), lists of data, and complex "structures" of data, which build up new data types by combining the other data types.
Creating and using a database, and getting information about databases and tables – refer to the MySQL book.
Batch mode
To run your SQL batch file from the command line in Windows, enter the following:
mysql < c:\commands.sql
Don't forget to enclose the file path in quotes if there are any spaces.
Running the Batch Job as a Scheduled Task in Windows
Batch jobs can be automated even further by running them as a scheduled task. In Windows, batch files are used to execute DOS commands. We can schedule our batch job by placing the command entered earlier in a file such as "runsql.bat". This file will contain only one line:
mysql < c:\commands.sql
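For illustration, a hypothetical commands.sql might contain nothing more than a few ordinary SQL statements, each terminated by a semicolon, which mysql executes in order. The database and table names below simply echo the hypothetical company tables listed earlier:

USE company;
SELECT COUNT(*) FROM Orders;
UPDATE Products SET price = price * 1.10;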
To schedule the batch job:
1. Open Scheduled Tasks: click Start, click All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks.
2. Double-click Add Scheduled Task to start the Scheduled Task Wizard, and then click Next in the first dialog box.
3. The next dialog box displays a list of programs that are installed on your computer, either as part of the Windows operating system or as a result of software installation. Click Browse, select your batch file, and then click Open.
4. Type a name for the task, and then choose when and how often you want the task to run, from one of the following options:
• Daily
• Weekly
• Monthly
• One time only
• When my computer starts (before a user logs on)
• When I log on (only after the current user logs on)
5. Click Next, specify the information about the day and time to run the task, and then click Next.
6. Optional: enter the name and password of the user who is associated with this task. Make sure that you choose a user with sufficient permissions to run the program. By default, the wizard selects the name of the user who is currently logged on.
If at a later time you would like to suspend this task, you can open it via the Scheduled Tasks dialog and deselect the Enabled checkbox on the "Task" tab.
Similarly, you can remove the task by deleting it like any file; in fact, the task is saved as a .job file in the WINNT\Tasks folder.
MySQL in the Cloud
A cloud database is a database accessible to clients from the cloud and delivered to users on demand via the Internet from a cloud database provider's servers. Also referred to as Database-as-a-Service (DBaaS), cloud databases can use cloud computing to achieve optimized scaling, high availability, multi-tenancy and effective resource allocation. While a cloud database can be a traditional database such as a MySQL or SQL Server database that has been adapted for cloud use, a native cloud database such as Xeround's MySQL cloud database tends to be better equipped to make optimal use of cloud resources and to guarantee scalability as well as availability and stability.
Cloud databases can offer significant advantages over their traditional counterparts, including increased accessibility, automatic failover and fast automated recovery from failures, automated on-the-go scaling, minimal investment in and maintenance of in-house hardware, and potentially better performance. At the same time, cloud databases have their share of potential drawbacks, including security and privacy issues as well as the potential loss of, or inability to access, critical data in the event of a disaster or the bankruptcy of the cloud database service provider.