Prepared by,
V.Santhi,
Assistant Professor
Department of Computer Applications
Bon Secours College for Women
Thanjavur
Introduction
Data versus Information
File Processing
Data Processing
Data processing:
Data processing is the conversion of data into a usable
and desired form.
This conversion or “processing” is carried out using a
predefined sequence of operations either manually or
automatically.
 Most of the data processing is done by using
computers and thus done automatically.
 The output or “processed” data can be obtained in
different forms like image, graph, table, vector file,
audio, charts or any other desired format depending on
the software or method of data processing used.
Data versus information
The collection of data plays a significant role in
statistical analysis. We commonly use the
term ‘data’ in different contexts.
However, in general, it indicates the facts or statistics
gathered by the researcher, in their original form, for
analysis.
When the data is processed and transformed in such
a way that it becomes useful to the users, it is known
as ‘information’.
File processing
Computer data is processed in two fundamental
ways: file processing and database processing. With
file processing, data is stored and processed in
separate files. There are two types of file processing:
 Sequential file processing
 Direct access file processing
Sequential file processing
 Sequential file processing stores and accesses records
in sequence.
 Such processing can be accomplished either by using
tape storage or disk storage.
 To perform sequential file processing, records are sorted before
they are processed.
 Sequential file processing is used in situations where
data can be processed in batches and where a
substantial portion of the master file is changed with
the processing of each batch.
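To make the batch idea concrete, the following is a minimal Python sketch (not part of the original slides; the file names and the "id,amount" record layout are illustrative assumptions). It applies a sorted batch of transactions to a sorted master file in one sequential pass:

# Minimal sketch: sequential processing of a sorted batch against a sorted master file.
def load_records(path):
    # Read "id,amount" lines and return them as (int, float) tuples.
    with open(path) as f:
        return [(int(i), float(a))
                for i, a in (line.split(",") for line in f if line.strip())]

def merge_batch(master, batch):
    # Walk both sorted sequences once; transactions whose id is missing
    # from the master are simply ignored in this sketch.
    updated, b = [], 0
    for rec_id, balance in master:
        while b < len(batch) and batch[b][0] == rec_id:
            balance += batch[b][1]        # apply every transaction for this record
            b += 1
        updated.append((rec_id, balance))
    return updated

master = sorted(load_records("master.txt"))   # records are sorted before processing
batch = sorted(load_records("batch.txt"))
for rec_id, balance in merge_batch(master, batch):
    print(rec_id, balance)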
Direct access file
 When it is impractical to process data sequentially, direct
access file processing is required.
 It is also called random access file processing.
 There are many ways of organizing a file for direct
access .
 First, the file must be stored on a direct access device
like a disk, so that the records need not be processed in
sequence.
 Second, some means must be developed for
determining the location of a particular record.
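A minimal Python sketch of direct (random) access follows (illustrative only; the fixed record size, the file name and the use of the record number as the locating mechanism are assumptions):

# Minimal sketch: direct access using fixed-length records on a disk file.
RECORD_SIZE = 32                      # every record occupies exactly 32 bytes

def write_record(f, record_no, text):
    f.seek(record_no * RECORD_SIZE)   # jump straight to the record's offset
    f.write(text.encode().ljust(RECORD_SIZE, b" "))

def read_record(f, record_no):
    f.seek(record_no * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode().rstrip()

with open("students.dat", "w+b") as f:    # a disk file opened for random access
    write_record(f, 5, "1005,Robert Phil")
    print(read_record(f, 5))              # fetched without scanning records 0-4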
Problems of file processing
Program-Data Dependence.
Duplication of Data.
Limited data sharing.
Lengthy Development Times.
Excessive Program Maintenance.
 3. Limited data sharing. Each application has its
own private files, with little opportunity to share
data outside its own application. A requested
report may require data from several incompatible
files in separate systems.
 4. Lengthy Development Times. There is little
opportunity to leverage previous development
efforts. Each new application requires the
developer to start from scratch by designing new
file formats and descriptions.
 5. Excessive Program Maintenance. The preceding
factors create a heavy program maintenance load.
Database processing
 Data processing refers to the process of performing
specific operations on a set of data or a database.
 Data processing primarily is performed on
information systems, a broad concept that
encompasses computer systems and related devices.
 At its core, an information system consists of
input, processing, and output.
History of information
Quality of information
Database
Why database
Characteristics of database
Database management system
Types of database management system
History of information
 The history of print and written culture, including relatively
long-established areas such as the histories of libraries and
librarianship, book history, publishing history, and the history of
reading.
 The history of more recent information disciplines and practice,
that is to say, the history of information management,
information systems, and information science.
 The history of contiguous areas, such as the history of the
information society and information infrastructure, necessarily
enveloping communication history (including
telecommunications history) and the history of information
policy.
 The history of information as social history, with emphasis on
the importance of informal information networks.
Quality of information
Quality of information is an important concept. Information
quality is a multi-attribute concept. If the attributes that define
quality of information are of good quality or of high value then the
information is said to have good quality. The attributes of quality
of information are:
 Timeliness - The speed at which the information is received.
Normally, the faster the information is received, the better its quality.
 Reliability - the reliability of information is a key attribute of
quality. Only if the information is reliable is it of any use. The
understanding of reliability comes from past experience, the
standing/reliability of the source, the methodology adopted to
acquire and process the information and the channel of delivery.
 Accuracy - the correctness of the information. Normally, the
higher the accuracy of the information, the better its quality.
Database
 A database is a collection of inter-related data which
helps in efficient retrieval, insertion and deletion of
data, and organizes the data in the form
of tables, views, schemas, reports, etc.
 For example, a university database organizes the data
about students, faculty, admin staff, etc., which
helps in efficient retrieval, insertion and deletion of
data from it.
Why database
 Redundancy of data: Data is said to be redundant if the same data is
copied in many places. If a student wants to change his phone number, it
has to be updated in every section. Similarly, old records must be
deleted from all sections holding that student's data.
 Inconsistency of Data: Data is said to be inconsistent if multiple copies
of the same data do not match with each other. If a phone number is
different in the Accounts Section and the Academics Section, it is
inconsistent. Inconsistency may be because of typing errors or not
updating all copies of the same data.
 Difficult Data Access: A user should know the exact location of a file to
access data, so the process is very cumbersome and tedious. Imagine how
difficult it is to find the hostel allotment number of one student among
10,000 unsorted student records.
 Unauthorized Access: A file system may lead to unauthorized access to
data. If a student gets access to the file containing his marks, he can
change them in an unauthorized way.
 No Concurrent Access: The access of the same data by multiple users at the
same time is known as concurrency. A file system does not allow
concurrency, as data can be accessed by only one user at a time.
 No Backup and Recovery: A file system does not incorporate any backup
and recovery of data if a file is lost or corrupted.
Characteristics of data in database
 Shared: data in a database is shared among different users and
applications.
 Persistence: data in a database exists permanently, in the sense that
the data can live beyond the scope of the process that created it.
 Integrity: data should be correct with respect to the real world entity
that they represent.
 Security: data should be protected from unauthorized access.
 Consistency: whenever more than one data element in a database
represents related real world values the values should be consistent
with respect to the relationship.
 Non redundancy: no two data items in a database should represent
the same real world entity.
 Independence: the three levels in the schema should be independent
of each other so that the changes in the schema at one level should not
affect the other levels.
Database management system
 The software which is used to manage a database is
called a Database Management System (DBMS). For
example, MySQL, Oracle, etc. are popular commercial
DBMSs used in different applications. A DBMS allows
users to perform the following tasks:
 Data Definition: It helps in creation, modification
and removal of definitions that define the
organization of data in database.
 Data Updating: It helps in insertion, modification
and deletion of the actual data in the database.
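Both tasks can be illustrated with a minimal Python sketch using the standard-library sqlite3 module (the table and column names are illustrative assumptions, not taken from the slides):

import sqlite3

con = sqlite3.connect(":memory:")

# Data Definition: create (and later modify or drop) the structure that
# organizes the data.
con.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Data Updating: insert, modify and delete the actual data.
con.execute("INSERT INTO student VALUES (1, 'Robert Phil', '98400-11111')")
con.execute("UPDATE student SET phone = '98400-22222' WHERE roll_no = 1")
con.execute("DELETE FROM student WHERE roll_no = 1")
con.commit()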
Types of Database Management
Systems
There are five structural types of database management
systems:
 Hierarchical databases
 Network databases
 Relational databases
 Object-oriented databases
 Deductive databases
Hierarchical database
 This database model organises data into a tree-like structure, with a single
root, to which all the other data is linked. The hierarchy starts from
the Root data, and expands like a tree, adding child nodes to the parent
nodes.
 In this model, a child node will only have a single parent node.
 This model efficiently describes many real-world relationships like index
of a book, recipes etc.
 In the hierarchical model, data is organised into a tree-like structure with a
one-to-many relationship between two different types of data; for
example, one department can have many courses, many professors and, of
course, many students.
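A minimal Python sketch of this tree-like, single-parent organisation (the department/course/student data is an illustrative assumption):

# Minimal sketch: a hierarchical record is a tree with a single root.
department = {                       # root node
    "name": "Computer Applications",
    "courses": [                     # each child node has exactly one parent
        {"title": "DBMS", "students": ["Anu", "Robert Phil"]},
        {"title": "Networks", "students": ["Kala"]},
    ],
}

# Traversal always starts at the root and walks parent -> child.
for course in department["courses"]:
    for student in course["students"]:
        print(department["name"], "->", course["title"], "->", student)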
Network databases
 This is an extension of the hierarchical model. In this
model, data is organised more like a graph, and records are
allowed to have more than one parent node.
 In this database model the data is more inter-related, as more
relationships can be established between records.
As the data is more related, accessing the
data is also easier and faster. This database model was
used to map many-to-many data relationships.
Relational model
 In this model, data is organised in two-dimensional
tables and the relationship is
maintained by storing a common field.
 This model was introduced by E. F. Codd in 1970, and
since then it has been the most widely used database
model; in fact, we can say it is the dominant database
model used around the world.
 The basic structure of data in the relational model is
tables. All the information related to a particular
type is stored in rows of that table.
 Hence, tables are also known as relations in
relational model.
Object-oriented databases
Object-oriented databases use small, reusable chunks of
software called objects. The objects themselves are stored in
the object-oriented database.
Each object consists of two elements:
 1) a piece of data (e.g., sound, video, text, or graphics), and
 2) the instructions, or software programs called methods,
for what to do with the data.
Part two of this definition requires a little more explanation.
The instructions contained within the object are used to do
something with the data in the object. For example, test
scores would be stored within the object, as would the instructions
for calculating the average test score.
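The test-score example can be sketched as a Python object (the class and field names are illustrative assumptions) that bundles the data with the method that acts on it:

class StudentScores:
    def __init__(self, name, scores):
        self.name = name              # element 1: the data
        self.scores = scores

    def average(self):                # element 2: a method acting on that data
        return sum(self.scores) / len(self.scores)

obj = StudentScores("Robert Phil", [72, 85, 90])
print(obj.name, obj.average())        # an object database would store obj itself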
Deductive database
 A deductive database is a database system that can
make conclusions about its data based on a set of
well-defined rules and facts.
 This type of database was developed to combine logic
programming with relational database management
systems. Usually, the language used to define the rules
and facts is the logic programming language Datalog.
Database design
Data normalization
Keys
Relationships
First normal form
Second normal form
Third normal form
Database design
 Database Design is a collection of processes that facilitate
the designing, development, implementation and
maintenance of enterprise data management systems.
 It helps produce database systems
 That meet the requirements of the users
 Have high performance.
Levels of database service:
 Physical level
 Conceptual level
 External level
Physical Level
Physical level describes the physical storage structure
of data in database.
 It is also known as Internal Level.
 This level is very close to physical storage of data.
 At the lowest level, data is stored in the form of bits with
physical addresses on the secondary storage device.
 At the highest level, it can be viewed in the form of files.
 The internal schema defines the various stored data
types. It uses a physical data model.
Conceptual Level
 Conceptual level describes the structure of the whole
database for a group of users.
 It is also called the data model.
 Conceptual schema is a representation of the entire
content of the database.
 This schema contains all the information needed to build
relevant external records.
 It hides the internal details of physical storage.
External Level
 External level is related to the data which is viewed by
individual end users.
 This level includes a number of user views or external
schemas.
 This level is closest to the user.
 External view describes the segment of the database
that is required for a particular user group and hides
the rest of the database from that user group.
Keys
A KEY is a value used to identify a record in a table uniquely.
A KEY could be a single column or combination of multiple
columns.
Primary key:
 A primary key is a single column value used to identify a database
record uniquely.
It has the following attributes:
 A primary key cannot be NULL
 A primary key value must be unique
 The primary key values cannot be changed
 The primary key must be given a value when a new record is
inserted.
Composite key:
 A composite key is a primary key composed of
multiple columns used to identify a record uniquely
 For example, a database may contain two people with the same
name, Robert Phil, who live in different places; name and place
together are then needed to identify each record uniquely.
Foreign key:
 A foreign key references the primary key of another
table. It helps connect your tables.
 A foreign key can have a different name from its
primary key.
 It ensures rows in one table have corresponding rows
in another.
 Unlike the primary key, foreign keys do not have to be unique;
most often they aren't.
 Foreign keys can be NULL even though primary keys
cannot.
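A minimal sketch of the three kinds of key, written in Python with the standard-library sqlite3 module (the table and column names are illustrative assumptions):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # sqlite3 enforces foreign keys only when asked

# Primary key: a single column, unique and never NULL.
con.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")

# Composite key: two columns together identify one record
# (two people may share a name but live in different places).
con.execute("""CREATE TABLE person (
                   full_name TEXT,
                   city      TEXT,
                   dept_id   INTEGER REFERENCES department(dept_id),  -- foreign key
                   PRIMARY KEY (full_name, city))""")

con.execute("INSERT INTO department VALUES (10, 'Accounts')")
con.execute("INSERT INTO person VALUES ('Robert Phil', 'Thanjavur', 10)")
con.execute("INSERT INTO person VALUES ('Robert Phil', 'Chennai', NULL)")  # a foreign key may be NULL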
Relationship
One-to-One Relationships
A pair of tables bears a one-to-one
relationship when a single
record in the first table is related
to only one record in the second
table, and a single record in the
second table is related to only
one record in the first table
One-to-Many Relationships
A one-to-many relationship
exists between a pair of tables
when a single record in the first
table can be related to one or
more records in the second table,
but a single record in the second
table can be related to only
one record in the first table.
Many-to-Many Relationships
A pair of tables bears a many-to-many relationship
when a single record in the first table can be related
to one or more records in the second table and a
single record in the second table can be related to
one or more records in the first table.
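A minimal Python/sqlite3 sketch of these relationships (table and column names are illustrative assumptions); a many-to-many relationship is usually realised through a third "junction" table:

import sqlite3

con = sqlite3.connect(":memory:")

# One-to-many: one department has many students; each student has one department.
con.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE student (
                   stud_no INTEGER PRIMARY KEY,
                   name    TEXT,
                   dept_id INTEGER REFERENCES department(dept_id))""")

# Many-to-many: one student takes many courses and one course has many students,
# linked through the enrolment junction table.
con.execute("CREATE TABLE course (course_no INTEGER PRIMARY KEY, title TEXT)")
con.execute("""CREATE TABLE enrolment (
                   stud_no   INTEGER REFERENCES student(stud_no),
                   course_no INTEGER REFERENCES course(course_no),
                   PRIMARY KEY (stud_no, course_no))""")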
Normalization
Normalization is the branch of relational theory that provides
design insights. It is the process of determining how much
redundancy exists in a table. The goals of normalization are to:
 Be able to characterize the level of redundancy in a relational
schema
 Provide mechanisms for transforming schemas in order to
remove redundancy
Normalization theory draws heavily on the theory of functional
dependencies. Normalization theory defines six normal forms
(NF). Each normal form involves a set of dependency properties
that a schema must satisfy and each normal form gives guarantees
about the presence and/or absence of update anomalies. This
means that higher normal forms have less redundancy, and as a
result, fewer update problems.
First normal form – elimination of
repeating groups
If a relation contains a composite or multi-valued
attribute, it violates first normal form; conversely, a relation is in
first normal form if it does not contain any composite or
multi-valued attribute. In other words, a relation is in first normal
form if every attribute in that relation is a single-valued
attribute.
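A minimal Python sketch of bringing a relation into first normal form (the sample data is an illustrative assumption): each multi-valued cell is split into single-valued rows.

unnormalised = [
    ("1001", "Robert Phil", "98400-11111, 98400-22222"),   # two phone numbers in one cell
]

first_normal_form = [
    (stud_no, name, phone.strip())
    for stud_no, name, phones in unnormalised
    for phone in phones.split(",")        # one single-valued row per phone number
]
print(first_normal_form)
# [('1001', 'Robert Phil', '98400-11111'), ('1001', 'Robert Phil', '98400-22222')]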
Second normal form – elimination
of redundant data
 To be in second normal form, a relation must be in first normal form
and must not contain any partial dependency. A relation is in
2NF if it has no partial dependency, i.e., no non-prime attribute
(an attribute which is not part of any candidate key) is dependent on
any proper subset of any candidate key of the table.
 Partial Dependency – If a proper subset of a candidate key determines a
non-prime attribute, it is called a partial dependency.
 Example 1 – In the relation STUDENT_COURSE given in Table 3, FD set:
{COURSE_NO->COURSE_NAME}; Candidate Key: {STUD_NO,
COURSE_NO}. In the FD COURSE_NO->COURSE_NAME, COURSE_NO
(a proper subset of the candidate key) determines COURSE_NAME (a
non-prime attribute). Hence, it is a partial dependency and the relation
is not in second normal form.
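A minimal Python/sqlite3 sketch (illustrative only) of removing this partial dependency by splitting STUDENT_COURSE into two relations:

import sqlite3

con = sqlite3.connect(":memory:")

# COURSE_NAME now depends on the whole key of its own table, so the
# partial dependency COURSE_NO -> COURSE_NAME is gone.
con.execute("CREATE TABLE course (course_no INTEGER PRIMARY KEY, course_name TEXT)")
con.execute("""CREATE TABLE student_course (
                   stud_no   INTEGER,
                   course_no INTEGER REFERENCES course(course_no),
                   PRIMARY KEY (stud_no, course_no))""")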
Third normal form – elimination of columns not
dependent on the key
 A relation is in third normal form if it is in second
normal form and there is no transitive dependency
for non-prime attributes.
A relation is in 3NF if at least one of the
following conditions holds for every non-trivial
functional dependency X –> Y:
 X is a super key.
 Y is a prime attribute (each element of Y is part of
some candidate key).
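A minimal Python/sqlite3 sketch of removing a transitive dependency (the stud_no -> zip_code -> city dependency is an illustrative assumption, not taken from the slides):

import sqlite3

con = sqlite3.connect(":memory:")

# city depends on the key only through zip_code (a transitive dependency),
# so it is moved into its own table keyed by zip_code.
con.execute("CREATE TABLE zip (zip_code TEXT PRIMARY KEY, city TEXT)")
con.execute("""CREATE TABLE student (
                   stud_no  INTEGER PRIMARY KEY,
                   name     TEXT,
                   zip_code TEXT REFERENCES zip(zip_code))""")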