Files and data storage

Files and Data Storage
● Most computers are used for data
processing, a major growth area in the
“information age”.
● Data processing from a computer science
perspective:
– Storage of data
– Organization of data
– Access to data
– Processing of data

Data Structures vs File Structures
• Both involve:
– Representation of data
– Operations for accessing data
• Difference:
– Data structures: deal with data in
main memory
– File structures: deal with data in
secondary storage

File Structure in Computer Science

Goal of the File Structures
● Minimize the number of trips to
secondary storage (SS) needed to get the
desired information.
● Group related information so that we are
likely to get everything we need with fewer
trips to the SS.
● Select the right file structure so that
performance improves.

File and File Operations
● A file is a collection of data stored on mass
storage such as a hard disk or CD.
● File data consists of records (e.g. student
information), and each record contains a
number of fields (ID, name, etc.).
● We can perform the following operations on a
file:
– Search for a particular data item in a file.
– Add a data item.
– Remove or update a data item.

File and File Operations ...
– Order the data items according to a
certain criterion; merge files.
– Create new files from existing files.
– Finally, the create, open, and close
operations, which involve the
operating system.
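
As a rough illustration of records, fields, and the add and search operations listed above, the sketch below stores one student record per line of a text file. The Student struct, the function names addRecord and findRecord, and the one-record-per-line layout are assumptions made for this example; they do not come from the original slides.

```cpp
#include <fstream>
#include <string>

// Hypothetical record type: one student, with three fields.
struct Student {
    int id;
    std::string name;
    float percentage;
};

// Add: append one record to the end of the file (one record per line).
bool addRecord(const std::string& path, const Student& s) {
    std::ofstream out(path, std::ios::app);
    if (!out) return false;
    out << s.id << ' ' << s.name << ' ' << s.percentage << '\n';
    return static_cast<bool>(out);
}

// Search: scan the file sequentially for a record with the given ID.
bool findRecord(const std::string& path, int id, Student& result) {
    std::ifstream in(path);
    Student s;
    while (in >> s.id >> s.name >> s.percentage) {
        if (s.id == id) {
            result = s;
            return true;
        }
    }
    return false;
}
```

Because findRecord reads records one after another, its cost grows with the size of the file. The later organizations (indexed, hashed) exist to avoid that full scan.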

Organization of Files
● Sequential
● Indexed
● Hashing

Sequential File Organization
● Records are conceptually organized in a
sequential list and can only be accessed
sequentially.
● The actual storage might or might not be
sequential (on tape or on disk)

Sequential File (Write/Read in C++)
● Create an ofstream object (after including
<fstream> at the top).
● Open the file for output, or for appending at the
end of the file.
● Test whether the open operation in step
2 succeeded. If it did not, exit;
otherwise continue.
● Write data to the output file (or read data from an input file).
● Close the file after writing / reading data.

Sequential File Implementation
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
using namespace std;

const int N = 5;  // number of records

int main() {
    int roll[N] = {171, 717, 834, 394, 475};
    float percentage[N] = {45.3f, 84.5f, 95.0f, 48.2f, 39.2f};
    const char* name[N] = {"Wajid", "Aashir", "Luqman", "Tushar", "Waseem"};

    // Step 1: Create ofstream and ifstream objects
    ofstream outFile;
    ifstream inFile;

    // Step 2: Open file for output
    outFile.open("percent.dat", ios::out);

    // Step 3: Test whether the open operation was successful
    if (!outFile) {
        cout << "File could not be opened" << endl;
        exit(1);
    }
    cout << "\nFile opened successfully\n";

    // Step 4: Write to file, one record per line
    for (int i = 0; i < N; i++)
        outFile << name[i] << ' ' << roll[i] << ' ' << percentage[i] << endl;
    cout << "\nFile written successfully.\n";

    // Step 5: Close file
    outFile.close();

    // Step 6: Open the same file for input
    inFile.open("percent.dat", ios::in);

    // Step 7: Test whether the file opened successfully
    if (!inFile) {
        cout << "File could not be opened" << endl;
        exit(1);
    }

    // Step 8: Read from the input file, one record at a time
    string n;
    int r;
    float p;
    cout << left << setw(14) << "Roll Number" << setw(16) << "Name"
         << "Percentage" << endl;
    while (inFile >> n >> r >> p)
        cout << left << setw(14) << r << setw(16) << n
             << right << fixed << setprecision(2) << setw(9) << p << '%' << endl;

    // Step 9: Close file
    inFile.close();
    return 0;
}

● OUTPUT:
Roll Number   Name            Percentage
171           Wajid               45.30%
717           Aashir              84.50%
834           Luqman              95.00%
394           Tushar              48.20%
475           Waseem              39.20%

Indexed File Organization
● Sequential search is even slower on
disk/tape than in main memory. Try to
improve performance using more
sophisticated data structures.
● An index for a file is a list of key field values
occurring in the file along with the address
of the corresponding record in the mass
storage.
● Typically the key field is much smaller than
the entire record, so the index can often fit in
main memory.

Indexed File Organization ...
● The index can be organized as a list, a
search tree, a hash table, etc.
● To find a particular record:
– Search the index for the desired key.
– When the search returns the index entry,
extract the record’s address on mass
storage.
– Access the mass storage at the given
address to get the desired record.
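
The lookup steps above can be sketched as follows. This is a minimal illustration, assuming a text data file in which each line is a record of the form "name roll percentage" and the roll number is the key. The Index alias and the functions buildIndex and fetchRecord are hypothetical names chosen for this example.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory index: key field (roll number) -> byte offset of
// the record's line in the data file.
using Index = std::map<int, std::streamoff>;

// Build the index with one sequential pass over the file.
// Assumed layout: each line is "name roll percentage".
Index buildIndex(const std::string& path) {
    Index index;
    std::ifstream in(path, std::ios::binary);
    std::string line;
    while (true) {
        std::streamoff offset = in.tellg();  // address of the next record
        if (!std::getline(in, line)) break;
        std::istringstream fields(line);
        std::string name;
        int roll;
        if (fields >> name >> roll) index[roll] = offset;
    }
    return index;
}

// Look up a key in the index, then make a single seek to fetch the record.
bool fetchRecord(const std::string& path, const Index& index, int roll,
                 std::string& record) {
    auto entry = index.find(roll);  // search the index for the key
    if (entry == index.end()) return false;
    std::ifstream in(path, std::ios::binary);
    in.seekg(std::streampos(entry->second));  // jump to the stored address
    return static_cast<bool>(std::getline(in, record));
}
```

Here the index is a search tree (std::map). Once it is built, fetching a record costs one in-memory search plus one seek, instead of a scan of the whole file.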

Hashed File Organization
● A hashed file uses a hash function to map
the key directly to the record's address.
● This eliminates the need for an extra index file
and all of the overhead associated with it.
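
A minimal sketch of the key-to-address mapping, assuming the file is divided into a fixed number of equal-sized buckets. The function names bucketOf and bucketAddress and the parameters bucketCount and bucketSize are hypothetical, and std::hash stands in for whatever hash function a real system would use.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Map a key to a bucket number in the range 0 .. bucketCount-1.
std::size_t bucketOf(const std::string& key, std::size_t bucketCount) {
    return std::hash<std::string>{}(key) % bucketCount;
}

// Translate the bucket number into a byte address in the file,
// assuming every bucket occupies bucketSize bytes.
std::size_t bucketAddress(const std::string& key, std::size_t bucketCount,
                          std::size_t bucketSize) {
    return bucketOf(key, bucketCount) * bucketSize;
}
```

Two different keys can map to the same bucket. Handling that case is the job of a collision resolution strategy.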

Collision Resolution (Separate Chaining)
● Use an array of M < N linked lists, where N is
the number of keys; a good choice is M ≈ N/10.
● Hash: map key to integer i between 0 and
M-1.
● Insert: put at the front of the ith chain (if not already
there).
● Search: only the ith chain needs to be searched.

Separate Chaining Implementation
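
A minimal in-memory sketch of separate chaining, following the hash / insert / search description. The class name ChainedTable, the string keys with int values, and the use of std::hash are assumptions made for this example. The table size m must be greater than zero.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hash table using separate chaining: an array of M linked lists.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : chains_(m) {}

    // Insert: put the key at the front of the i-th chain. If the key is
    // already in the chain, update its value instead of adding a duplicate.
    void insert(const std::string& key, int value) {
        auto& chain = chains_[slot(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.emplace_front(key, value);
    }

    // Search: only the i-th chain needs to be scanned.
    // Returns a pointer to the value, or nullptr if the key is absent.
    const int* search(const std::string& key) const {
        for (const auto& entry : chains_[slot(key)])
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

private:
    // Hash: map the key to an integer i between 0 and M-1.
    std::size_t slot(const std::string& key) const {
        return std::hash<std::string>{}(key) % chains_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> chains_;
};
```

With M ≈ N/10 each chain holds about ten entries on average, so a search examines roughly ten keys rather than all N.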

Collision Resolution (Open Addressing)
● Use an array of size M > N, where N is the
number of keys; a good choice is M ≈ 2N.
● Hash: map key to integer i between 0 and
M-1.
● Insert: put in slot i if free; if not, try i+1, i+2,
etc.
● Search: search slot i; if occupied but no
match, try i+1, i+2, etc.

Open Addressing Implementation
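
A minimal in-memory sketch of open addressing with linear probing, following the hash / insert / search description. The class name ProbingTable, the string keys with int values, and the use of std::hash are assumptions made for this example. Deletion is left out, since removing entries under linear probing needs extra care. The table size m must be greater than zero.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hash table using open addressing with linear probing.
class ProbingTable {
public:
    explicit ProbingTable(std::size_t m) : slots_(m) {}

    // Insert: put in slot i if free; otherwise try i+1, i+2, ... (wrapping
    // around). An existing key has its value updated.
    // Returns false if the table is full.
    bool insert(const std::string& key, int value) {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                s.used = true;
                s.key = key;
                s.value = value;
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    // Search: look in slot i; if occupied but no match, try i+1, i+2, ...
    // Reaching an empty slot means the key is not in the table.
    const int* search(const std::string& key) const {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;
            if (s.key == key) return &s.value;
            i = (i + 1) % slots_.size();
        }
        return nullptr;
    }

private:
    struct Slot {
        bool used = false;
        std::string key;
        int value = 0;
    };

    // Hash: map the key to an integer i between 0 and M-1.
    std::size_t home(const std::string& key) const {
        return std::hash<std::string>{}(key) % slots_.size();
    }

    std::vector<Slot> slots_;
};
```

Keeping the table about half full (M ≈ 2N) keeps probe sequences short; as the table fills up, runs of occupied slots grow and both insert and search slow down.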
