SlideShare a Scribd company logo
1 of 80
Disk Storage, Basic File Structures,
and Hashing
Christalin Nelson | SOCS
1 of 80
At a Glance
• Categories of Computer Storage Media
• Memory Hierarchy
• Storage of Databases
• Disk Storage Devices
• Tape Storage Devices
• Buffering of Blocks
• Placing File Records on Disk
• File Operations
• Heap Files
• Sorted Files
• Hashing Techniques
• Parallelizing Disk Access using RAID Technology
7-Apr-24
Christalin Nelson | SOCS
2 of 80
7-Apr-24
Categories of Computer Storage Media
• Primary storage
– Storage media that can be operated directly by the CPU
– Volatile, fast access, less storage capacity, and expensive
– Example: Main memory, Cache memory
• Secondary & Tertiary storage
– Data cannot be processed directly by the CPU (copied into primary storage for
processing)
– Nonvolatile, Removable media, slower access, larger capacity, and less cost
– Example:
• Secondary: Hard disks
• Tertiary: Optical disks (CD-ROMs, DVDs), tapes
Christalin Nelson | SOCS
3 of 80
7-Apr-24
Memory Hierarchy (1/2)
• Primary Storage
– Cache Memory
• Static RAM (SRAM)
• Used by CPU to speed up execution of program instructions using techniques such as
prefetching and pipelining
• Most expensive
– Main memory
• Dynamic RAM (DRAM)
• Provides the main work area for the CPU for keeping program instructions and data
• Low cost, volatile, lower speed (vs. static RAM)
Christalin Nelson | SOCS
4 of 80
7-Apr-24
Memory Hierarchy (2/2)
• Secondary & Tertiary Storage
– Flash Memory (EEPROM)
• Used with cameras, MP3 players, cell phones, PDAs
– Magnetics Disks
– Mass Storage
• WORM (archival, more life than magnetic Disks)
• CDROM, DVD, Optical jukebox memories (loaded on demand)
– Magnetic Tapes (archival), Tape Jukebox (loaded on demand)
• Note
– Main Memory Databased for RT applications
Christalin Nelson | SOCS
5 of 80
7-Apr-24
Storage of Databases (1/3)
• Persistent data (vs. Transient data)
• Stored on Magnetic Disks
– DB is too large to fit entirely in Main memory
– Main memory is a volatile storage
– The cost of storage per unit of data is greater for primary storage
• Backed up on Tapes
– Cost is less than storage on disk & slow data access
– Offline vs. Online storage
• Tapes are offline devices – should be loaded for data to become available
• Disks are online devices – can be accessed directly at any time
Christalin Nelson | SOCS
6 of 80
7-Apr-24
Storage of Databases (2/3)
• DBMS has several options available for data organization
– Physical DB design involves the selection of a particular data organization technique
that best suits the given application requirements
– DB designers & DBA must know the pros & cons of each storage technique when they
design, implement, and operate a DB on a specific DBMS
Christalin Nelson | SOCS
7 of 80
7-Apr-24
Storage of Databases (3/3)
• File Organization
– Primary file organization
• Determine how the file records are physically stored and how they can be accessed
• Examples
– Heap file (or unordered file) - Appends new records at EOF
– Sorted file (or sequential file) - Records are ordered by the value of a particular field (called the
sort key)
– Hashed file - Uses a hash function applied to a particular field (called the hash key) to determine
a record’s placement on disk
– Other primary file organizations: B-trees
– Secondary organization (or) auxiliary access structure
• Allows efficient access to file records based on alternate fields (exist as indexes) than those
that have been used for the primary file organization
Christalin Nelson | SOCS
8 of 80
7-Apr-24
Disk Storage Devices (1/11)
• Preferred secondary storage device for high storage capacity and low-cost
• Disk is made of magnetic material shaped as a thin circular disk & protected by a
plastic or acrylic cover
– Data stored as magnetized areas on magnetic disk surfaces
• Basic unit: Bit (0 or 1)
– Disk capacity (No. of bytes stored on the Disk)
• It can be single or double-sided
• Disk packs
– Contains several magnetic disks connected to a rotating spindle
– Used with Servers
Christalin Nelson | SOCS
9 of 80
7-Apr-24
Disk Storage Devices (2/11)
• Track
– Concentric circles on a disk surface of small width, each having a distinct diameter
• No. of Tracks: few 100s to 1000s
• Capacity: 10s of kB to 150 kB
– Cylinder
• Tracks with the same diameter on various surfaces
• Data stored in one cylinder can be retrieved much faster than if it were distributed among
different cylinders
– Tracks are divided into smaller sectors
• Note: Not all disks have their tracks divided into sectors
Christalin Nelson | SOCS
10 of 80
7-Apr-24
Disk Storage Devices (3/11)
• Sectors
– Physical units of storage on the disk
– Sectors are hard-coded on the disk surface
• Fixed-size portion that can be R/W by the disk's R/W head
• Sector size: 512 B to 4 KB (Vary depending on the disk's formatting and specifications)
• ZBR (Zone Bit Recording)
– Allows a range of cylinders to have the same number of sectors per arc
– Example: Cylinders 0–99 may have 1 sector per track, 100–199 may have 2 sectors per track, etc.
– A disk with hard-coded sectors often has the sectors subdivided into blocks during
initialization
• Transfer of data between main memory and disk takes place in units of disk blocks
Christalin Nelson | SOCS
11 of 80
7-Apr-24
Disk Storage Devices (4/11)
• Blocks
– Logical units of storage managed by the file system
– The division of a track into equal-sized disk blocks (or pages) is set by the OS during
disk formatting (or initialization) and cannot be changed dynamically
• Blocks are typically larger than sectors and are used for organizing and managing files
• Disk block size: 512 to 8192 B
– Blocks are separated by fixed-size inter-block gaps
• Include specially coded control information written during disk initialization
– This information is used to determine which block on the track follows each inter-block gap
Christalin Nelson | SOCS
12 of 80
(a) Sectors subtending a fixed angle
(b) Sectors maintaining a uniform recording density
(a) A single-sided disk with read/write hardware
(b) A disk pack with read/write hardware
Rotational
speeds in rpm
7-Apr-24
Disk Storage Devices (7/11)
• Hardware address of a block
– Cylinder number + Track number + Block number
• Track Number: Surface no. within the cylinder on which the track is located
• Block number (within the track): Supplied to disk I/O hardware
– In Modern Disk drives
• Logical Block Address (LBA)
– A single number between 0 and n (If total disk capacity is n + 1 blocks)
– Disk drive controller automatically maps LBA to the right block
• Buffer address
– Contiguous reserved area in main storage (holds 1 disk block)
» Read command (Copy Disk block into Buffer), Write command (Copy buffer into Disk block)
– Cluster (several contiguous blocks) can also be transferred as a unit
» Buffer size = Number of bytes in Cluster
Christalin Nelson | SOCS
A disk is a Random
access addressable
device
15 of 80
7-Apr-24
Disk Storage Devices (8/11)
• Disk Drive Hardware mechanism (1/4)
– Disk or disk pack is mounted on disk drive
– Electrical Motor-1: Rotates disks continuously at a constant speed (5400 to 15000 rpm)
– Read/write head
• Movable-Head disks
– Mechanical Arms connect R/W heads to an actuator attached to Electrical motor-2
» Motor moves R/W heads in unison & positions precisely over a cylinder & block specified in
a block address  Activate electronic component attached to arm to transfer data
– Used in common
• Fixed-Head disks (No. of R/W Heads = No. of tracks)
– A track or cylinder is selected by electronically switching to the appropriate R/W head
– Faster, high cost (due to additional R/W heads)
• Disk packs with multiple surfaces are controlled by several R/W heads (one per surface)
Christalin Nelson | SOCS
16 of 80
7-Apr-24
Disk Storage Devices (9/11)
• Disk Drive Hardware mechanism (2/4)
– Disk controller
• Embedded in the Disk drive
• Controls Disk drive & interfaces it to Computer system
– Received high-level I/O commands  Takes appropriate action to position the arm & causes
R/W action to occur
– SCSI: Standard interface used for disk drives on PCs and workstations
– Given a block address, the Total time needed to locate & transfer the block is in order
of milliseconds (9 to 60 msec)
• Total time = Block Locating Time + Block transfer time
• Where, Block locating Time = Seek time + Rotational delay
– Block locating time > Block transfer time
– Block Transfer time (in msec) > Data processing Time (Done in main memory by current CPUs)
Christalin Nelson | SOCS
17 of 80
7-Apr-24
Disk Storage Devices (10/11)
• Disk Drive Hardware mechanism (3/4)
– Seek time
• Time taken to mechanically position the R/W head on the correct track
• 5 to 10 msec on desktops, 3 to 8 msecs on servers
– Rotational delay or latency
• Time taken to position the beginning of desired block under R/W head
• Depends on the rpm of the disk
– Example
» At 15,000 rpm: Time per rotation = 4 msec, Avg. rotational delay = Time per half revolution,
or 2 msec
» At 10,000 rpm: Avg. rotational delay = 3 msec
– Block transfer time
Christalin Nelson | SOCS
18 of 80
7-Apr-24
Disk Storage Devices (11/11)
• Disk Drive Hardware mechanism (4/4)
– Reducing Block transfer time
• Place “related information” on contiguous blocks. Transfer several blocks on the same track
or cylinder when numerous contiguous blocks are transferred
– Hence, locating first block takes 9 to 60 msec & transferring subsequent blocks takes only 0.4 to 2
msec each
– Disk crash
• Happens if the disk R/W head touches the disk surface because of mechanical malfunction
Christalin Nelson | SOCS
19 of 80
7-Apr-24
Magnetic Tape Storage Devices (1/2)
• Sequential access devices
• Data is stored on reels of high-capacity magnetic tape (similar to audiotapes or
videotapes)
– Bytes are stored consecutively on the tape
• R/W heads are used to R/W data
• Many data records are grouped in one block for better space utilization
– Tape Blocks > Disk blocks
– Inter-block gap of tape > Inter-block gap of Disk
• With a tape density of 1600 to 6250B per inch, the Inter-block gap of 0.6 inches
corresponds to wastage of 960 to 3750B storage space
Christalin Nelson | SOCS
20 of 80
7-Apr-24
Magnetic Tape Storage Devices (2/2)
• Features
– Used to periodically back up and archival of Database
– Store excessively large database files, images, and system libraries
– Slow access, cannot be used to store online data
• Table Libraries
– Automatic labeling software identifies the backup cartridges
– Robotic arms enable parallel write on multiple cartridges using multiple tape drives
– Digital and Super-Digital Linear Tapes (DLTs and SDLTs) have slots for 100s of cartridges
with capacities in 100s of GBs
• Example: SL8500 model of Sun Storage Technology
– Up to 70 petabytes in Up to 448 drives, Max. Throughput rate - 193.2 TB/hr
Christalin Nelson | SOCS
21 of 80
7-Apr-24
Buffering of Blocks (1/2)
• Multiple buffers in main memory can be reserved for buffering multiple blocks
whose block addresses are known
• Buffering is useful when processes can run in a parallel fashion
Christalin Nelson | SOCS
Availability of Separate
disk I/O processors
(OR) Multiple CPUs
With Single
CPU
22 of 80
7-Apr-24
Buffering of Blocks (2/2)
• Double buffering
– Permits continuous R/W of data on consecutive disk blocks
• Eliminates seek time & rotational delay for all but the first block transfer
• Data is kept ready for processing, thus reducing the waiting time in the programs
Christalin Nelson | SOCS
Reading & Processing can
proceed in Parallel
Time (Process a Disk block in
memory) < Time (Read next
block & fill a buffer)
23 of 80
7-Apr-24
Placing File Records on Disk (1/8)
• Records
– Describe entities and their attributes/fields
• Data Types
– Types of values & memory (Bytes) a field/attribute need
– Usually one of the standard data types used in programming
– Handling large unstructured objects (like BLOB)
• Stored separately from its record in a pool of disk blocks & a pointer to the BLOB is included
in the record
• Record Type or Record format definition
– A collection of field names and their corresponding data types (same as prog. langs.)
Christalin Nelson | SOCS
24 of 80
7-Apr-24
Placing File Records on Disk (2/8)
• File
– Sequence of records
– Mostly all records in a file are of the same record type
• File can have fixed-length records (same size in Bytes) or variable-length records
• Why variable-length records?
– Records with same type can have variable-length fields
– Records with same type can have repeating fields (Fields that take multiple values for
individual records). Group of values for the field is called a repeating group.
– Records with same type can have optional fields
– Records with different types (Mixed files of different sizes). i.e. Placing together the
records of different types on disk blocks
Christalin Nelson | SOCS
25 of 80
7-Apr-24
Placing File Records on Disk (3/8)
• Few Record Storage Formats
– Fixed-length record
– Variable-length record with variable length fields
– Variable-length record with optional fields
Christalin Nelson | SOCS
Can have
<field-type, field-value> pairs
26 of 80
7-Apr-24
Placing File Records on Disk (4/8)
• Record Blocking
– Records of a file must be allocated to disk blocks (unit of data transfer between disk
and memory)
– If Block size (B bytes) > Record size (R bytes)
• Each block will contain numerous records
• Note: Some files may have usually large records that cannot fit in one block
– For a file with 'r' records, each record of size 'R‘, and Block size (B)
• Average no. of records per Block = Blocking factor (bfr) = Floor value of (B/R)
• Unused space (in Bytes) = B - (bfr * R)
• No. of Blocks (b) needed for a File = Ceiling value of (r/bfr)
Christalin Nelson | SOCS
27 of 80
7-Apr-24
Placing File Records on Disk (5/8)
• Spanned Records
– Utilization of unused space if Records in a file can span more than one Block
• i.e. Store part of a Record on one Block & the rest on another
• A pointer at the end of 1st Block points to the Block containing the remainder of the Record
(if it is not the next consecutive block on disk)
• Unspanned records
– Used with fixed-length Records (B > R) that cannot cross Block boundaries
– Simplifies record processing because each Record start at a known location in Block
• Note:
– Either spanned or unspanned organization can be used for variable-length Records
• Prefer spanning organization if average Record size is large (reduce wastage of space)
Christalin Nelson | SOCS
28 of 80
7-Apr-24
Placing File Records on Disk (6/8)
• Spanned and Unspanned Record organization
Christalin Nelson | SOCS
29 of 80
7-Apr-24
Placing File Records on Disk (7/8)
• Allocating file blocks of disk
– Contiguous allocation
• File blocks are allocated to consecutive disk blocks
• Reading whole file is very fast using double buffering, expanding the file is difficult
– Linked allocation
• Each file block contains a pointer to the next file block
• Easy to expand the file, Slow to read whole file
– Combination of Contiguous & Linked allocation
• Allocate clusters of consecutive disk blocks, and the clusters are linked
• Clusters are sometimes called file segments (or) extents
– Indexed allocation
• One or more index blocks contain pointers to the actual file blocks
Christalin Nelson | SOCS
30 of 80
7-Apr-24
Placing File Records on Disk (8/8)
• File Header or File descriptor
– Contains file information to determine disk addresses of the file Blocks as well as to
record format descriptions
• Fixed-length unspanned records: field lengths, order of fields within a record
• Variable-length records: field type codes, separator characters, record type codes
– Needed by system programs that access (or) search desired record(s) within main
memory buffers
Christalin Nelson | SOCS
31 of 80
7-Apr-24
File Operations (1/6)
• Actual operations can vary from system to system
• User programs use these operations to access records through program variables
• Classification-1
– Retrieval operations
– Update operations
• Classification-2
– Record-at-a-time operations
• Each operation applies to a single record
• Example: Reset, Find, FindNext, Read, Delete, Modify, Insert, Scan
– Set-at-a-time higher-level operations
• Each operation applies to a set of records
• Example: FindAll, Find n, FindOrdered, Reorganize
Christalin Nelson | SOCS
32 of 80
7-Apr-24
File Operations (2/6)
• Open
– Prepare file for R/W  Allocate required buffers  Retrieve file header  Set file
pointer to BOF
• Reset
– Set file pointer of an open file to BOF
• Close
– Complete file access by releasing buffers  Perform any other needed cleanup
operations
Christalin Nelson | SOCS
33 of 80
7-Apr-24
File Operations (3/6)
• Find (or Locate)
– Search for 1st record that satisfies a search condition  Transfer block into a main
memory buffer  File pointer points to record in buffer (current record)
• Read (or Get)
– Copy current record in buffer to a program variable in user program  Can advance
current record pointer to next record in file (may read next file block from disk)
• FindNext
– Search for next record that satisfies a search condition  Transfer block into a main
memory buffer  File pointer points to record in buffer (current record)
– Types of FindNext available in legacy DBMSs based on hierarchical & network models
• FindNext record within a current parent record, given type, meets a complex condition
Christalin Nelson | SOCS
34 of 80
7-Apr-24
File Operations (4/6)
• Scan
– Streamlines Find, FindNext, and Read into a single operation
– If file was just opened/reset: Return 1st record, else - return next record
– If a condition is specified with the operation: Return record (1st or next) that satisfies
the condition
• Delete
– Delete current record  Update file on disk
• Modify
– Modify some field values in current record  Update file on disk
• Insert
– Locate Block where the Record is to be inserted  Transfer Block into a main memory
buffer  Write Record into buffer  Write from buffer to disk
Christalin Nelson | SOCS
35 of 80
7-Apr-24
File Operations (5/6)
• FindAll
– Locate all Records in file that satisfy a search condition
• Find (or Locate) n
– Search for 1st record that satisfies a search condition  Locate next (n-1) records
satisfying same condition  Transfer blocks containing n records to main memory
buffer
• FindOrdered
– Retrieve all Records in File in a specified order
• Reorganize
– Start reorganization process
– Example: Reorder file records by sorting them on a specified field
Christalin Nelson | SOCS
36 of 80
7-Apr-24
File Operations (6/6)
• File organization vs. Access method
– File organization
• Organization of data of a file into records, blocks, and access structures
• Includes the way records and blocks are placed on the storage medium and interlinked.
• Performs the frequently applied file operations efficiently
– Access method
• Provides a group of operations that can be applied to a file
• Several access methods can be applied to a file organization
• Some access methods can be applied only to files organized in certain ways
– Example: Indexed access method cannot be applied to a file without an index
Christalin Nelson | SOCS
37 of 80
7-Apr-24
Heap Files (1/4)
• Pile File (or) Sequential File
• Simplest and most basic type of organization
• Records are placed in the file in insertion order
– New records are inserted at EOF
• Used with additional access paths (like secondary indexes)
• Used to collect and store data records for future use
Christalin Nelson | SOCS
38 of 80
7-Apr-24
Heap Files (2/4)
• Insertion
– Very efficient
– Address of last file block is kept in File header  Copy last disk block of file into a
buffer  Add new record  Rewrite block back to disk
• Search
– Perform linear search through the file block by block
• Complexity: For a file of ‘b’ blocks
– Average: Search (b/2) blocks if only one record satisfies search condition
– Worst: Read and search all b blocks in the file
• Modify
– Variable-length records require deleting old record  Insert modified record
– The modified record may not fit in its old space on disk
Christalin Nelson | SOCS
39 of 80
7-Apr-24
Heap Files (3/4)
• Delete
– Find Block  Copy Block into a buffer  Delete Record from buffer  Rewrite back to
disk
– Results in wasted storage space
• Solution
– Store an extra byte or bit (deletion marker) with each record  Value of marker identifies if it is
invalid deleted record or valid undeleted record
– Reclaiming unused space of deleted records
» Periodic reorganization of file – Records are packed by removing deleted records
• Use spanned/unspanned organization for unordered files with fixed/variable -length
records
» Use space of deleted records for inserting new records
• Requires extra bookkeeping to keep track of empty locations
Christalin Nelson | SOCS
40 of 80
7-Apr-24
Heap Files (4/4)
• Read
– In-order read: Create sorted copy of file wrt values of some field
• Sorting is expensive for large disk files. Special techniques for external sorting can be used
– Relative or Direct file
• A file of unordered fixed-length records with unspanned blocks and contiguous allocation,
access any record directly by their relative position in the file
– To find ith record in a file having ‘r’ records, and records in each block are numbered 1, 2, ..., bfr
» (i % bfr)th record in ⎡(i/bfr)⎤
th
block
Christalin Nelson | SOCS
41 of 80
7-Apr-24
Sorted Files (1/5)
• Based on ordering field (or) ordering key
– Leads to an ordered or sequential file
– Ordering key: Ordering field is also a key field of the file
• Advantages
– Read is extremely efficient
– Search conditions with >, <, ≥, and ≤ on ordering field is quite efficient
• Search conditions based on ordering key results in faster access with binary search technique
– For a File with ‘b’ blocks, binary search accesses log2
(b) blocks
– Minimal seek time
» Ordered files are put in blocks and stored on contiguous cylinders
» A binary search for disk files can be done on the blocks rather than on the records
• Searching next record from current record in order of ordering key requires no additional block
accesses (current record is not the last one in the block & next record is in the same block)
Christalin Nelson | SOCS
42 of 80
7-Apr-24
Sorted Files (2/5)
• Binary Search Algorithm on Ordering key of a Disk File
Christalin Nelson | SOCS
43 of 80
7-Apr-24
Sorted Files (3/5)
• Note
– Random access based on a non-ordering field uses linear search
– Ordered access based on a non-ordering field requires creation of another sorted copy
• Insert/delete operations are expensive as records must remain physically ordered
– Solution
• (1) Keep some unused space in each block for new records (Temporary)
• (2) Create a temporary unordered file (Overflow or Transaction file). The actual ordered file
is called the Main or Master file.
– Insert new record at end of Overflow file  During file reorganization, periodically sort & merge
Overflow file with Master file
» The records marked for deletion are removed during the reorganization
– Before reorganization, a Linear search is done on Overflow file if Binary search on Main file is not
successful
Christalin Nelson | SOCS
44 of 80
7-Apr-24
Sorted Files (4/5)
• Modification of the field value of a record depends on
– Field to be modified
• Modification of ordering field
– Requires deletion of old record  Insertion of modified record
– Record can change its position in the file
• Modification of non-ordering field
– Change record (fixed-length)  Rewrite it in same physical location on disk
– Search condition to locate the record
Christalin Nelson | SOCS
45 of 80
7-Apr-24
Sorted Files (5/5)
• Indexed-sequential file & Clustered file
– Ordered files based on ordering key used in DB applications with a primary index
• Improves random access time on ordering key field
– If the ordering attribute is not a key, the file is called a clustered file
Christalin Nelson | SOCS
46 of 80
7-Apr-24
Hashing Techniques (1/2)
• Hash file (or) Direct file
– Primary file organization (based on hashing) that provides very fast access to records
• It needs only a single-block access
• Hash field
– Hash key: Hash field is a key field of the file
– Note: An equality condition on hash field is used as Search condition
• Hash function (or) randomizing function (h)
– Yields a disk block address corresponding to hash field value of a record  Store the
record in this block address
• Hashing is also used as an internal search structure within a program whenever a
group of records is accessed exclusively by using the value of one field
Christalin Nelson | SOCS
47 of 80
7-Apr-24
Hashing Techniques (2/2)
• Techniques
– Internal Hashing
• All data items are stored directly within the hash table's buckets or slots
– External Hashing
• Hash table's buckets or slots contain pointers or references to external storage locations
where items are stored
– Hashing Techniques That Allow Dynamic File Expansion
• Extendible Hashing, Linear Hashing, Dynamic Hashing
Christalin Nelson | SOCS
48 of 80
7-Apr-24
Internal Hashing (1/12)
• For internal files, hashing is typically implemented as a hash table
– Consider an array of records of size M
– A suitable hash function transforms hash field value into array index 0 to M-1 that
corresponds to M slots
• h(Key) = Key % M, which returns record address (array index)
– Key can be Integer or String (needs transformation)
Christalin Nelson | SOCS
49 of 80
7-Apr-24
Internal Hashing (2/12)
• Example-1: Find hash code of the key “AKEY” if size of table is 5, and the Hash
function is h(key) = key % m.
– If each character can be represented by 5 digits, the key is treated as a base-32 (i.e. 25)
number.
– Consider the decimal representation of each alphabet: A = 1, B = 2, ……., Z = 26
– Key transformation: key = (1x323) + (11x322) + (5x321) + (25x320) = 44271
– Find index: h(44271) = 44271 % 5 = 1
– Hence the given key “AKEY” corresponds to the index [1] of the hash table
Christalin Nelson | SOCS
50 of 80
7-Apr-24
Internal Hashing (3/12)
• Few more hash functions
– (1) Non-integer hash field values can be transformed into integers before the mod
function is applied. For character strings, the numeric (ASCII) codes associated with
characters can be used in the transformation (i.e. Multiply the code values)
– (2) Folding: Apply an arithmetic function (like addition) or a logical function (like XOR)
to different portions of the hash field value to calculate the hash address
• Example: If size of hash table (M) is 1000 and Hash key is a 6-digit integer. The key 235469
can be stored at the address: (235+964) mod 1000 = 199.
– (3) Pick some digits of hash field value (3rd, 5th, and 8th digits) to form the hash address
• Example: If size of hash table (M) is 1000 and Hash key is a 10-digit SSN. The SSN
3016789234 gives a hash value of 172.
Christalin Nelson | SOCS
51 of 80
7-Apr-24
Internal Hashing (4/12)
• Properties of a Good Hash Function
– Makes use of all information provided by the key
– Uniformly distribute records across the hash table [reduced length of linked lists]. M is
chosen as prime no.
– Maps similar keys to very different hash values
– Uses very fast operations
• Problem with most hashing functions
– It does not guarantee that distinct values will hash to distinct addresses because hash
field space (No. of possible values a hash field can take) > Address space (No. of
available addresses for records)
Christalin Nelson | SOCS
52 of 80
7-Apr-24
Internal Hashing (5/12)
• Algorithm for h(k) = k mod M
• Load Factor (λ) = N/M, where N – No. of record to be stored, M – Hash table size
• Collision
– Occurs when hash field value of a record that is being inserted hashes to an address
that already contains a different record
• Collision Resolution
– Process of finding another position that does not lead to collision (Hash address is not
occupied)
Christalin Nelson | SOCS
53 of 80
7-Apr-24
Internal Hashing (6/12)
• Collision Resolution Techniques
– (1) Open addressing (e.g. Linear probing, Quadratic Probing)
– (2) Chaining
• Various overflow locations are kept, usually by extending the array with several overflow
positions. Additionally, a pointer field is added to each record location
• Collision is resolved by placing new record in an unused overflow location  Set pointer of
occupied hash address location = Address of that overflow location
• A linked list of overflow records for each hash address is thus maintained
– (3) Multiple hashing
• Example: The program applies second hash function if the first results in a collision. If
another collision results, the program uses open addressing or applies a third hash
function. Then, uses open addressing if necessary.
Christalin Nelson | SOCS
54 of 80
7-Apr-24
Internal Hashing (7/12)
• Example (Linear Probing)
– Construct a Hash table for given keys [A, S, E, A, R, C, H, I, N, G, E, X, A, M, P, L, E] using
Linear Probing. Assume M=19.
– Algorithm for simple hash function, h(k) = k mod M
Christalin Nelson | SOCS
Key A S E A R C H I N G E X A M P L E
Hash Value 1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
S A A C A E E G H I X E L M N P R
55 of 80
7-Apr-24
Internal Hashing (8/12)
• Open Addressing
– Disadvantage of Open Addressing
• Primary clustering
• Linear probing runs slowly for nearly full tables
– Note: For Open Addressing
• Search/Insertion is faster for λ<1
• Good strategy: λ < 0.5 (i.e. M > 2N)
• If λ > 0.5, then re-hashing is preferable (with a bigger table)
– Algorithm
Christalin Nelson | SOCS
56 of 80
7-Apr-24
Internal Hashing (9/12)
• Chaining
– Chaining technique is preferred when N & M cannot be predicted in advance. The
choice of M basically depends on other factors such as available memory.
• Typically M is chosen to be relatively small so as not to use up a large area of contiguous
memory, but large enough to have short lists for more efficient sequential search
• In general, M can vary from 0.1N to N. i.e. λ = 1 to 10.
– This method is useful for highly dynamic situations
Christalin Nelson | SOCS
57 of 80
7-Apr-24
Internal Hashing (10/12)
• Example: Chaining
Christalin Nelson | SOCS
58 of 80
7-Apr-24
Internal Hashing (11/12)
• Example: Construct a Hash table for given keys [A, S, E, A, R, C, H, I, N, G, E, X, A,
M, P, L, E] using a separate chaining technique. Assume M=11. [Note: Each
character is a key]
Christalin Nelson | SOCS
Key A S E A R C H I N G E X A M P L E
Hash Value 1 8 5 1 7 3 8 9 3 7 5 2 1 2 5 1 5
0
1 L A A A
2 M X
3 N C
4
5 E P E E
6
7 G R
8 H S
9 I
10
59 of 80
7-Apr-24
Internal Hashing (12/12)
• Each collision resolution method requires its own algorithms for insertion,
retrieval, and deletion of records
– The algorithms for chaining are the simplest
– Deletion algorithms for open addressing are complex
• Simulation and analysis
– Load factor is 0.7 to 0.9 to reduce number of collisions and avoid wastage of space
– Other hash functions may require M to be a power of 2
Christalin Nelson | SOCS
60 of 80
7-Apr-24
External Hashing (1/6)
• Hashing for disk files
– The target address space is made of buckets (each bucket can hold multiple records)
• A bucket is either one disk block or a cluster of contiguous disk blocks
– Hashing function
• Maps a key into a relative bucket number (not an absolute block address)
• A table maintained in the File header converts Bucket number  Corresponding Disk block
address
Christalin Nelson | SOCS
61 of 80
7-Apr-24
External Hashing (2/6)
• Order-preserving hash function
– Maintain records in order of hash field values
• Example: The leftmost 3 digits of the invoice number field can be used to get bucket
address as a hash address & records are also sorted by invoice number within each bucket
• Collision
– Less severe as many records can hash to the same bucket without causing problems.
– When a bucket is filled & a new record being inserted hashes to that bucket  Use a
variation of chaining
• i.e. Maintain a pointer in each bucket to a linked list of overflow records for the bucket
• Pointers in linked list are record pointers (include both block address & a relative record
position within the block)
Christalin Nelson | SOCS
62 of 80
7-Apr-24
Christalin Nelson | SOCS
Handling overflow for
buckets by chaining
63 of 80
7-Apr-24
External Hashing (4/6)
• Static hashing
– Number of buckets (M) is fixed
– Not suited for dynamic files
• If M – Fixed number of buckets, and ‘n’ – Max. number of records per bucket, No. of
records that can fitted = (n *M)
– If no. of records < (n * M), we are left with a lot of unused space
– If no. of records > (n * M), numerous collisions will occur & retrieval will be slowed down
because of the long lists of overflow records
– Hence, M has to be changed & a new hash function (based on new value of M) should be used to
redistribute the records. These reorganizations can be quite time-consuming for large files.
• Dynamic file organizations based on hashing allow M to vary dynamically with
only localized reorganization
Christalin Nelson | SOCS
64 of 80
7-Apr-24
External Hashing (5/6)
• Searching for a record given a value of some field other than the hash field is
expensive
• Record deletion
– Remove the record from its bucket
• If bucket has an overflow chain: Move one of the overflow records into the bucket to
replace the deleted record
• If the record to be deleted is already in overflow: Remove it from Linked list  Keep track
of empty positions in overflow by maintaining a linked list of unused overflow locations
Christalin Nelson | SOCS
65 of 80
7-Apr-24
External Hashing (6/6)
• Modifying a specific record’s field value depends on
– (1) The search condition to locate that specific record
• If the search condition is an equality comparison on the hash field, record can be located
efficiently by using the hashing function; otherwise, a linear search is done.
– (2) The field to be modified
• Modifying non-hash field: Change the record  Rewrite it in the same bucket
• Modifying hash field: Move the record to another bucket (which requires deletion) 
Insert the modified record
Christalin Nelson | SOCS
66 of 80
7-Apr-24
Hashing Techniques That Allow Dynamic File Expansion
• Used to handle hash table overflow & dynamic resizing of hash table as number of stored
items increases or decreases
• Extendible hashing
– Stores an access structure in addition to the file (somewhat similar to indexing)
– Vs. Indexing
• Access structure is based on the Hash function result on a search field
• In indexing, the access structure is based on the values of the search field itself
• Dynamic hashing
– Uses an access structure based on binary tree data structures.
– Hash function result (hash value of a record) is represented as a binary number (string of bits)
– The access structure is built on the binary representation of the hash function result
– Records are distributed among buckets based on the values of leading bits in their hash values
• Linear hashing
– Does not require additional access structures
Christalin Nelson | SOCS
67 of 80
7-Apr-24
Extendible hashing (1/5)
• Maintain a directory
– i.e. array of 2d bucket addresses (d – global depth of directory)
– Implemented using dynamic array or trie
• Operation
– Hash function maps keys to hash values (represented as binary strings)  Index to
directory (integer value) corresponding to high-order(first) d bits of hash value  Get
Bucket Address  Store the corresponding record in bucket
• Several directory entries (with same first d’ bits for their hash values) may contain same
bucket address if all the records that hash to these locations fit in a single bucket
• Local depth (d’) specifies no. of bits on which the bucket contents are based
• Most record retrievals require two block accesses
– (1) To the directory, and (2) To the bucket
Christalin Nelson | SOCS
68 of 80
7-Apr-24
Christalin Nelson | SOCS
Structure of Extendible hashing scheme
using a directory with global depth d = 3
69 of 80
7-Apr-24
Extendible hashing (3/5)
• Doubling
– Increase d by one at a time, if a bucket is full (overflow could happen) and d’ = d
– Doubles the number of buckets available in the hash table. i.e. Allows additional space
to store more items in directory
• Each bucket in the existing directory is split into two new buckets  Update the directory
to point to new buckets
– Reduces the likelihood of collisions and maintains efficient access times
• Halving
– Decrease d by one at a time, if the hash table becomes sparsely populated and d > d’
– Halves the number of buckets
• Pairs of buckets are merged into single buckets  Update directory to point to respective
buckets
– Ensures that memory is not wasted due to empty or sparsely populated buckets, thus
optimizing memory usage and maintaining efficient access times
Christalin Nelson | SOCS
70 of 80
7-Apr-24
Extendible hashing (4/5)
• Bucket splitting
– When a collision happens, the affected bucket is split into two new buckets, to
increase the capacity of the hash table & accommodate additional records
• i.e. If the affected bucket’s hash value starts with 01  Split bucket into two buckets with
hash values starting with 010 and 011  Update directory to point to respective buckets
– Local depth (d’) of the two new buckets becomes one more than the old bucket
• Continued Bucket splitting can indeed lead to Doubling
– While bucket splitting initially increases the number of buckets (and d’) in the hash
table without changing the directory size (d)
– Continued bucket splitting operations (even after d’ = d) will eventually necessitate
directory expansion (doubling) to accommodate the growing number of buckets
Christalin Nelson | SOCS
71 of 80
7-Apr-24
Extendible hashing (5/5)
• Advantage
– Performance of the file does not degrade as the file grows
• Vs. Static external hashing: As collisions increase, chaining increases avg. accesses per key
– No space is allocated for future growth. Additional buckets can be allocated
dynamically as needed. The space overhead for the directory table is negligible.
• The maximum directory size is 2k, where k – No. of bits in the hash value
– Splitting causes minor reorganization in most cases
– Reorganization due to Doubling/Halving is more expensive
• Disadvantage
– Requires 2 block accesses (vs. one in static hashing)
Christalin Nelson | SOCS
72 of 80
7-Apr-24
Dynamic Hashing (1/3)
• Serves as a foundational concept in the development of extendible hashing and
other dynamic hash table techniques
• Features
– Overflow Handling
• Handled hash collisions and overflow by employing techniques such as overflow buckets or
linked lists within buckets to accommodate additional keys
– Bucket Splitting and Merging
• Buckets could be split or merged to redistribute keys and maintain a balanced load factor
• Typically done at the bucket level and did not involve resizing the directory
– Directory Structure
• Typically static and did not support dynamic resizing
• Consists of a fixed-size array of pointers or references to buckets
Christalin Nelson | SOCS
73 of 80
7-Apr-24
Dynamic Hashing (2/3)
• Bucket Addresses
– Either n high-order bits (or) n-1 high-order bits
– Depends on total number of keys belonging to the respective bucket
• The major difference is in the organization of the directory
– (Extendible hashing) Uses global depth (high-order d bits) for the flat directory &
combines adjacent collapsible buckets into a bucket of local depth d’
– (Dynamic hashing) maintains a tree-structured directory with 2 node types
• Internal Directory nodes that have two pointers
– Left pointer corresponds to bit 0 (in the hashed address)
– Right pointer corresponding to bit 1
• Leaf Directory nodes
– Hold a pointer to the actual bucket with records
Christalin Nelson | SOCS
74 of 80
7-Apr-24
Christalin Nelson | SOCS
Structure of the dynamic
hashing scheme
75 of 80
7-Apr-24
Linear Hashing (1/3)
• Hash file can dynamically expand/shrink its no. of buckets without a directory
• Operation
– File starts with M buckets numbered 0, 1, ..., M-1 & uses initial hash function hi(K) = K
mod M
– A value 'n' is used to find the buckets that have split.
• Initially set n=0. Increment n linearly by 1 whenever a delayed split occurs
– Maintain individual overflow chains for each bucket
• When collision leads to an overflow record in any file bucket  Split Bucket-0 into two
buckets: Original bucket (0) & New bucket (M) at EOF  Distribute records between two
buckets based on a different hashing function hi+1(K) = K mod 2M  Linear Increment n
– Note: Any record that hashed to bucket-0 based on hi will hash to either bucket-0 or bucket-M
based on hi+1
– Retrieve record with hash key value K: First apply hi to K  If hi(K) < n, apply hi+1 on K because
the bucket is already split
Christalin Nelson | SOCS
76 of 80
7-Apr-24
Linear Hashing (2/3)
• Operation (contd.)
– If enough overflows occur due to collision, all Original file buckets 0, 1, ..., M-1 will be
split in linear order and n=M. The file now has 2M buckets that use hi+1
– At this point, reset n=0. Any new collisions that cause overflow lead to use of the new
hashing function hi+2(K) = K mod 4M.
• File load factor (l) = r/(bfr * N)
– Where
• r – Current number of file records
• bfr – Max. no. of records that can fit in a bucket
• N – Current no. of file buckets
Christalin Nelson | SOCS
77 of 80
7-Apr-24
Linear Hashing (3/3)
• Splitting & Merging can be controlled by monitoring load factor (l)
– Splits can be triggered when ‘l’ exceeds a certain threshold (say, 0.9)
– Combinations can be triggered when ‘l’ falls below another threshold (say, 0.7)
• Blocks are combined linearly, and N is decremented appropriately
• Linear hashing maintains load factor fairly constantly while file grows/shrinks
Christalin Nelson | SOCS
78 of 80
7-Apr-24
Parallelizing disk access with RAID
• Refer Slides 88-133 from the Presentation titled “Storage System Architecture”
available in Slideshare
Christalin Nelson | SOCS
79 of 80
Thank You
Christalin Nelson | SOCS
7-Apr-24
7-Apr-24
80 of 80

More Related Content

What's hot

Data Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdfData Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdfChristalin Nelson
 
Datasets and catalogs
Datasets and catalogs Datasets and catalogs
Datasets and catalogs Roma Vyas
 
Raid (Redundant Array of Inexpensive Disks) in Computer Architecture
Raid (Redundant Array of Inexpensive Disks) in Computer ArchitectureRaid (Redundant Array of Inexpensive Disks) in Computer Architecture
Raid (Redundant Array of Inexpensive Disks) in Computer ArchitectureAiman Hafeez
 
Introduction To Database Management System
Introduction To Database Management SystemIntroduction To Database Management System
Introduction To Database Management Systemcpjcollege
 
Concurrency Management
Concurrency ManagementConcurrency Management
Concurrency ManagementSURBHI SAROHA
 
File organization (part 1)
File organization (part 1)File organization (part 1)
File organization (part 1)SURBHI SAROHA
 
Db2 and storage management (mullins)
Db2 and storage management (mullins)Db2 and storage management (mullins)
Db2 and storage management (mullins)Craig Mullins
 
Distributed databases and dbm ss
Distributed databases and dbm ssDistributed databases and dbm ss
Distributed databases and dbm ssMohd Arif
 
My Experience Using Oracle SQL Plan Baselines 11g/12c
My Experience Using Oracle SQL Plan Baselines 11g/12cMy Experience Using Oracle SQL Plan Baselines 11g/12c
My Experience Using Oracle SQL Plan Baselines 11g/12cNelson Calero
 

What's hot (20)

Data Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdfData Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdf
 
Sql commands
Sql commandsSql commands
Sql commands
 
Database overview
Database overviewDatabase overview
Database overview
 
Storage system architecture
Storage system architectureStorage system architecture
Storage system architecture
 
DB2 utilities
DB2 utilitiesDB2 utilities
DB2 utilities
 
Process Management
Process ManagementProcess Management
Process Management
 
Process Synchronization
Process SynchronizationProcess Synchronization
Process Synchronization
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
DB2 on Mainframe
DB2 on MainframeDB2 on Mainframe
DB2 on Mainframe
 
SKILLWISE-DB2 DBA
SKILLWISE-DB2 DBASKILLWISE-DB2 DBA
SKILLWISE-DB2 DBA
 
Datasets and catalogs
Datasets and catalogs Datasets and catalogs
Datasets and catalogs
 
Unit 01 dbms
Unit 01 dbmsUnit 01 dbms
Unit 01 dbms
 
Raid (Redundant Array of Inexpensive Disks) in Computer Architecture
Raid (Redundant Array of Inexpensive Disks) in Computer ArchitectureRaid (Redundant Array of Inexpensive Disks) in Computer Architecture
Raid (Redundant Array of Inexpensive Disks) in Computer Architecture
 
Introduction To Database Management System
Introduction To Database Management SystemIntroduction To Database Management System
Introduction To Database Management System
 
Concurrency Management
Concurrency ManagementConcurrency Management
Concurrency Management
 
File organization (part 1)
File organization (part 1)File organization (part 1)
File organization (part 1)
 
Db2 and storage management (mullins)
Db2 and storage management (mullins)Db2 and storage management (mullins)
Db2 and storage management (mullins)
 
Distributed databases and dbm ss
Distributed databases and dbm ssDistributed databases and dbm ss
Distributed databases and dbm ss
 
Storage
StorageStorage
Storage
 
My Experience Using Oracle SQL Plan Baselines 11g/12c
My Experience Using Oracle SQL Plan Baselines 11g/12cMy Experience Using Oracle SQL Plan Baselines 11g/12c
My Experience Using Oracle SQL Plan Baselines 11g/12c
 

Similar to DiskStorage_BasicFileStructuresandHashing.pdf

DownloadClassSessionFile (44).pdf
DownloadClassSessionFile (44).pdfDownloadClassSessionFile (44).pdf
DownloadClassSessionFile (44).pdfHanaBurhan1
 
Secondary storage devices
Secondary storage devices Secondary storage devices
Secondary storage devices Slideshare
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdfJESUNPK
 
disk sechduling
disk sechdulingdisk sechduling
disk sechdulinggopi7
 
Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02Seshu Chakravarthy
 
Disks and Stable Storage.ppt
Disks and Stable Storage.pptDisks and Stable Storage.ppt
Disks and Stable Storage.pptjguuhxxxfp
 
Mass storage systems presentation operating systems
Mass storage systems presentation operating systemsMass storage systems presentation operating systems
Mass storage systems presentation operating systemsnight1ng4ale
 
Mass Storage Structure
Mass Storage StructureMass Storage Structure
Mass Storage StructureVimalanathan D
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMSkoolkampus
 
db
dbdb
dbAisu
 

Similar to DiskStorage_BasicFileStructuresandHashing.pdf (20)

Os7
Os7Os7
Os7
 
DownloadClassSessionFile (44).pdf
DownloadClassSessionFile (44).pdfDownloadClassSessionFile (44).pdf
DownloadClassSessionFile (44).pdf
 
Unit 4 DBMS.ppt
Unit 4 DBMS.pptUnit 4 DBMS.ppt
Unit 4 DBMS.ppt
 
Secondary storage devices
Secondary storage devices Secondary storage devices
Secondary storage devices
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf
 
disk sechduling
disk sechdulingdisk sechduling
disk sechduling
 
Ch9 mass storage systems
Ch9   mass storage systemsCh9   mass storage systems
Ch9 mass storage systems
 
Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02
 
Disks and Stable Storage.ppt
Disks and Stable Storage.pptDisks and Stable Storage.ppt
Disks and Stable Storage.ppt
 
Secondary storage devices
Secondary storage devicesSecondary storage devices
Secondary storage devices
 
Mass storage systems presentation operating systems
Mass storage systems presentation operating systemsMass storage systems presentation operating systems
Mass storage systems presentation operating systems
 
Class notesfeb27
Class notesfeb27Class notesfeb27
Class notesfeb27
 
OS Unit5.pptx
OS Unit5.pptxOS Unit5.pptx
OS Unit5.pptx
 
Ch11
Ch11Ch11
Ch11
 
Mass Storage Structure
Mass Storage StructureMass Storage Structure
Mass Storage Structure
 
Module5 secondary storage
Module5 secondary storageModule5 secondary storage
Module5 secondary storage
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS
 
Db
DbDb
Db
 
db
dbdb
db
 
Thiru
ThiruThiru
Thiru
 

More from Christalin Nelson

More from Christalin Nelson (15)

Packages and Subpackages in Java
Packages and Subpackages in JavaPackages and Subpackages in Java
Packages and Subpackages in Java
 
Bitwise complement operator
Bitwise complement operatorBitwise complement operator
Bitwise complement operator
 
Advanced Data Structures - Vol.2
Advanced Data Structures - Vol.2Advanced Data Structures - Vol.2
Advanced Data Structures - Vol.2
 
CPU Scheduling
CPU SchedulingCPU Scheduling
CPU Scheduling
 
Applications of Stack
Applications of StackApplications of Stack
Applications of Stack
 
Data Storage and Information Management
Data Storage and Information ManagementData Storage and Information Management
Data Storage and Information Management
 
Application Middleware Overview
Application Middleware OverviewApplication Middleware Overview
Application Middleware Overview
 
Network security
Network securityNetwork security
Network security
 
Directory services
Directory servicesDirectory services
Directory services
 
System overview
System overviewSystem overview
System overview
 
Storage overview
Storage overviewStorage overview
Storage overview
 
Computer Fundamentals-2
Computer Fundamentals-2Computer Fundamentals-2
Computer Fundamentals-2
 
Computer Fundamentals - 1
Computer Fundamentals - 1Computer Fundamentals - 1
Computer Fundamentals - 1
 
Advanced data structures vol. 1
Advanced data structures   vol. 1Advanced data structures   vol. 1
Advanced data structures vol. 1
 
Programming in c++
Programming in c++Programming in c++
Programming in c++
 

Recently uploaded

會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽中 央社
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptxPoojaSen20
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFVivekanand Anglo Vedic Academy
 
Major project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesMajor project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesAmanpreetKaur157993
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...Gary Wood
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...Nguyen Thanh Tu Collection
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17Celine George
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital ManagementMBA Assignment Experts
 
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...Denish Jangid
 
MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...
MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...
MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...Krashi Coaching
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjMohammed Sikander
 
The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxheathfieldcps1
 
How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17Celine George
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxMarlene Maheu
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....Ritu480198
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...Nguyen Thanh Tu Collection
 

Recently uploaded (20)

會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptx
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
Major project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesMajor project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategies
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
“O BEIJO” EM ARTE .
“O BEIJO” EM ARTE                       .“O BEIJO” EM ARTE                       .
“O BEIJO” EM ARTE .
 
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
 
MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...
MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...
MSc Ag Genetics & Plant Breeding: Insights from Previous Year JNKVV Entrance ...
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
 
The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
 
How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptx
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 

DiskStorage_BasicFileStructuresandHashing.pdf

  • 1. Disk Storage, Basic File Structures, and Hashing Christalin Nelson | SOCS 1 of 80
  • 2. At a Glance • Categories of Computer Storage Media • Memory Hierarchy • Storage of Databases • Disk Storage Devices • Tape Storage Devices • Buffering of Blocks • Placing File Records on Disk • File Operations • Heap Files • Sorted Files • Hashing Techniques • Parallelizing Disk Access using RAID Technology 7-Apr-24 Christalin Nelson | SOCS 2 of 80
  • 3. 7-Apr-24 Categories of Computer Storage Media • Primary storage – Storage media that can be operated directly by the CPU – Volatile, fast access, less storage capacity, and expensive – Example: Main memory, Cache memory • Secondary & Tertiary storage – Data cannot be processed directly by the CPU (copied into primary storage for processing) – Nonvolatile, Removable media, slower access, larger capacity, and less cost – Example: • Secondary: Hard disks • Tertiary: Optical disks (CD-ROMs, DVDs), tapes Christalin Nelson | SOCS 3 of 80
  • 4. 7-Apr-24 Memory Hierarchy (1/2) • Primary Storage – Cache Memory • Static RAM (SRAM) • Used by CPU to speed up execution of program instructions using techniques such as prefetching and pipelining • Most expensive – Main memory • Dynamic RAM (DRAM) • Provides the main work area for the CPU for keeping program instructions and data • Low cost, volatile, lower speed (vs. static RAM) Christalin Nelson | SOCS 4 of 80
  • 5. 7-Apr-24 Memory Hierarchy (2/2) • Secondary & Tertiary Storage – Flash Memory (EEPROM) • Used with cameras, MP3 players, cell phones, PDAs – Magnetics Disks – Mass Storage • WORM (archival, more life than magnetic Disks) • CDROM, DVD, Optical jukebox memories (loaded on demand) – Magnetic Tapes (archival), Tape Jukebox (loaded on demand) • Note – Main Memory Databased for RT applications Christalin Nelson | SOCS 5 of 80
  • 6. 7-Apr-24 Storage of Databases (1/3) • Persistent data (vs. Transient data) • Stored on Magnetic Disks – DB is too large to fit entirely in Main memory – Main memory is a volatile storage – The cost of storage per unit of data is greater for primary storage • Backed up on Tapes – Cost is less than storage on disk & slow data access – Offline vs. Online storage • Tapes are offline devices – should be loaded for data to become available • Disks are online devices – can be accessed directly at any time Christalin Nelson | SOCS 6 of 80
  • 7. 7-Apr-24 Storage of Databases (2/3) • DBMS has several options available for data organization – Physical DB design involves the selection of a particular data organization technique that best suits the given application requirements – DB designers & DBA must know the pros & cons of each storage technique when they design, implement, and operate a DB on a specific DBMS Christalin Nelson | SOCS 7 of 80
  • 8. 7-Apr-24 Storage of Databases (3/3) • File Organization – Primary file organization • Determine how the file records are physically stored and how they can be accessed • Examples – Heap file (or unordered file) - Appends new records at EOF – Sorted file (or sequential file) - Records are ordered by the value of a particular field (called the sort key) – Hashed file - Uses a hash function applied to a particular field (called the hash key) to determine a record’s placement on disk – Other primary file organizations: B-trees – Secondary organization (or) auxiliary access structure • Allows efficient access to file records based on alternate fields (exist as indexes) than those that have been used for the primary file organization Christalin Nelson | SOCS 8 of 80
  • 9. 7-Apr-24 Disk Storage Devices (1/11) • Preferred secondary storage device for high storage capacity and low-cost • Disk is made of magnetic material shaped as a thin circular disk & protected by a plastic or acrylic cover – Data stored as magnetized areas on magnetic disk surfaces • Basic unit: Bit (0 or 1) – Disk capacity (No. of bytes stored on the Disk) • It can be single or double-sided • Disk packs – Contains several magnetic disks connected to a rotating spindle – Used with Servers Christalin Nelson | SOCS 9 of 80
  • 10. 7-Apr-24 Disk Storage Devices (2/11) • Track – Concentric circles on a disk surface of small width, each having a distinct diameter • No. of Tracks: few 100s to 1000s • Capacity: 10s of kB to 150 kB – Cylinder • Tracks with the same diameter on various surfaces • Data stored in one cylinder can be retrieved much faster than if it were distributed among different cylinders – Tracks are divided into smaller sectors • Note: Not all disks have their tracks divided into sectors Christalin Nelson | SOCS 10 of 80
  • 11. 7-Apr-24 Disk Storage Devices (3/11) • Sectors – Physical units of storage on the disk – Sectors are hard-coded on the disk surface • Fixed-size portion that can be R/W by the disk's R/W head • Sector size: 512 B to 4 KB (Vary depending on the disk's formatting and specifications) • ZBR (Zone Bit Recording) – Allows a range of cylinders to have the same number of sectors per arc – Example: Cylinders 0–99 may have 1 sector per track, 100–199 may have 2 sectors per track, etc. – A disk with hard-coded sectors often has the sectors subdivided into blocks during initialization • Transfer of data between main memory and disk takes place in units of disk blocks Christalin Nelson | SOCS 11 of 80
  • 12. 7-Apr-24 Disk Storage Devices (4/11) • Blocks – Logical units of storage managed by the file system – The division of a track into equal-sized disk blocks (or pages) is set by the OS during disk formatting (or initialization) and cannot be changed dynamically • Blocks are typically larger than sectors and are used for organizing and managing files • Disk block size: 512 to 8192 B – Blocks are separated by fixed-size inter-block gaps • Include specially coded control information written during disk initialization – This information is used to determine which block on the track follows each inter-block gap Christalin Nelson | SOCS 12 of 80
  • 13. (a) Sectors subtending a fixed angle (b) Sectors maintaining a uniform recording density (a) A single-sided disk with read/write hardware (b) A disk pack with read/write hardware
  • 15. 7-Apr-24 Disk Storage Devices (7/11) • Hardware address of a block – Cylinder number + Track number + Block number • Track Number: Surface no. within the cylinder on which the track is located • Block number (within the track): Supplied to disk I/O hardware – In Modern Disk drives • Logical Block Address (LBA) – A single number between 0 and n (If total disk capacity is n + 1 blocks) – Disk drive controller automatically maps LBA to the right block • Buffer address – Contiguous reserved area in main storage (holds 1 disk block) » Read command (Copy Disk block into Buffer), Write command (Copy buffer into Disk block) – Cluster (several contiguous blocks) can also be transferred as a unit » Buffer size = Number of bytes in Cluster Christalin Nelson | SOCS A disk is a Random access addressable device 15 of 80
  • 16. 7-Apr-24 Disk Storage Devices (8/11) • Disk Drive Hardware mechanism (1/4) – Disk or disk pack is mounted on disk drive – Electrical Motor-1: Rotates disks continuously at a constant speed (5400 to 15000 rpm) – Read/write head • Movable-Head disks – Mechanical Arms connect R/W heads to an actuator attached to Electrical motor-2 » Motor moves R/W heads in unison & positions precisely over a cylinder & block specified in a block address  Activate electronic component attached to arm to transfer data – Used in common • Fixed-Head disks (No. of R/W Heads = No. of tracks) – A track or cylinder is selected by electronically switching to the appropriate R/W head – Faster, high cost (due to additional R/W heads) • Disk packs with multiple surfaces are controlled by several R/W heads (one per surface) Christalin Nelson | SOCS 16 of 80
  • 17. 7-Apr-24 Disk Storage Devices (9/11) • Disk Drive Hardware mechanism (2/4) – Disk controller • Embedded in the Disk drive • Controls Disk drive & interfaces it to Computer system – Received high-level I/O commands  Takes appropriate action to position the arm & causes R/W action to occur – SCSI: Standard interface used for disk drives on PCs and workstations – Given a block address, the Total time needed to locate & transfer the block is in order of milliseconds (9 to 60 msec) • Total time = Block Locating Time + Block transfer time • Where, Block locating Time = Seek time + Rotational delay – Block locating time > Block transfer time – Block Transfer time (in msec) > Data processing Time (Done in main memory by current CPUs) Christalin Nelson | SOCS 17 of 80
  • 18. 7-Apr-24 Disk Storage Devices (10/11) • Disk Drive Hardware mechanism (3/4) – Seek time • Time taken to mechanically position the R/W head on the correct track • 5 to 10 msec on desktops, 3 to 8 msecs on servers – Rotational delay or latency • Time taken to position the beginning of desired block under R/W head • Depends on the rpm of the disk – Example » At 15,000 rpm: Time per rotation = 4 msec, Avg. rotational delay = Time per half revolution, or 2 msec » At 10,000 rpm: Avg. rotational delay = 3 msec – Block transfer time Christalin Nelson | SOCS 18 of 80
  • 19. 7-Apr-24 Disk Storage Devices (11/11) • Disk Drive Hardware mechanism (4/4) – Reducing Block transfer time • Place “related information” on contiguous blocks. Transfer several blocks on the same track or cylinder when numerous contiguous blocks are transferred – Hence, locating first block takes 9 to 60 msec & transferring subsequent blocks takes only 0.4 to 2 msec each – Disk crash • Happens if the disk R/W head touches the disk surface because of mechanical malfunction Christalin Nelson | SOCS 19 of 80
  • 20. 7-Apr-24 Magnetic Tape Storage Devices (1/2) • Sequential access devices • Data is stored on reels of high-capacity magnetic tape (similar to audiotapes or videotapes) – Bytes are stored consecutively on the tape • R/W heads are used to R/W data • Many data records are grouped in one block for better space utilization – Tape Blocks > Disk blocks – Inter-block gap of tape > Inter-block gap of Disk • With a tape density of 1600 to 6250B per inch, the Inter-block gap of 0.6 inches corresponds to wastage of 960 to 3750B storage space Christalin Nelson | SOCS 20 of 80
  • 21. 7-Apr-24 Magnetic Tape Storage Devices (2/2) • Features – Used to periodically back up and archival of Database – Store excessively large database files, images, and system libraries – Slow access, cannot be used to store online data • Table Libraries – Automatic labeling software identifies the backup cartridges – Robotic arms enable parallel write on multiple cartridges using multiple tape drives – Digital and Super-Digital Linear Tapes (DLTs and SDLTs) have slots for 100s of cartridges with capacities in 100s of GBs • Example: SL8500 model of Sun Storage Technology – Up to 70 petabytes in Up to 448 drives, Max. Throughput rate - 193.2 TB/hr Christalin Nelson | SOCS 21 of 80
  • 22. 7-Apr-24 Buffering of Blocks (1/2) • Multiple buffers in main memory can be reserved for buffering multiple blocks whose block addresses are known • Buffering is useful when processes can run in a parallel fashion Christalin Nelson | SOCS Availability of Separate disk I/O processors (OR) Multiple CPUs With Single CPU 22 of 80
  • 23. 7-Apr-24 Buffering of Blocks (2/2) • Double buffering – Permits continuous R/W of data on consecutive disk blocks • Eliminates seek time & rotational delay for all but the first block transfer • Data is kept ready for processing, thus reducing the waiting time in the programs Christalin Nelson | SOCS Reading & Processing can proceed in Parallel Time (Process a Disk block in memory) < Time (Read next block & fill a buffer) 23 of 80
  • 24. 7-Apr-24 Placing File Records on Disk (1/8) • Records – Describe entities and their attributes/fields • Data Types – Types of values & memory (Bytes) a field/attribute need – Usually one of the standard data types used in programming – Handling large unstructured objects (like BLOB) • Stored separately from its record in a pool of disk blocks & a pointer to the BLOB is included in the record • Record Type or Record format definition – A collection of field names and their corresponding data types (same as prog. langs.) Christalin Nelson | SOCS 24 of 80
  • 25. 7-Apr-24 Placing File Records on Disk (2/8) • File – Sequence of records – Mostly all records in a file are of the same record type • File can have fixed-length records (same size in Bytes) or variable-length records • Why variable-length records? – Records with same type can have variable-length fields – Records with same type can have repeating fields (Fields that take multiple values for individual records). Group of values for the field is called a repeating group. – Records with same type can have optional fields – Records with different types (Mixed files of different sizes). i.e. Placing together the records of different types on disk blocks Christalin Nelson | SOCS 25 of 80
  • 26. 7-Apr-24 Placing File Records on Disk (3/8) • Few Record Storage Formats – Fixed-length record – Variable-length record with variable length fields – Variable-length record with optional fields Christalin Nelson | SOCS Can have <field-type, field-value> pairs 26 of 80
  • 27. 7-Apr-24 Placing File Records on Disk (4/8) • Record Blocking – Records of a file must be allocated to disk blocks (unit of data transfer between disk and memory) – If Block size (B bytes) > Record size (R bytes) • Each block will contain numerous records • Note: Some files may have usually large records that cannot fit in one block – For a file with 'r' records, each record of size 'R‘, and Block size (B) • Average no. of records per Block = Blocking factor (bfr) = Floor value of (B/R) • Unused space (in Bytes) = B - (bfr * R) • No. of Blocks (b) needed for a File = Ceiling value of (r/bfr) Christalin Nelson | SOCS 27 of 80
  • 28. 7-Apr-24 Placing File Records on Disk (5/8) • Spanned Records – Utilization of unused space if Records in a file can span more than one Block • i.e. Store part of a Record on one Block & the rest on another • A pointer at the end of 1st Block points to the Block containing the remainder of the Record (if it is not the next consecutive block on disk) • Unspanned records – Used with fixed-length Records (B > R) that cannot cross Block boundaries – Simplifies record processing because each Record start at a known location in Block • Note: – Either spanned or unspanned organization can be used for variable-length Records • Prefer spanning organization if average Record size is large (reduce wastage of space) Christalin Nelson | SOCS 28 of 80
  • 29. 7-Apr-24 Placing File Records on Disk (6/8) • Spanned and Unspanned Record organization Christalin Nelson | SOCS 29 of 80
  • 30. 7-Apr-24 Placing File Records on Disk (7/8) • Allocating file blocks of disk – Contiguous allocation • File blocks are allocated to consecutive disk blocks • Reading whole file is very fast using double buffering, expanding the file is difficult – Linked allocation • Each file block contains a pointer to the next file block • Easy to expand the file, Slow to read whole file – Combination of Contiguous & Linked allocation • Allocate clusters of consecutive disk blocks, and the clusters are linked • Clusters are sometimes called file segments (or) extents – Indexed allocation • One or more index blocks contain pointers to the actual file blocks Christalin Nelson | SOCS 30 of 80
  • 31. 7-Apr-24 Placing File Records on Disk (8/8) • File Header or File descriptor – Contains file information to determine disk addresses of the file Blocks as well as to record format descriptions • Fixed-length unspanned records: field lengths, order of fields within a record • Variable-length records: field type codes, separator characters, record type codes – Needed by system programs that access (or) search desired record(s) within main memory buffers Christalin Nelson | SOCS 31 of 80
  • 32. 7-Apr-24 File Operations (1/6) • Actual operations can vary from system to system • User programs use these operations to access records through program variables • Classification-1 – Retrieval operations – Update operations • Classification-2 – Record-at-a-time operations • Each operation applies to a single record • Example: Reset, Find, FindNext, Read, Delete, Modify, Insert, Scan – Set-at-a-time higher-level operations • Each operation applies to a set of records • Example: FindAll, Find n, FindOrdered, Reorganize Christalin Nelson | SOCS 32 of 80
  • 33. 7-Apr-24 File Operations (2/6) • Open – Prepare file for R/W  Allocate required buffers  Retrieve file header  Set file pointer to BOF • Reset – Set file pointer of an open file to BOF • Close – Complete file access by releasing buffers  Perform any other needed cleanup operations Christalin Nelson | SOCS 33 of 80
  • 34. 7-Apr-24 File Operations (3/6) • Find (or Locate) – Search for 1st record that satisfies a search condition  Transfer block into a main memory buffer  File pointer points to record in buffer (current record) • Read (or Get) – Copy current record in buffer to a program variable in user program  Can advance current record pointer to next record in file (may read next file block from disk) • FindNext – Search for next record that satisfies a search condition  Transfer block into a main memory buffer  File pointer points to record in buffer (current record) – Types of FindNext available in legacy DBMSs based on hierarchical & network models • FindNext record within a current parent record, given type, meets a complex condition Christalin Nelson | SOCS 34 of 80
  • 35. 7-Apr-24 File Operations (4/6) • Scan – Streamlines Find, FindNext, and Read into a single operation – If file was just opened/reset: Return 1st record, else - return next record – If a condition is specified with the operation: Return record (1st or next) that satisfies the condition • Delete – Delete current record  Update file on disk • Modify – Modify some field values in current record  Update file on disk • Insert – Locate Block where the Record is to be inserted  Transfer Block into a main memory buffer  Write Record into buffer  Write from buffer to disk Christalin Nelson | SOCS 35 of 80
  • 36. 7-Apr-24 File Operations (5/6) • FindAll – Locate all Records in file that satisfy a search condition • Find (or Locate) n – Search for 1st record that satisfies a search condition  Locate next (n-1) records satisfying same condition  Transfer blocks containing n records to main memory buffer • FindOrdered – Retrieve all Records in File in a specified order • Reorganize – Start reorganization process – Example: Reorder file records by sorting them on a specified field Christalin Nelson | SOCS 36 of 80
  • 37. 7-Apr-24 File Operations (6/6) • File organization vs. Access method – File organization • Organization of data of a file into records, blocks, and access structures • Includes the way records and blocks are placed on the storage medium and interlinked. • Performs the frequently applied file operations efficiently – Access method • Provides a group of operations that can be applied to a file • Several access methods can be applied to a file organization • Some access methods can be applied only to files organized in certain ways – Example: Indexed access method cannot be applied to a file without an index Christalin Nelson | SOCS 37 of 80
  • 38. 7-Apr-24 Heap Files (1/4) • Pile File (or) Sequential File • Simplest and most basic type of organization • Records are placed in the file in insertion order – New records are inserted at EOF • Used with additional access paths (like secondary indexes) • Used to collect and store data records for future use Christalin Nelson | SOCS 38 of 80
  • 39. 7-Apr-24 Heap Files (2/4) • Insertion – Very efficient – Address of last file block is kept in File header  Copy last disk block of file into a buffer  Add new record  Rewrite block back to disk • Search – Perform linear search through the file block by block • Complexity: For a file of ‘b’ blocks – Average: Search (b/2) blocks if only one record satisfies search condition – Worst: Read and search all b blocks in the file • Modify – Variable-length records require deleting old record  Insert modified record – The modified record may not fit in its old space on disk Christalin Nelson | SOCS 39 of 80
  • 40. 7-Apr-24 Heap Files (3/4) • Delete – Find Block  Copy Block into a buffer  Delete Record from buffer  Rewrite back to disk – Results in wasted storage space • Solution – Store an extra byte or bit (deletion marker) with each record  Value of marker identifies if it is invalid deleted record or valid undeleted record – Reclaiming unused space of deleted records » Periodic reorganization of file – Records are packed by removing deleted records • Use spanned/unspanned organization for unordered files with fixed/variable -length records » Use space of deleted records for inserting new records • Requires extra bookkeeping to keep track of empty locations Christalin Nelson | SOCS 40 of 80
  • 41. 7-Apr-24 Heap Files (4/4) • Read – In-order read: Create sorted copy of file wrt values of some field • Sorting is expensive for large disk files. Special techniques for external sorting can be used – Relative or Direct file • A file of unordered fixed-length records with unspanned blocks and contiguous allocation, access any record directly by their relative position in the file – To find ith record in a file having ‘r’ records, and records in each block are numbered 1, 2, ..., bfr » (i % bfr)th record in ⎡(i/bfr)⎤ th block Christalin Nelson | SOCS 41 of 80
  • 42. 7-Apr-24 Sorted Files (1/5) • Based on ordering field (or) ordering key – Leads to an ordered or sequential file – Ordering key: Ordering field is also a key field of the file • Advantages – Read is extremely efficient – Search conditions with >, <, ≥, and ≤ on ordering field is quite efficient • Search conditions based on ordering key results in faster access with binary search technique – For a File with ‘b’ blocks, binary search accesses log2 (b) blocks – Minimal seek time » Ordered files are put in blocks and stored on contiguous cylinders » A binary search for disk files can be done on the blocks rather than on the records • Searching next record from current record in order of ordering key requires no additional block accesses (current record is not the last one in the block & next record is in the same block) Christalin Nelson | SOCS 42 of 80
  • 43. 7-Apr-24 Sorted Files (2/5) • Binary Search Algorithm on Ordering key of a Disk File Christalin Nelson | SOCS 43 of 80
  • 44. 7-Apr-24 Sorted Files (3/5) • Note – Random access based on a non-ordering field uses linear search – Ordered access based on a non-ordering field requires creation of another sorted copy • Insert/delete operations are expensive as records must remain physically ordered – Solution • (1) Keep some unused space in each block for new records (Temporary) • (2) Create a temporary unordered file (Overflow or Transaction file). The actual ordered file is called the Main or Master file. – Insert new record at end of Overflow file  During file reorganization, periodically sort & merge Overflow file with Master file » The records marked for deletion are removed during the reorganization – Before reorganization, a Linear search is done on Overflow file if Binary search on Main file is not successful Christalin Nelson | SOCS 44 of 80
  • 45. 7-Apr-24 Sorted Files (4/5) • Modification of the field value of a record depends on – Field to be modified • Modification of ordering field – Requires deletion of old record  Insertion of modified record – Record can change its position in the file • Modification of non-ordering field – Change record (fixed-length)  Rewrite it in same physical location on disk – Search condition to locate the record Christalin Nelson | SOCS 45 of 80
  • 46. 7-Apr-24 Sorted Files (5/5) • Indexed-sequential file & Clustered file – Ordered files based on ordering key used in DB applications with a primary index • Improves random access time on ordering key field – If the ordering attribute is not a key, the file is called a clustered file Christalin Nelson | SOCS 46 of 80
  • 47. 7-Apr-24 Hashing Techniques (1/2) • Hash file (or) Direct file – Primary file organization (based on hashing) that provides very fast access to records • It needs only a single-block access • Hash field – Hash key: Hash field is a key field of the file – Note: An equality condition on hash field is used as Search condition • Hash function (or) randomizing function (h) – Yields a disk block address corresponding to hash field value of a record  Store the record in this block address • Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field Christalin Nelson | SOCS 47 of 80
  • 48. 7-Apr-24 Hashing Techniques (2/2) • Techniques – Internal Hashing • All data items are stored directly within the hash table's buckets or slots – External Hashing • Hash table's buckets or slots contain pointers or references to external storage locations where items are stored – Hashing Techniques That Allow Dynamic File Expansion • Extendible Hashing, Linear Hashing, Dynamic Hashing Christalin Nelson | SOCS 48 of 80
  • 49. 7-Apr-24 Internal Hashing (1/12) • For internal files, hashing is typically implemented as a hash table – Consider an array of records of size M – A suitable hash function transforms hash field value into array index 0 to M-1 that corresponds to M slots • h(Key) = Key % M, which returns record address (array index) – Key can be Integer or String (needs transformation) Christalin Nelson | SOCS 49 of 80
  • 50. 7-Apr-24 Internal Hashing (2/12) • Example-1: Find hash code of the key “AKEY” if size of table is 5, and the Hash function is h(key) = key % m. – If each character can be represented by 5 digits, the key is treated as a base-32 (i.e. 25) number. – Consider the decimal representation of each alphabet: A = 1, B = 2, ……., Z = 26 – Key transformation: key = (1x323) + (11x322) + (5x321) + (25x320) = 44271 – Find index: h(44271) = 44271 % 5 = 1 – Hence the given key “AKEY” corresponds to the index [1] of the hash table Christalin Nelson | SOCS 50 of 80
  • 51. 7-Apr-24 Internal Hashing (3/12) • Few more hash functions – (1) Non-integer hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with characters can be used in the transformation (i.e. Multiply the code values) – (2) Folding: Apply an arithmetic function (like addition) or a logical function (like XOR) to different portions of the hash field value to calculate the hash address • Example: If size of hash table (M) is 1000 and Hash key is a 6-digit integer. The key 235469 can be stored at the address: (235+964) mod 1000 = 199. – (3) Pick some digits of hash field value (3rd, 5th, and 8th digits) to form the hash address • Example: If size of hash table (M) is 1000 and Hash key is a 10-digit SSN. The SSN 3016789234 gives a hash value of 172. Christalin Nelson | SOCS 51 of 80
  • 52. 7-Apr-24 Internal Hashing (4/12) • Properties of a Good Hash Function – Makes use of all information provided by the key – Uniformly distribute records across the hash table [reduced length of linked lists]. M is chosen as prime no. – Maps similar keys to very different hash values – Uses very fast operations • Problem with most hashing functions – It does not guarantee that distinct values will hash to distinct addresses because hash field space (No. of possible values a hash field can take) > Address space (No. of available addresses for records) Christalin Nelson | SOCS 52 of 80
  • 53. 7-Apr-24 Internal Hashing (5/12) • Algorithm for h(k) = k mod M • Load Factor (λ) = N/M, where N – No. of record to be stored, M – Hash table size • Collision – Occurs when hash field value of a record that is being inserted hashes to an address that already contains a different record • Collision Resolution – Process of finding another position that does not lead to collision (Hash address is not occupied) Christalin Nelson | SOCS 53 of 80
  • 54. 7-Apr-24 Internal Hashing (6/12) • Collision Resolution Techniques – (1) Open addressing (e.g. Linear probing, Quadratic Probing) – (2) Chaining • Various overflow locations are kept, usually by extending the array with several overflow positions. Additionally, a pointer field is added to each record location • Collision is resolved by placing new record in an unused overflow location  Set pointer of occupied hash address location = Address of that overflow location • A linked list of overflow records for each hash address is thus maintained – (3) Multiple hashing • Example: The program applies second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function. Then, uses open addressing if necessary. Christalin Nelson | SOCS 54 of 80
  • 55. 7-Apr-24 Internal Hashing (7/12) • Example (Linear Probing) – Construct a Hash table for given keys [A, S, E, A, R, C, H, I, N, G, E, X, A, M, P, L, E] using Linear Probing. Assume M=19. – Algorithm for simple hash function, h(k) = k mod M Christalin Nelson | SOCS Key A S E A R C H I N G E X A M P L E Hash Value 1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 S A A C A E E G H I X E L M N P R 55 of 80
  • 56. 7-Apr-24 Internal Hashing (8/12) • Open Addressing – Disadvantage of Open Addressing • Primary clustering • Linear probing runs slowly for nearly full tables – Note: For Open Addressing • Search/Insertion is faster for λ<1 • Good strategy: λ < 0.5 (i.e. M > 2N) • If λ > 0.5, then re-hashing is preferable (with a bigger table) – Algorithm Christalin Nelson | SOCS 56 of 80
  • 57. 7-Apr-24 Internal Hashing (9/12) • Chaining – Chaining technique is preferred when N & M cannot be predicted in advance. The choice of M basically depends on other factors such as available memory. • Typically M is chosen to be relatively small so as not to use up a large area of contiguous memory, but large enough to have short lists for more efficient sequential search • In general, M can vary from 0.1N to N. i.e. λ = 1 to 10. – This method is useful for highly dynamic situations Christalin Nelson | SOCS 57 of 80
  • 58. 7-Apr-24 Internal Hashing (10/12) • Example: Chaining Christalin Nelson | SOCS 58 of 80
  • 59. 7-Apr-24 Internal Hashing (11/12) • Example: Construct a Hash table for given keys [A, S, E, A, R, C, H, I, N, G, E, X, A, M, P, L, E] using a separate chaining technique. Assume M=11. [Note: Each character is a key] Christalin Nelson | SOCS Key A S E A R C H I N G E X A M P L E Hash Value 1 8 5 1 7 3 8 9 3 7 5 2 1 2 5 1 5 0 1 L A A A 2 M X 3 N C 4 5 E P E E 6 7 G R 8 H S 9 I 10 59 of 80
  • 60. 7-Apr-24 Internal Hashing (12/12) • Each collision resolution method requires its own algorithms for insertion, retrieval, and deletion of records – The algorithms for chaining are the simplest – Deletion algorithms for open addressing are complex • Simulation and analysis – Load factor is 0.7 to 0.9 to reduce number of collisions and avoid wastage of space – Other hash functions may require M to be a power of 2 Christalin Nelson | SOCS 60 of 80
  • 61. 7-Apr-24 External Hashing (1/6) • Hashing for disk files – The target address space is made of buckets (each bucket can hold multiple records) • A bucket is either one disk block or a cluster of contiguous disk blocks – Hashing function • Maps a key into a relative bucket number (not an absolute block address) • A table maintained in the File header converts Bucket number  Corresponding Disk block address Christalin Nelson | SOCS 61 of 80
  • 62. 7-Apr-24 External Hashing (2/6) • Order-preserving hash function – Maintain records in order of hash field values • Example: The leftmost 3 digits of the invoice number field can be used to get bucket address as a hash address & records are also sorted by invoice number within each bucket • Collision – Less severe as many records can hash to the same bucket without causing problems. – When a bucket is filled & a new record being inserted hashes to that bucket  Use a variation of chaining • i.e. Maintain a pointer in each bucket to a linked list of overflow records for the bucket • Pointers in linked list are record pointers (include both block address & a relative record position within the block) Christalin Nelson | SOCS 62 of 80
  • 63. 7-Apr-24 Christalin Nelson | SOCS Handling overflow for buckets by chaining 63 of 80
  • 64. 7-Apr-24 External Hashing (4/6) • Static hashing – Number of buckets (M) is fixed – Not suited for dynamic files • If M – Fixed number of buckets, and ‘n’ – Max. number of records per bucket, No. of records that can fitted = (n *M) – If no. of records < (n * M), we are left with a lot of unused space – If no. of records > (n * M), numerous collisions will occur & retrieval will be slowed down because of the long lists of overflow records – Hence, M has to be changed & a new hash function (based on new value of M) should be used to redistribute the records. These reorganizations can be quite time-consuming for large files. • Dynamic file organizations based on hashing allow M to vary dynamically with only localized reorganization Christalin Nelson | SOCS 64 of 80
  • 65. 7-Apr-24 External Hashing (5/6) • Searching for a record given a value of some field other than the hash field is expensive • Record deletion – Remove the record from its bucket • If bucket has an overflow chain: Move one of the overflow records into the bucket to replace the deleted record • If the record to be deleted is already in overflow: Remove it from Linked list  Keep track of empty positions in overflow by maintaining a linked list of unused overflow locations Christalin Nelson | SOCS 65 of 80
  • 66. 7-Apr-24 External Hashing (6/6) • Modifying a specific record’s field value depends on – (1) The search condition to locate that specific record • If the search condition is an equality comparison on the hash field, record can be located efficiently by using the hashing function; otherwise, a linear search is done. – (2) The field to be modified • Modifying non-hash field: Change the record  Rewrite it in the same bucket • Modifying hash field: Move the record to another bucket (which requires deletion)  Insert the modified record Christalin Nelson | SOCS 66 of 80
  • 67. 7-Apr-24 Hashing Techniques That Allow Dynamic File Expansion • Used to handle hash table overflow & dynamic resizing of hash table as number of stored items increases or decreases • Extendible hashing – Stores an access structure in addition to the file (somewhat similar to indexing) – Vs. Indexing • Access structure is based on the Hash function result on a search field • In indexing, the access structure is based on the values of the search field itself • Dynamic hashing – Uses an access structure based on binary tree data structures. – Hash function result (hash value of a record) is represented as a binary number (string of bits) – The access structure is built on the binary representation of the hash function result – Records are distributed among buckets based on the values of leading bits in their hash values • Linear hashing – Does not require additional access structures Christalin Nelson | SOCS 67 of 80
  • 68. 7-Apr-24 Extendible hashing (1/5) • Maintain a directory – i.e. array of 2d bucket addresses (d – global depth of directory) – Implemented using dynamic array or trie • Operation – Hash function maps keys to hash values (represented as binary strings)  Index to directory (integer value) corresponding to high-order(first) d bits of hash value  Get Bucket Address  Store the corresponding record in bucket • Several directory entries (with same first d’ bits for their hash values) may contain same bucket address if all the records that hash to these locations fit in a single bucket • Local depth (d’) specifies no. of bits on which the bucket contents are based • Most record retrievals require two block accesses – (1) To the directory, and (2) To the bucket Christalin Nelson | SOCS 68 of 80
  • 69. 7-Apr-24 Christalin Nelson | SOCS Structure of Extendible hashing scheme using a directory with global depth d = 3 69 of 80
  • 70. 7-Apr-24 Extendible hashing (3/5) • Doubling – Increase d by one at a time, if a bucket is full (overflow could happen) and d’ = d – Doubles the number of buckets available in the hash table. i.e. Allows additional space to store more items in directory • Each bucket in the existing directory is split into two new buckets  Update the directory to point to new buckets – Reduces the likelihood of collisions and maintains efficient access times • Halving – Decrease d by one at a time, if the hash table becomes sparsely populated and d > d’ – Halves the number of buckets • Pairs of buckets are merged into single buckets  Update directory to point to respective buckets – Ensures that memory is not wasted due to empty or sparsely populated buckets, thus optimizing memory usage and maintaining efficient access times Christalin Nelson | SOCS 70 of 80
  • 71. 7-Apr-24 Extendible hashing (4/5) • Bucket splitting – When a collision happens, the affected bucket is split into two new buckets, to increase the capacity of the hash table & accommodate additional records • i.e. If the affected bucket’s hash value starts with 01  Split bucket into two buckets with hash values starting with 010 and 011  Update directory to point to respective buckets – Local depth (d’) of the two new buckets becomes one more than the old bucket • Continued Bucket splitting can indeed lead to Doubling – While bucket splitting initially increases the number of buckets (and d’) in the hash table without changing the directory size (d) – Continued bucket splitting operations (even after d’ = d) will eventually necessitate directory expansion (doubling) to accommodate the growing number of buckets Christalin Nelson | SOCS 71 of 80
  • 72. 7-Apr-24 Extendible hashing (5/5) • Advantage – Performance of the file does not degrade as the file grows • Vs. Static external hashing: As collisions increase, chaining increases avg. accesses per key – No space is allocated for future growth. Additional buckets can be allocated dynamically as needed. The space overhead for the directory table is negligible. • The maximum directory size is 2k, where k – No. of bits in the hash value – Splitting causes minor reorganization in most cases – Reorganization due to Doubling/Halving is more expensive • Disadvantage – Requires 2 block accesses (vs. one in static hashing) Christalin Nelson | SOCS 72 of 80
  • 73. 7-Apr-24 Dynamic Hashing (1/3) • Serves as a foundational concept in the development of extendible hashing and other dynamic hash table techniques • Features – Overflow Handling • Handled hash collisions and overflow by employing techniques such as overflow buckets or linked lists within buckets to accommodate additional keys – Bucket Splitting and Merging • Buckets could be split or merged to redistribute keys and maintain a balanced load factor • Typically done at the bucket level and did not involve resizing the directory – Directory Structure • Typically static and did not support dynamic resizing • Consists of a fixed-size array of pointers or references to buckets Christalin Nelson | SOCS 73 of 80
  • 74. 7-Apr-24 Dynamic Hashing (2/3) • Bucket Addresses – Either n high-order bits (or) n-1 high-order bits – Depends on total number of keys belonging to the respective bucket • The major difference is in the organization of the directory – (Extendible hashing) Uses global depth (high-order d bits) for the flat directory & combines adjacent collapsible buckets into a bucket of local depth d’ – (Dynamic hashing) maintains a tree-structured directory with 2 node types • Internal Directory nodes that have two pointers – Left pointer corresponds to bit 0 (in the hashed address) – Right pointer corresponding to bit 1 • Leaf Directory nodes – Hold a pointer to the actual bucket with records Christalin Nelson | SOCS 74 of 80
  • 75. 7-Apr-24 Christalin Nelson | SOCS Structure of the dynamic hashing scheme 75 of 80
  • 76. 7-Apr-24 Linear Hashing (1/3) • Hash file can dynamically expand/shrink its no. of buckets without a directory • Operation – File starts with M buckets numbered 0, 1, ..., M-1 & uses initial hash function hi(K) = K mod M – A value 'n' is used to find the buckets that have split. • Initially set n=0. Increment n linearly by 1 whenever a delayed split occurs – Maintain individual overflow chains for each bucket • When collision leads to an overflow record in any file bucket  Split Bucket-0 into two buckets: Original bucket (0) & New bucket (M) at EOF  Distribute records between two buckets based on a different hashing function hi+1(K) = K mod 2M  Linear Increment n – Note: Any record that hashed to bucket-0 based on hi will hash to either bucket-0 or bucket-M based on hi+1 – Retrieve record with hash key value K: First apply hi to K  If hi(K) < n, apply hi+1 on K because the bucket is already split Christalin Nelson | SOCS 76 of 80
  • 77. 7-Apr-24 Linear Hashing (2/3) • Operation (contd.) – If enough overflows occur due to collision, all Original file buckets 0, 1, ..., M-1 will be split in linear order and n=M. The file now has 2M buckets that use hi+1 – At this point, reset n=0. Any new collisions that cause overflow lead to use of the new hashing function hi+2(K) = K mod 4M. • File load factor (l) = r/(bfr * N) – Where • r – Current number of file records • bfr – Max. no. of records that can fit in a bucket • N – Current no. of file buckets Christalin Nelson | SOCS 77 of 80
  • 78. 7-Apr-24 Linear Hashing (3/3) • Splitting & Merging can be controlled by monitoring load factor (l) – Splits can be triggered when ‘l’ exceeds a certain threshold (say, 0.9) – Combinations can be triggered when ‘l’ falls below another threshold (say, 0.7) • Blocks are combined linearly, and N is decremented appropriately • Linear hashing maintains load factor fairly constantly while file grows/shrinks Christalin Nelson | SOCS 78 of 80
  • 79. 7-Apr-24 Parallelizing disk access with RAID • Refer Slides 88-133 from the Presentation titled “Storage System Architecture” available in Slideshare Christalin Nelson | SOCS 79 of 80
  • 80. Thank You Christalin Nelson | SOCS 7-Apr-24 7-Apr-24 80 of 80