SlideShare a Scribd company logo
1 of 19
Download to read offline
Introduction to File Structures
Introduction to File Structures
1
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Introduction to File Organization
Introduction to File Organization
Data processing from a computer science perspective:
– Storage of data
– Organization of data
– Access to data
This will be built on your knowledge of
Data Structures
Data Structures
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Data Structures vs. File Structures
Data Structures vs. File Structures
Both involve:
Representation of Data
+
Operations for accessing data
Difference:
– Data Structures deal with data in main memory
main memory
– File Structures deal with data in secondary storage
secondary storage
device
device (File).
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Computer Architecture
Computer Architecture
CPU
Main Memory
Second Storage
Registers
Cache
Differences
Differences
 Fast
 Small
 Expensive
 Volatile
 Slow
 Large
 Cheap
 Stable
RAM
(Semiconductor)
Disk, Tape,
DVD-R
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Memory Hierarchy
Memory Hierarchy
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Memory Hierarchy
Memory Hierarchy
On systems with 32-bit addressing, only 232 bytes can be
directly referenced in main memory.
The number of data objects may exceed this number!
Data must be maintained across program executions. This
requires storage devices that retain information when the
computer is restarted.
– We call such storage nonvolatile.
– Primary storage is usually volatile, whereas secondary and
tertiary storage are nonvolatile.
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Definition
Definition
 File Structures is the Organization of Data in Secondary
Storage Device in such a way that minimize the access
time and the storage space.
 A File Structure is a combination of representations for
data in files and of operations for accessing the data.
 A File Structure allows applications to read, write and
modify data. It might also support finding the data that
matches some search criteria or reading through the data
in some particular order.
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Why Study File Structure Design?
I. Data Storage
Why Study File Structure Design?
I. Data Storage
Computer Data can be stored in three kinds of
locations:
– Primary Storage == Memory [Computer
Memory]
– Secondary Storage [Online Disk/ Tape/ CDRom
that can be accessed by the computer]
– Tertiary Storage == Archival Data [Offline
Disk/Tape/ CDRom not directly available to the
computer.]
Our
Focus
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
II. Memory versus Secondary Storage
II. Memory versus Secondary Storage
 Secondary storage such as disks can pack thousands of
megabytes in a small physical location.
 Computer Memory (RAM) is limited.
 However, relative to Memory, access to secondary
storage is extremely slow [E.g., getting information
from slow RAM takes 120. 10-9 seconds (= 120
nanoseconds) while getting information from Disk
takes 30. 10-3 seconds (= 30 milliseconds)]
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
III. How Can Secondary Storage Access Time be Improved?
III. How Can Secondary Storage Access Time be Improved?
By improving the File Structure.
Since the details of the representation of the data and the
implementation of the operations determine the efficiency
of the file structure for particular applications, improving
these details can help improve secondary storage access
time.
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Overview of File Structure Design
I. General Goals
Overview of File Structure Design
I. General Goals
 Get the information we need with one access to the disk.
 If that’s not possible, then get the information with as few
accesses as possible.
 Group information so that we are likely to get everything we need
with only one trip to the disk.
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
II. Fixed versus Dynamic Files
II. Fixed versus Dynamic Files
 It is relatively easy to come up with file structure designs
that meet the general goals when the files never change.
 When files grow or shrink when information is added
and deleted, it is much more difficult.
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
II. The emergence of Disks and Indexes
II. The emergence of Disks and Indexes
 As files grew very large, unaided sequential access was not a good
solution.
 Disks allowed for direct access.
 Indexes made it possible to keep a list of keys and pointers in a
small file that could be searched very quickly.
 With the key and pointer, the user had direct access to the large,
primary file.
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
How Fast?
How Fast?
Typical times for getting info
– Main memory: ~120 nanoseconds =
– Magnetic Disks: ~30 milliseconds =
An analogy keeping same time proportion as above
– Looking at the index of a book: 20 seconds
versus
– Going to the library: 58 days
9
120 10
u
6
30 10
u
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Comparison
Comparison
Main Memory
– Fast (since electronic)
– Small (since expensive)
– Volatile (information is lost when power failure occurs)
Secondary Storage
– Slow (since electronic and mechanical)
– Large (since cheap)
– Stable, persistent (information is preserved longer)
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Goal of the Course
Goal of the Course
Minimize number of trips to the disk in order to get
desired information. Ideally get what we need in one disk
access or get it with as few disk access as possible.
Grouping related information so that we are likely to get
everything we need with only one trip to the disk (e.g.
name, address, phone number, account balance).
Locality of Reference
Locality of Reference in Time
Time and Space
Space
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
Good File Structure Design
Good File Structure Design
Fast access to great capacity
Reduce the number of disk accesses
By collecting data into buffers, blocks or buckets
Manage growth by splitting these collections
CIS 256 (File Structures)
Introduction
to
File
Structures
1
History of File Structure Design
History of File Structure Design
1. In the beginning… it was the tape
–
– Sequential access
Sequential access
– Access cost proportional to size of file
[Analogy to sequential access to array data structure]
2. Disks became more common
–
– Direct access
Direct access
[Analogy to access to position in array]
–
– Indexes
Indexes were invented
• list of keys and points stored in small file
• allows direct access to a large primary file
Great if index fits into main memory.
As file grows we have the same problem we had with a
large primary file
CIS 256 (File Structures) 
Introduction
to
File
Structures
1
History of File Structure Design
History of File Structure Design
3. Tree structures emerged for main memory (1960`s)
–
– Binary search trees (
Binary search trees (BST`s
BST`s)
)
–
– Balanced
Balanced, self adjusting BST`s: e.g. AVL trees (1963)
4. A tree structure suitable for files was invented:
B trees
B trees (1979) and B+ trees
B+ trees
good for accessing millions of records with 3 or 4 disk
accesses.
5. What about getting info with a single request?
– Hashing Tables (Theory developed over 60’s and 70’s but still
a research topic)
good when files do not change too much in time.
– Expandable, dynamic hashing (late 70’s and 80’s)
one or two disk accesses even if file grows dramatically

More Related Content

Similar to Chapter 1_ Introduction to File Structures.pdf

storage techniques_overview-1.pptx
storage techniques_overview-1.pptxstorage techniques_overview-1.pptx
storage techniques_overview-1.pptx20CS102RAMMPRASHATHK
 
presentations
presentationspresentations
presentationsMISY
 
Ie Storage, Multimedia And File Organization
Ie   Storage, Multimedia And File OrganizationIe   Storage, Multimedia And File Organization
Ie Storage, Multimedia And File OrganizationMISY
 
Distributed File System
Distributed File SystemDistributed File System
Distributed File SystemNtu
 
Ch11 file system implementation
Ch11   file system implementationCh11   file system implementation
Ch11 file system implementationWelly Dian Astika
 
Unit 3 file management
Unit 3 file managementUnit 3 file management
Unit 3 file managementKalai Selvi
 
Lecture 14,15 and 16 file systems
Lecture 14,15 and 16  file systemsLecture 14,15 and 16  file systems
Lecture 14,15 and 16 file systemsRushdi Shams
 
Unit 3 chapter 1-file management
Unit 3 chapter 1-file managementUnit 3 chapter 1-file management
Unit 3 chapter 1-file managementKalai Selvi
 
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year? BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year? panagenda
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam pratikkadam78
 
Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems senthilamul
 
Graphical user interface _ SEOSKILLS Hyderabad
Graphical user interface _ SEOSKILLS HyderabadGraphical user interface _ SEOSKILLS Hyderabad
Graphical user interface _ SEOSKILLS HyderabadSEO SKills
 

Similar to Chapter 1_ Introduction to File Structures.pdf (20)

file management
 file management file management
file management
 
storage techniques_overview-1.pptx
storage techniques_overview-1.pptxstorage techniques_overview-1.pptx
storage techniques_overview-1.pptx
 
File Allocation Methods.ppt
File Allocation Methods.pptFile Allocation Methods.ppt
File Allocation Methods.ppt
 
presentations
presentationspresentations
presentations
 
Ie Storage, Multimedia And File Organization
Ie   Storage, Multimedia And File OrganizationIe   Storage, Multimedia And File Organization
Ie Storage, Multimedia And File Organization
 
Distributed File System
Distributed File SystemDistributed File System
Distributed File System
 
File management
File managementFile management
File management
 
Ch11 file system implementation
Ch11   file system implementationCh11   file system implementation
Ch11 file system implementation
 
Unit 3 file management
Unit 3 file managementUnit 3 file management
Unit 3 file management
 
File Carving
File CarvingFile Carving
File Carving
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
Windowsforensics
WindowsforensicsWindowsforensics
Windowsforensics
 
File
FileFile
File
 
Aaaaaaaaaa
AaaaaaaaaaAaaaaaaaaa
Aaaaaaaaaa
 
Lecture 14,15 and 16 file systems
Lecture 14,15 and 16  file systemsLecture 14,15 and 16  file systems
Lecture 14,15 and 16 file systems
 
Unit 3 chapter 1-file management
Unit 3 chapter 1-file managementUnit 3 chapter 1-file management
Unit 3 chapter 1-file management
 
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year? BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year?
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
 
Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems
 
Graphical user interface _ SEOSKILLS Hyderabad
Graphical user interface _ SEOSKILLS HyderabadGraphical user interface _ SEOSKILLS Hyderabad
Graphical user interface _ SEOSKILLS Hyderabad
 

Recently uploaded

Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptAfnanAhmad53
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...ppkakm
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxMuhammadAsimMuhammad6
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 

Recently uploaded (20)

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 

Chapter 1_ Introduction to File Structures.pdf

  • 1. Introduction to File Structures Introduction to File Structures 1
  • 2. CIS 256 (File Structures) Introduction to File Structures 1 Introduction to File Organization Introduction to File Organization Data processing from a computer science perspective: – Storage of data – Organization of data – Access to data This will be built on your knowledge of Data Structures Data Structures
  • 3. CIS 256 (File Structures) Introduction to File Structures 1 Data Structures vs. File Structures Data Structures vs. File Structures Both involve: Representation of Data + Operations for accessing data Difference: – Data Structures deal with data in main memory main memory – File Structures deal with data in secondary storage secondary storage device device (File).
  • 4. CIS 256 (File Structures) Introduction to File Structures 1 Computer Architecture Computer Architecture CPU Main Memory Second Storage Registers Cache Differences Differences Fast Small Expensive Volatile Slow Large Cheap Stable RAM (Semiconductor) Disk, Tape, DVD-R
  • 5. CIS 256 (File Structures) Introduction to File Structures 1 Memory Hierarchy Memory Hierarchy
  • 6. CIS 256 (File Structures) Introduction to File Structures 1 Memory Hierarchy Memory Hierarchy On systems with 32-bit addressing, only 232 bytes can be directly referenced in main memory. The number of data objects may exceed this number! Data must be maintained across program executions. This requires storage devices that retain information when the computer is restarted. – We call such storage nonvolatile. – Primary storage is usually volatile, whereas secondary and tertiary storage are nonvolatile.
  • 7. CIS 256 (File Structures) Introduction to File Structures 1 Definition Definition File Structures is the Organization of Data in Secondary Storage Device in such a way that minimize the access time and the storage space. A File Structure is a combination of representations for data in files and of operations for accessing the data. A File Structure allows applications to read, write and modify data. It might also support finding the data that matches some search criteria or reading through the data in some particular order.
  • 8. CIS 256 (File Structures) Introduction to File Structures 1 Why Study File Structure Design? I. Data Storage Why Study File Structure Design? I. Data Storage Computer Data can be stored in three kinds of locations: – Primary Storage == Memory [Computer Memory] – Secondary Storage [Online Disk/ Tape/ CDRom that can be accessed by the computer] – Tertiary Storage == Archival Data [Offline Disk/Tape/ CDRom not directly available to the computer.] Our Focus
  • 9. CIS 256 (File Structures) Introduction to File Structures 1 II. Memory versus Secondary Storage II. Memory versus Secondary Storage Secondary storage such as disks can pack thousands of megabytes in a small physical location. Computer Memory (RAM) is limited. However, relative to Memory, access to secondary storage is extremely slow [E.g., getting information from slow RAM takes 120. 10-9 seconds (= 120 nanoseconds) while getting information from Disk takes 30. 10-3 seconds (= 30 milliseconds)]
  • 10. CIS 256 (File Structures) Introduction to File Structures 1 III. How Can Secondary Storage Access Time be Improved? III. How Can Secondary Storage Access Time be Improved? By improving the File Structure. Since the details of the representation of the data and the implementation of the operations determine the efficiency of the file structure for particular applications, improving these details can help improve secondary storage access time.
  • 11. CIS 256 (File Structures) Introduction to File Structures 1 Overview of File Structure Design I. General Goals Overview of File Structure Design I. General Goals Get the information we need with one access to the disk. If that’s not possible, then get the information with as few accesses as possible. Group information so that we are likely to get everything we need with only one trip to the disk.
  • 12. CIS 256 (File Structures) Introduction to File Structures 1 II. Fixed versus Dynamic Files II. Fixed versus Dynamic Files It is relatively easy to come up with file structure designs that meet the general goals when the files never change. When files grow or shrink when information is added and deleted, it is much more difficult.
  • 13. CIS 256 (File Structures) Introduction to File Structures 1 II. The emergence of Disks and Indexes II. The emergence of Disks and Indexes As files grew very large, unaided sequential access was not a good solution. Disks allowed for direct access. Indexes made it possible to keep a list of keys and pointers in a small file that could be searched very quickly. With the key and pointer, the user had direct access to the large, primary file.
  • 14. CIS 256 (File Structures) Introduction to File Structures 1 How Fast? How Fast? Typical times for getting info – Main memory: ~120 nanoseconds = – Magnetic Disks: ~30 milliseconds = An analogy keeping same time proportion as above – Looking at the index of a book: 20 seconds versus – Going to the library: 58 days 9 120 10 u 6 30 10 u
  • 15. CIS 256 (File Structures) Introduction to File Structures 1 Comparison Comparison Main Memory – Fast (since electronic) – Small (since expensive) – Volatile (information is lost when power failure occurs) Secondary Storage – Slow (since electronic and mechanical) – Large (since cheap) – Stable, persistent (information is preserved longer)
  • 16. CIS 256 (File Structures) Introduction to File Structures 1 Goal of the Course Goal of the Course Minimize number of trips to the disk in order to get desired information. Ideally get what we need in one disk access or get it with as few disk access as possible. Grouping related information so that we are likely to get everything we need with only one trip to the disk (e.g. name, address, phone number, account balance). Locality of Reference Locality of Reference in Time Time and Space Space
  • 17. CIS 256 (File Structures) Introduction to File Structures 1 Good File Structure Design Good File Structure Design Fast access to great capacity Reduce the number of disk accesses By collecting data into buffers, blocks or buckets Manage growth by splitting these collections
  • 18. CIS 256 (File Structures) Introduction to File Structures 1 History of File Structure Design History of File Structure Design 1. In the beginning… it was the tape – – Sequential access Sequential access – Access cost proportional to size of file [Analogy to sequential access to array data structure] 2. Disks became more common – – Direct access Direct access [Analogy to access to position in array] – – Indexes Indexes were invented • list of keys and points stored in small file • allows direct access to a large primary file Great if index fits into main memory. As file grows we have the same problem we had with a large primary file
  • 19. CIS 256 (File Structures) Introduction to File Structures 1 History of File Structure Design History of File Structure Design 3. Tree structures emerged for main memory (1960`s) – – Binary search trees ( Binary search trees (BST`s BST`s) ) – – Balanced Balanced, self adjusting BST`s: e.g. AVL trees (1963) 4. A tree structure suitable for files was invented: B trees B trees (1979) and B+ trees B+ trees good for accessing millions of records with 3 or 4 disk accesses. 5. What about getting info with a single request? – Hashing Tables (Theory developed over 60’s and 70’s but still a research topic) good when files do not change too much in time. – Expandable, dynamic hashing (late 70’s and 80’s) one or two disk accesses even if file grows dramatically