Physical Design Considerations
File Organization Techniques
Record Access Methods
Data Structures
Physical Design
Provide good performance
– Fast response time
– Minimum disk accesses
Disk Assembly
[Figure: disk assembly showing a cylinder, a track, and a page or sector]
Access time = seek time + rotational delay + transfer time
Definitions
Seek time
– Average time to move the read-write head
Rotational delay
– Average time for the sector to move under the
read-write head
Transfer time
– Time to read a sector and transfer the data to
memory
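As a rough worked example of the access-time formula, the sketch below adds the three components for a single sector read. The drive figures (9 ms average seek, 7200 RPM, 100 MB/s transfer rate, 4096-byte sector) are illustrative assumptions, not values from the slides.

```python
# Illustrative figures only; none of these come from the slides.
AVG_SEEK_MS = 9.0          # assumed average seek time
RPM = 7200                 # assumed spindle speed
TRANSFER_MB_PER_S = 100.0  # assumed sustained transfer rate
SECTOR_BYTES = 4096        # assumed page/sector size


def access_time_ms(seek_ms, rpm, transfer_mb_per_s, nbytes):
    """Access time = seek time + rotational delay + transfer time."""
    # On average the sector is half a revolution away from the head.
    rotational_delay_ms = 0.5 * 60_000.0 / rpm
    transfer_ms = nbytes / (transfer_mb_per_s * 1_000_000) * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms


total = access_time_ms(AVG_SEEK_MS, RPM, TRANSFER_MB_PER_S, SECTOR_BYTES)
print(f"{total:.2f} ms")
```

With figures like these, seek time and rotational delay dominate the total, which is why the design goal is to minimize the number of disk accesses.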
More Definitions
Logical Record
– The data about an entity (a row in a table)
Physical Record
– A sector, page or block on the storage medium
Typically several logical records can be
stored in one physical record.
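A small sketch of how logical records map onto physical records. The page size (4096 bytes) and record size (120 bytes) are assumed for illustration, and records are assumed not to span pages.

```python
def records_per_page(page_bytes, record_bytes):
    """Whole logical records that fit in one physical record (unspanned)."""
    return page_bytes // record_bytes


def pages_needed(n_records, page_bytes, record_bytes):
    """Physical records needed to hold n_records logical records."""
    per_page = records_per_page(page_bytes, record_bytes)
    return -(-n_records // per_page)  # ceiling division


print(records_per_page(4096, 120), pages_needed(9, 4096, 120))
```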
File Organization Techniques
Three techniques
– Heap (unordered)
– Sorted
Sequential (SAM)
Indexed Sequential (ISAM)
– Hashed or Direct
Heap File
ID Company Industry Symbol Price Earnings Dividend
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1122 Exxon Oil XON 46.00 2.50 0.75
1231 Intel Comp. INTL 30.00 2.00 0.00
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00
1245 Digital Comp. DEC 120.00 1.80 0.10
Tony Lama was the first record added,
Digital was the last.
Heap File Characteristics
Insertion
– Fast: New records added at the end of the file
Retrieval
– Slow: A sequential search is required
Update - Delete
– Slow:
Sequential search to find the page
Make the update or mark for deletion
Re-write the page
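The heap file behaviour above can be sketched as follows. A Python list stands in for the file, and the function names are hypothetical; only the ID and company columns are kept.

```python
heap = []  # records in arrival order; a list stands in for the file


def insert(record):
    heap.append(record)  # fast: always goes at the end


def find(record_id):
    """Sequential search; returns (record, number of records examined)."""
    for examined, record in enumerate(heap, start=1):
        if record["id"] == record_id:
            return record, examined
    return None, len(heap)


for rid, company in [(1767, "Tony Lama"), (1152, "Lockheed"), (1175, "Ford"),
                     (1122, "Exxon"), (1231, "Intel"), (1323, "GM"),
                     (1378, "Texaco"), (1245, "Digital")]:
    insert({"id": rid, "company": company})

record, examined = find(1245)
print(record["company"], examined)
```

Digital was the last record added, so finding it means examining every record in the file.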
Sequential (Ordered) File
ID Company Industry Symbol Price Earnings Dividend
1122 Exxon Oil XON 46.00 2.50 0.75
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00
1480 Conoco Oil CON 150.00 2.00 0.50
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Sequential Access
[Figure: records stored one after another in key order: 1122, 1152, 1175, 1231, ...]
Sequential File Characteristics
Older media (cards, tapes)
Records physically ordered by primary key
Use when direct access to individual records
is not required
Accessing records
– Sequential search until record is found
Binary search can speed up access
– Must know file size and how to determine the midpoint
Inserting Records in SAM files
Insertion
– Slow:
Sequential search to find where the record goes
If sufficient space in that page, then rewrite
If insufficient space, move some records to next
page
If no space there, keep bumping down until space is
found
– May use an “overflow” file to decrease time
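A minimal sketch of why insertion into a sorted file is slow, working at the level of individual records in a Python list rather than disk pages (an assumption made to keep the example short). The function name is hypothetical.

```python
import bisect

ids = [1122, 1152, 1175, 1231, 1245, 1323, 1378, 1767]  # sorted by primary key


def insert_sorted(keys, new_key):
    """Insert new_key in order; return how many records had to move down."""
    position = bisect.bisect_left(keys, new_key)
    moved = len(keys) - position
    keys.insert(position, new_key)  # shifts every later key by one slot
    return moved


moved = insert_sorted(ids, 1480)  # add Conoco
print(ids, moved)
```

A key near the end of the file moves few records, while a key near the front moves almost all of them. That worst case is the cost an overflow file is meant to avoid.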
Deletions and Updates to SAM
Deletion
– Slow:
Find the record
Either mark for deletion or free up the space
Rewrite
Updates
– Slow:
Find the record
Make the change
Rewrite
Binary Search to Find GM (1323)
ID Company Industry Symbol Price Earnings Dividend Probe
1122 Exxon Oil XON 46.00 2.50 0.75
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10 1st
1323 GM Auto GM 158.00 2.10 0.30 3rd
1378 Texaco Oil TX 230.00 2.80 1.00 2nd
1480 Conoco Oil CON 150.00 2.00 0.50
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Takes 3 accesses, as opposed to 6 for a linear search.
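The search for GM can be traced with a standard binary search that records each key it probes. This is a sketch over the nine IDs from the table; the function name and the choice of the lower middle element as the midpoint are assumptions.

```python
def binary_search(keys, target):
    """Return (index or None, list of keys probed) for a sorted list."""
    lo, hi = 0, len(keys) - 1
    probed = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probed.append(keys[mid])
        if keys[mid] == target:
            return mid, probed
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probed


ids = [1122, 1152, 1175, 1231, 1245, 1323, 1378, 1480, 1767]
index, probed = binary_search(ids, 1323)
print(index, probed)  # 5 [1245, 1378, 1323]
```

The probe order (Digital, Texaco, GM) matches the three accesses counted on the slide.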
Indexed Sequential
Disk (usually)
Records physically ordered by primary key
Index gives physical location of each record
Records accessed sequentially or directly
via the index
The index is stored in a file and read into
memory when the file is opened.
Indexes must be maintained
Indexed Sequential Access
Given a value for the key
– search the index for the record address
– issue a read instruction for that address
– Fast: Possibly just one disk access
Indexed Sequential Access: Fast
Find record with key 777-13-1212
Index
Key Cylinder Track Sector
279-66-7549 3 10 2
452-75-6301 3 10 3
789-12-3456 3 10 4
452-75-6301 < 777-13-1212 < 789-12-3456, so search Cylinder 3, Track 10, Sector 4 sequentially.
Keys in the file
222-66-7634
255-75-5531
279-66-7549
333-88-9876
382-32-0658
452-75-6301
701-43-5634
777-13-1212
789-12-3456
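The lookup for 777-13-1212 can be sketched as a search of an in-memory index followed by one sector read. The sketch assumes each index entry holds the highest key in its sector and that the nine keys are spread three per sector; that grouping is inferred from the figure, and the names are hypothetical.

```python
import bisect

# Index: highest key stored in each sector, with that sector's address.
index_keys = ["279-66-7549", "452-75-6301", "789-12-3456"]
index_addr = [(3, 10, 2), (3, 10, 3), (3, 10, 4)]  # (cylinder, track, sector)

# Stand-in for the disk: sector address -> records in key order (assumed layout).
disk = {
    (3, 10, 2): ["222-66-7634", "255-75-5531", "279-66-7549"],
    (3, 10, 3): ["333-88-9876", "382-32-0658", "452-75-6301"],
    (3, 10, 4): ["701-43-5634", "777-13-1212", "789-12-3456"],
}


def isam_lookup(key):
    """Return the sector address holding key, or None if it is absent."""
    slot = bisect.bisect_left(index_keys, key)  # first index entry >= key
    if slot == len(index_keys):
        return None  # key is larger than every key in the file
    address = index_addr[slot]
    # One disk access, then a sequential scan inside the sector.
    return address if key in disk[address] else None


print(isam_lookup("777-13-1212"))  # (3, 10, 4)
```

Plain string comparison is enough here because every key has the same fixed-width digit layout.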
Inserting into ISAM files
Not very efficient
– Indexes must be updated
– Must locate where the record should go
– If there is space, insert the new record and
rewrite
– If no space, use an overflow area
– Periodically merge overflow records into file
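The insertion steps above can be sketched with small in-memory pages and a separate overflow list. The page capacity of 3 and all names are assumptions for illustration, and index maintenance is left out.

```python
import bisect

PAGE_CAPACITY = 3  # assumed records per page, for illustration only


def insert(page, overflow, key):
    """Put key in its page if there is room, otherwise in the overflow area."""
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)  # keeps the page in key order
    else:
        overflow.append(key)


def merge_overflow(pages, overflow):
    """Periodic reorganization: rebuild the pages in full key order."""
    merged = sorted([key for page in pages for key in page] + overflow)
    overflow.clear()
    return [merged[i:i + PAGE_CAPACITY]
            for i in range(0, len(merged), PAGE_CAPACITY)]


pages = [[1122, 1152, 1175], [1231, 1245]]
overflow = []
insert(pages[0], overflow, 1160)  # page is full, so the key goes to overflow
insert(pages[1], overflow, 1240)  # room available, inserted in place
pages = merge_overflow(pages, overflow)
print(pages, overflow)
```

After a real merge the index would also have to be rebuilt, since the highest key on each page can change.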
Deletion and Updates for ISAM
Fairly efficient
– Find the record
– Make the change or mark for deletion
– Rewrite
– Periodically remove records marked for
deletion
Use ISAM files when:
Both sequential and direct access are needed.
Say we have a retail application like Foley’s.
Customer balances are updated daily.
Usually sequential access is more efficient for
batch updates.
But we may need direct access to answer
customer questions about balances.
Direct or Hashed Access
A portion of disk space is reserved
A “hashing” algorithm computes record
address
[Figure: keys 455-72-3566 and 376-87-3425 are fed to the hashing algorithm, which yields an address for each; an overflow area is also shown]
Hashed Access Characteristics
No indexes to search or maintain
Very fast direct access
Inefficient sequential access
Use when direct access is needed, but
sequential access is not.
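A minimal sketch of hashed access with an overflow area. The slides do not say which hashing algorithm is used; division-remainder hashing, the bucket count of 7, and a capacity of one record per address are all assumptions, as are the function names and the placeholder record data.

```python
N_BUCKETS = 7        # assumed size of the reserved area
BUCKET_CAPACITY = 1  # assumed records per address

buckets = [[] for _ in range(N_BUCKETS)]
overflow = []


def address(key):
    """Division-remainder hashing on the digits of the key."""
    return int(key.replace("-", "")) % N_BUCKETS


def store(key, data):
    bucket = buckets[address(key)]
    if len(bucket) < BUCKET_CAPACITY:
        bucket.append((key, data))
    else:
        overflow.append((key, data))  # collision: home address is full


def fetch(key):
    """Check the home address first, then fall back to the overflow area."""
    for stored_key, data in buckets[address(key)] + overflow:
        if stored_key == key:
            return data
    return None


store("455-72-3566", "record A")
store("376-87-3425", "record B")
print(fetch("455-72-3566"), fetch("376-87-3425"))
```

Neighbouring keys land at unrelated addresses, which is why reading the file in key order is inefficient with this organization.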
Secondary Indexes
Provide access via non-key attributes
Three data structures discussed here:
– Linked lists: embedded pointers
– Inverted lists: cross-index tables
– B-Trees
Simple Linked List: Oil Industry
Head: 1122
ID Company Industry Symbol Price Earnings Dividend Next Rec.
1122 Exxon Oil XON 46.00 2.50 0.75 1378
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00 1480
1480 Conoco Oil CON 150.00 2.00 0.50 -null-
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Ring: Oil Industry
Conoco points to the head of the list.
Head: 1122
ID Company Industry Symbol Price Earnings Dividend Next Rec.
1122 Exxon Oil XON 46.00 2.50 0.75 1378
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00 1480
1480 Conoco Oil CON 150.00 2.00 0.50 1122
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
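The simple list and the ring differ only in how a traversal ends: at a null pointer, or on arriving back at the head. The sketch below keeps just the Next Rec. pointers for the Oil records; the dictionary layout and function name are assumptions.

```python
# Next-record pointers for the Oil list; keys are record IDs.
simple_list = {1122: 1378, 1378: 1480, 1480: None}  # chain ends with null
ring = {1122: 1378, 1378: 1480, 1480: 1122}         # last record points to head
HEAD = 1122


def traverse(next_rec, head):
    """Follow pointers from head; stop at null or on returning to head."""
    visited = []
    current = head
    while current is not None:
        visited.append(current)
        current = next_rec[current]
        if current == head:
            break
    return visited


print(traverse(simple_list, HEAD), traverse(ring, HEAD))
```

Both traversals visit Exxon, Texaco, and Conoco in that order.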
Types of List
Non-dense: Few records (Oil stocks)
Dense: Many or all records (Symbol)
Two-way: Next and Prior pointers
Multi-list: Many lists on the same record
type
Processing with Lists
Assume an Oil industry list and the query:
SELECT * FROM STOCKS
WHERE INDUSTRY = 'Oil' AND EARNINGS > 2.50
Fetch the record at the head of the oil industry list.
Until there is no record left (-null-):
If Earnings > 2.50, include on output list.
Fetch the next oil industry record via the Next Record pointer.
End Until.
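The same procedure as a runnable sketch. Only the three Oil records and the fields the query needs are included, and the dictionary layout and function name are assumptions.

```python
stocks = {
    1122: {"company": "Exxon", "earnings": 2.50, "next": 1378},
    1378: {"company": "Texaco", "earnings": 2.80, "next": 1480},
    1480: {"company": "Conoco", "earnings": 2.00, "next": None},
}
OIL_HEAD = 1122


def oil_with_earnings_above(threshold):
    """Walk the Oil list and keep records whose earnings exceed threshold."""
    output = []
    current = OIL_HEAD
    while current is not None:
        record = stocks[current]  # one fetch per list member
        if record["earnings"] > threshold:
            output.append(record["company"])
        current = record["next"]
    return output


print(oil_with_earnings_above(2.50))  # ['Texaco']
```

Every record on the list is fetched even though only one qualifies, since Exxon's earnings of exactly 2.50 fail the strict comparison.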
Problems with lists
Difficult to maintain
Broken pointer chains
Inefficient to search dense lists for one or a
few records
Inverted List: Industry
Values Table Occurrences Table
Aero 1152
Auto 1175, 1323
Apparel 1767
Computer 1231, 1245
Oil 1122, 1378, 1480
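An inverted list like the one above can be built in one pass over the file. The sketch uses the ID and industry columns from the stock table, with "Comp." spelled out as "Computer" to match this slide; the function name is hypothetical.

```python
from collections import defaultdict

stocks = [
    (1122, "Oil"), (1152, "Aero"), (1175, "Auto"), (1231, "Computer"),
    (1245, "Computer"), (1323, "Auto"), (1378, "Oil"), (1480, "Oil"),
    (1767, "Apparel"),
]


def build_inverted_list(rows):
    """Map each attribute value to the sorted IDs of records holding it."""
    inverted = defaultdict(list)
    for record_id, value in rows:
        inverted[value].append(record_id)
    return {value: sorted(ids) for value, ids in inverted.items()}


industry_index = build_inverted_list(stocks)
print(industry_index["Oil"])  # [1122, 1378, 1480]
```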
Earnings and Dividends
Earnings                 Dividends
Values  Occurrences      Values  Occurrences
1.25    1152             0.00    1231
1.50    1767             0.10    1245
1.70    1175             0.20    1175
1.80    1245             0.25    1767
2.00    1231, 1480       0.30    1323
2.10    1323             0.50    1152, 1480
2.50    1122             0.75    1122
2.80    1378             1.00    1378
Select * From Stocks Where Earnings > 2.00 and Dividends < 0.60
Earnings > 2.00: 1323, 1122, 1378
Dividends < 0.60: 1231, 1245, 1175, 1767, 1323, 1152, 1480
Only 1323 appears in both lists, so only one record must be fetched!
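The query can be answered from the inverted lists alone by intersecting the qualifying ID sets. In this sketch the two indexes are rebuilt from the ID, earnings, and dividend columns of the stock table; the helper name is hypothetical.

```python
# (ID, earnings, dividend) taken from the stock table.
rows = [(1122, 2.50, 0.75), (1152, 1.25, 0.50), (1175, 1.70, 0.20),
        (1231, 2.00, 0.00), (1245, 1.80, 0.10), (1323, 2.10, 0.30),
        (1378, 2.80, 1.00), (1480, 2.00, 0.50), (1767, 1.50, 0.25)]

earnings_index = {}
dividend_index = {}
for record_id, earn, div in rows:
    earnings_index.setdefault(earn, []).append(record_id)
    dividend_index.setdefault(div, []).append(record_id)


def ids_where(index, predicate):
    """Collect IDs from every occurrence list whose value passes predicate."""
    return {rid for value, ids in index.items() if predicate(value) for rid in ids}


candidates = (ids_where(earnings_index, lambda v: v > 2.00)
              & ids_where(dividend_index, lambda v: v < 0.60))
print(sorted(candidates))  # [1323]
```

No data record is read until the intersection is known, so the only fetch needed is for GM (1323).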
B-Trees
Hierarchical (tree) structure
Balanced
– all paths from top to bottom the same length
Consist of:
– Index Set: Indexed access for the key field
– Sequence Set: Sequential access for the key field
A B-Tree of Order 3 and Height 2 for Stock Symbol
Height: number of levels in the index set
Order: number of values per level minus 1
Level 0 Index
Index Symbol Key Index Symbol Key Index
11 F 1175 12 TONY 1767 13
Index 11 Index 12 Index 13
Symbol Key Symbol Key Symbol Key
CON 1480 GM 1323 TX 1378
DEC 1245 INTL 1231 XON 1122
LCH 1152
Sequence Set
CON 1480 DEC 1245 F 1175
GM 1323 INTL 1231 LCH 1152
TONY 1767 TX 1378 XON 1122
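A lookup through the two index levels can be sketched as below. The node contents follow the figure (F and TONY in the level 0 node, with Index 11, 12, and 13 beneath it); the Python layout and function name are assumptions, and insertion and node splitting are not shown.

```python
import bisect

# Level 0: separator keys with record IDs, plus child node numbers.
root_keys = [("F", 1175), ("TONY", 1767)]
root_children = [11, 12, 13]

# Level 1 nodes: (symbol, record ID) pairs in symbol order.
nodes = {
    11: [("CON", 1480), ("DEC", 1245)],
    12: [("GM", 1323), ("INTL", 1231), ("LCH", 1152)],
    13: [("TX", 1378), ("XON", 1122)],
}


def lookup(symbol):
    """Return (record ID or None, index nodes read) for a stock symbol."""
    symbols = [s for s, _ in root_keys]
    slot = bisect.bisect_left(symbols, symbol)
    if slot < len(symbols) and symbols[slot] == symbol:
        return root_keys[slot][1], 1  # found in the level 0 node itself
    for s, record_id in nodes[root_children[slot]]:
        if s == symbol:
            return record_id, 2  # level 0 node plus one level 1 node
    return None, 2


print(lookup("INTL"), lookup("F"))  # (1231, 2) (1175, 1)
```

Because the tree is balanced, no lookup reads more than two index nodes, whichever symbol is requested.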
Summary
Three file organizations
– Heaps
– Sorted
Sequential Access Method
Indexed Sequential Access Method
– Direct Access Method
Three data structures for secondary keys
– Linked lists, Inverted lists, B-trees

File organization techniques

  • 1. Physical Design Considerations File Organization Techniques Record Access Methods Data Structures
  • 2. Physical Design Provide good performance – Fast response time – Minimum disk accesses
  • 3. Disk Assembly Cylinder Track Page or Sector Access time = seek time + rotational delay + transfer time
  • 4. Definitions Seek time – Average time to move the read-write head Rotational delay – Average time for the sector to move under the read-write head Transfer – Time to read a sector and transfer the data to memory
  • 5. More Definitions Logical Record – The data about an entity (a row in a table) Physical Record – A sector, page or block on the storage medium Typically several logical records can be stored in one physical record.
  • 6. File Organization Techniques Three techniques – Heap (unordered) – Sorted Sequential (SAM) Indexed Sequential (ISAM) – Hashed or Direct
  • 7. Heap File ID Company Industry Symbl. Price Earns. Dividnd. 1767 Tony Lama Apparel TONY 45.00 1.50 0.25 1152 Lockheed Aero LCH 112.00 1.25 0.50 1175 Ford Auto F 88.00 1.70 0.20 1122 Exxon Oil XON 46.00 2.50 0.75 1231 Intel Comp. INTL 30.00 2.00 0.00 1323 GM Auto GM 158.00 2.10 0.30 1378 Texaco Oil TX 230.00 2.80 1.00 1245 Digital Comp. DEC 120.00 1.80 0.10 Tony Lama was the first record added, Digital was the last.
  • 8. Heap File Characteristics
      Insertion
      – Fast: New records are added at the end of the file
      Retrieval
      – Slow: A sequential search is required
      Update and Delete
      – Slow:
          Sequential search to find the page
          Make the update or mark for deletion
          Rewrite the page
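
The heap file behaviour described on slide 8 can be sketched in a few lines. This is a simplified in-memory model, not a real storage engine: the `HeapFile` class and its method names are hypothetical, and it counts records examined rather than pages read.

```python
class HeapFile:
    """Unordered file: records are kept in arrival order."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        # Fast: append at the end of the file, no search needed.
        self.records.append(record)

    def find(self, record_id):
        # Slow: sequential search from the start of the file.
        # Returns (record, number of records examined).
        examined = 0
        for record in self.records:
            examined += 1
            if record["id"] == record_id and not record.get("deleted", False):
                return record, examined
        return None, examined

    def delete(self, record_id):
        # Find the record sequentially, then mark it for deletion.
        record, _ = self.find(record_id)
        if record is None:
            return False
        record["deleted"] = True
        return True


# IDs in the arrival order shown on slide 7.
heap = HeapFile()
for record_id in [1767, 1152, 1175, 1122, 1231, 1323, 1378, 1245]:
    heap.insert({"id": record_id})

print(heap.find(1245)[1])  # 8: Digital was added last, so every record is examined
```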
  • 9. Sequential (Ordered) File
        ID    Company    Industry  Symbol   Price  Earnings  Dividend
        1122  Exxon      Oil       XON      46.00      2.50      0.75
        1152  Lockheed   Aero      LCH     112.00      1.25      0.50
        1175  Ford       Auto      F        88.00      1.70      0.20
        1231  Intel      Comp.     INTL     30.00      2.00      0.00
        1245  Digital    Comp.     DEC     120.00      1.80      0.10
        1323  GM         Auto      GM      158.00      2.10      0.30
        1378  Texaco     Oil       TX      230.00      2.80      1.00
        1480  Conoco     Oil       CON     150.00      2.00      0.50
        1767  Tony Lama  Apparel   TONY     45.00      1.50      0.25
  • 11. Sequential File Characteristics
      Older media (cards, tapes)
      Records physically ordered by primary key
      Use when direct access to individual records is not required
      Accessing records
      – Sequential search until the record is found
      Binary search can speed up access
      – Must know the file size and how to determine the midpoint
  • 12. Inserting Records in SAM Files
      Insertion
      – Slow:
          Sequential search to find where the record goes
          If there is sufficient space in that page, rewrite the page
          If there is insufficient space, move some records to the next page
          If there is no space there, keep bumping records down until space is found
      – May use an “overflow” file to decrease insertion time
  • 13. Deletions and Updates to SAM
      Deletion
      – Slow:
          Find the record
          Either mark it for deletion or free up the space
          Rewrite
      Updates
      – Slow:
          Find the record
          Make the change
          Rewrite
  • 14. Binary Search to Find GM (1323)
        Probe  ID    Company    Industry  Symbol   Price  Earnings  Dividend
               1122  Exxon      Oil       XON      46.00      2.50      0.75
               1152  Lockheed   Aero      LCH     112.00      1.25      0.50
               1175  Ford       Auto      F        88.00      1.70      0.20
               1231  Intel      Comp.     INTL     30.00      2.00      0.00
        1.     1245  Digital    Comp.     DEC     120.00      1.80      0.10
        3.     1323  GM         Auto      GM      158.00      2.10      0.30
        2.     1378  Texaco     Oil       TX      230.00      2.80      1.00
               1480  Conoco     Oil       CON     150.00      2.00      0.50
               1767  Tony Lama  Apparel   TONY     45.00      1.50      0.25
      Takes 3 accesses, as opposed to 6 for a linear search.
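
The access counts on slide 14 can be checked with a small sketch. It treats each record as one access, which is a simplification (a real file is read a page at a time), and the function names are illustrative only.

```python
# Record IDs from the sorted file on slide 14.
ids = [1122, 1152, 1175, 1231, 1245, 1323, 1378, 1480, 1767]


def linear_search(keys, target):
    """Return the number of records read before the target is found."""
    for accesses, key in enumerate(keys, start=1):
        if key == target:
            return accesses
    return None


def binary_search(keys, target):
    """Return the number of probes needed; keys must be sorted."""
    low, high = 0, len(keys) - 1
    accesses = 0
    while low <= high:
        mid = (low + high) // 2  # requires knowing the file size
        accesses += 1
        if keys[mid] == target:
            return accesses
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return None


print(linear_search(ids, 1323), binary_search(ids, 1323))  # 6 3
```

The probes land on 1245, then 1378, then 1323, matching the order marked on the slide.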
  • 15. Indexed Sequential
      Disk (usually)
      Records physically ordered by primary key
      Index gives the physical location of each record
      Records accessed sequentially or directly via the index
      The index is stored in a file and read into memory when the file is opened
      Indexes must be maintained
  • 16. Indexed Sequential Access
      Given a value for the key
      – Search the index for the record address
      – Issue a read instruction for that address
      – Fast: Possibly just one disk access
  • 17. Indexed Sequential Access: Fast
      Find the record with key 777-13-1212.
      Index:
        Key          Cyl.  Track  Sector
        279-66-7549  3     10     2
        452-75-6301  3     10     3
        789-12-3456  3     10     4
      Data file (in key order):
        222-66-7634, 255-75-5531, 279-66-7549,
        333-88-9876, 382-32-0658, 452-75-6301,
        701-43-5634, 777-13-1212, 789-12-3456
      777-13-1212 < 789-12-3456, so search Cyl. 3, Track 10, Sector 4 sequentially.
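
The lookup on slide 17 can be sketched as an index search followed by a sequential scan of one sector. The slide does not say which records sit in which sector; this sketch assumes three records per sector, with each index entry holding the highest key in its sector. The `lookup` function and the `sectors` layout are hypothetical.

```python
from bisect import bisect_left

# Index entries from slide 17: (highest key in sector, (cylinder, track, sector)).
index = [
    ("279-66-7549", (3, 10, 2)),
    ("452-75-6301", (3, 10, 3)),
    ("789-12-3456", (3, 10, 4)),
]

# Assumed layout: three records per sector, in key order.
sectors = {
    (3, 10, 2): ["222-66-7634", "255-75-5531", "279-66-7549"],
    (3, 10, 3): ["333-88-9876", "382-32-0658", "452-75-6301"],
    (3, 10, 4): ["701-43-5634", "777-13-1212", "789-12-3456"],
}


def lookup(key):
    """Return (address, position within sector) or None if the key is absent."""
    index_keys = [entry_key for entry_key, _ in index]
    # First index entry whose key is >= the search key.
    pos = bisect_left(index_keys, key)
    if pos == len(index):
        return None
    address = index[pos][1]
    # One disk access fetches the sector; it is then searched sequentially.
    for position, record_key in enumerate(sectors[address], start=1):
        if record_key == key:
            return address, position
    return None


print(lookup("777-13-1212"))  # ((3, 10, 4), 2)
```

The keys compare correctly as strings because they all share the same fixed-width digit format.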
  • 18. Inserting into ISAM Files
      Not very efficient
      – Indexes must be updated
      – Must locate where the record should go
      – If there is space, insert the new record and rewrite
      – If there is no space, use an overflow area
      – Periodically merge overflow records into the file
  • 19. Deletion and Updates for ISAM
      Fairly efficient
      – Find the record
      – Make the change or mark for deletion
      – Rewrite
      – Periodically remove records marked for deletion
  • 20. Use ISAM Files When:
      Both sequential and direct access are needed.
      Say we have a retail application like Foley’s. Customer balances are updated daily. Sequential access is usually more efficient for batch updates, but we may need direct access to answer customer questions about balances.
  • 21. Direct or Hashed Access
      A portion of disk space is reserved
      A “hashing” algorithm computes the record address
      [Figure: keys 455-72-3566 and 376-87-3425 pass through the hashing algorithm to an address; an overflow area is also shown]
  • 22. Hashed Access Characteristics
      No indexes to search or maintain
      Very fast direct access
      Inefficient sequential access
      Use when direct access is needed, but sequential access is not.
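
Hashed access as described on slides 21 and 22 can be sketched as follows. The slides do not name a hashing algorithm, so the division-remainder hash used here is an assumption, as are the `HashedFile` class, the bucket count, the bucket capacity, and the single shared overflow area.

```python
class HashedFile:
    """Reserved buckets addressed by a hash of the key, plus an overflow area."""

    def __init__(self, num_buckets=7, bucket_capacity=2):
        # A portion of space is reserved up front.
        self.buckets = [[] for _ in range(num_buckets)]
        self.bucket_capacity = bucket_capacity
        self.overflow = []

    def address(self, key):
        # Assumed hash: digits of the key modulo the number of buckets.
        return int(key.replace("-", "")) % len(self.buckets)

    def insert(self, key, record):
        bucket = self.buckets[self.address(key)]
        if len(bucket) < self.bucket_capacity:
            bucket.append((key, record))
        else:
            # The home bucket is full, so the record goes to the overflow area.
            self.overflow.append((key, record))

    def find(self, key):
        # Direct access: compute the address, no index to search.
        for stored_key, record in self.buckets[self.address(key)]:
            if stored_key == key:
                return record
        for stored_key, record in self.overflow:
            if stored_key == key:
                return record
        return None


hashed = HashedFile()
hashed.insert("455-72-3566", "first record")
hashed.insert("376-87-3425", "second record")
print(hashed.find("376-87-3425"))  # second record
```

Because the hash scatters neighbouring keys across buckets, reading the file in key order would require visiting buckets in no useful sequence, which is the inefficiency slide 22 refers to.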
  • 23. Secondary Indexes
      Provide access via non-key attributes
      Three data structures discussed here:
      – Linked lists: embedded pointers
      – Inverted lists: cross-index tables
      – B-Trees
  • 24. Simple Linked List: Oil Industry
      Head: 1122
        ID    Company    Industry  Symbol   Price  Earnings  Dividend  Next Rec.
        1122  Exxon      Oil       XON      46.00      2.50      0.75  1378
        1152  Lockheed   Aero      LCH     112.00      1.25      0.50
        1175  Ford       Auto      F        88.00      1.70      0.20
        1231  Intel      Comp.     INTL     30.00      2.00      0.00
        1245  Digital    Comp.     DEC     120.00      1.80      0.10
        1323  GM         Auto      GM      158.00      2.10      0.30
        1378  Texaco     Oil       TX      230.00      2.80      1.00  1480
        1480  Conoco     Oil       CON     150.00      2.00      0.50  -null-
        1767  Tony Lama  Apparel   TONY     45.00      1.50      0.25
  • 25. Ring: Oil Industry
      Conoco points to the head of the list.
      Head: 1122
        ID    Company    Industry  Symbol   Price  Earnings  Dividend  Next Rec.
        1122  Exxon      Oil       XON      46.00      2.50      0.75  1378
        1152  Lockheed   Aero      LCH     112.00      1.25      0.50
        1175  Ford       Auto      F        88.00      1.70      0.20
        1231  Intel      Comp.     INTL     30.00      2.00      0.00
        1245  Digital    Comp.     DEC     120.00      1.80      0.10
        1323  GM         Auto      GM      158.00      2.10      0.30
        1378  Texaco     Oil       TX      230.00      2.80      1.00  1480
        1480  Conoco     Oil       CON     150.00      2.00      0.50  1122
        1767  Tony Lama  Apparel   TONY     45.00      1.50      0.25
  • 26. Types of List
      Non-dense: Few records (Oil stocks)
      Dense: Many or all records (Symbol)
      Two-way: Next and Prior pointers
      Multi-list: Many lists on the same record type
  • 27. Processing with Lists
      Assume an Oil industry list and the query:
        SELECT * FROM STOCKS WHERE INDUSTRY = 'Oil' AND EARNINGS > 2.50
      Fetch head of oil industry list.
      Until Next Record Pointer = -null-
        Fetch oil industry record.
        If Earnings > 2.50, include on output list.
      End Until.
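
The list-processing loop on slide 27 can be sketched by following the embedded pointers from slide 24. Only the three oil records are modelled here; the dictionary layout and the `next_oil` field name are hypothetical stand-ins for the Next Rec. column.

```python
# Oil-industry records from slide 24, keyed by ID, with their embedded pointers.
stocks = {
    1122: {"company": "Exxon", "earnings": 2.50, "next_oil": 1378},
    1378: {"company": "Texaco", "earnings": 2.80, "next_oil": 1480},
    1480: {"company": "Conoco", "earnings": 2.00, "next_oil": None},
}
OIL_HEAD = 1122  # head of the oil industry list


def oil_stocks_with_earnings_above(threshold):
    """Walk the oil list and collect IDs whose earnings exceed the threshold."""
    output = []
    current = OIL_HEAD
    while current is not None:
        record = stocks[current]  # one fetch per record on the list
        if record["earnings"] > threshold:
            output.append(current)
        current = record["next_oil"]
    return output


print(oil_stocks_with_earnings_above(2.50))  # [1378]
```

Only the three oil records are fetched, rather than all nine. For the ring on slide 25, where Conoco points back to 1122, the loop would instead stop when the pointer returns to the head.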
  • 28. Problems with Lists
      Difficult to maintain
      Broken pointer chains
      Inefficient to search dense lists for one or a few records
  • 29. Inverted List: Industry
        Values Table  Occurrences Table
        Aero          1152
        Auto          1175, 1323
        Apparel       1767
        Computer      1231, 1245
        Oil           1122, 1378, 1480
  • 30. Earnings and Dividends
        Earnings                  Dividends
        Values  Occurrences       Values  Occurrences
        1.25    1152              0.00    1231
        1.50    1767              0.10    1245
        1.70    1175              0.20    1175
        1.80    1245              0.25    1767
        2.00    1231, 1480        0.30    1323
        2.10    1323              0.50    1152, 1480
        2.50    1122              0.75    1122
        2.80    1378              1.00    1378
  • 31. Select * From Stocks Where Earnings > 2.00 and Dividends < 0.60
      Using the Earnings and Dividends tables from slide 30:
        Earnings > 2.00:   1323, 1122, 1378
        Dividends < 0.60:  1231, 1245, 1175, 1767, 1323, 1152, 1480
      Only 1323 appears in both lists, so only one record must be fetched!
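
The query on slide 31 can be sketched as a set intersection over the two inverted lists. The index entries follow slide 30, with 1245 (earnings 1.80) and 1480 (dividend 0.50) filled in from the stock table on slide 9. The `matching_ids` helper is a hypothetical name.

```python
# Inverted lists: attribute value -> record IDs having that value.
earnings_index = {
    1.25: [1152], 1.50: [1767], 1.70: [1175], 1.80: [1245],
    2.00: [1231, 1480], 2.10: [1323], 2.50: [1122], 2.80: [1378],
}
dividends_index = {
    0.00: [1231], 0.10: [1245], 0.20: [1175], 0.25: [1767],
    0.30: [1323], 0.50: [1152, 1480], 0.75: [1122], 1.00: [1378],
}


def matching_ids(index, predicate):
    """Collect the IDs of every record whose indexed value satisfies the predicate."""
    return {
        record_id
        for value, record_ids in index.items()
        if predicate(value)
        for record_id in record_ids
    }


high_earnings = matching_ids(earnings_index, lambda value: value > 2.00)
low_dividends = matching_ids(dividends_index, lambda value: value < 0.60)

# Both conditions are resolved in the indexes before any data record is read.
to_fetch = high_earnings & low_dividends
print(sorted(to_fetch))  # [1323]
```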
  • 32. B-Trees
      Hierarchical (tree) structure
      Balanced: all paths from top to bottom are the same length
      Consist of:
      – Index Set: Indexed access for the key field
      – Sequence Set: Sequential access for the key field
  • 33. A B-Tree of Order 3 and Height 2 for Stock Symbol
      [Figure callouts: “Number of levels in the index set”; “Number of values per level minus 1”]
      Level 0 index:
        Index 11 | F 1175 | Index 12 | TONY 1767 | Index 13
      Index 11: CON 1480, DEC 1245
      Index 12: GM 1323, INTL 1231, LCH 1152
      Index 13: TX 1378, XON 1122
      Sequence Set:
        CON 1480, DEC 1245, F 1175, GM 1323, INTL 1231, LCH 1152, TONY 1767, TX 1378, XON 1122
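
A search through the index set on slide 33 can be sketched as below. This reflects one reading of the flattened figure: the level 0 node holds F and TONY as separators between Index 11, Index 12 and Index 13, and a symbol stored in the level 0 node is found without descending. The `root`, `nodes` and `find` names are hypothetical.

```python
from bisect import bisect_left

# Level 0 node: separator entries plus pointers to the three lower index nodes.
root = {"keys": [("F", 1175), ("TONY", 1767)], "children": [11, 12, 13]}

# Lower index nodes (Index 11, 12, 13 on the slide).
nodes = {
    11: [("CON", 1480), ("DEC", 1245)],
    12: [("GM", 1323), ("INTL", 1231), ("LCH", 1152)],
    13: [("TX", 1378), ("XON", 1122)],
}


def find(symbol):
    """Return (record ID, index nodes visited); the ID is None if not found."""
    root_symbols = [entry_symbol for entry_symbol, _ in root["keys"]]
    pos = bisect_left(root_symbols, symbol)
    if pos < len(root_symbols) and root_symbols[pos] == symbol:
        return root["keys"][pos][1], 1  # found in the level 0 node
    # Otherwise follow the pointer between the neighbouring separators.
    for entry_symbol, record_id in nodes[root["children"][pos]]:
        if entry_symbol == symbol:
            return record_id, 2
    return None, 2


print(find("INTL"))  # (1231, 2)
```

Every successful search visits at most two index nodes, which is what the balanced structure guarantees. Sequential access in symbol order would read the sequence set instead of walking the index.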
  • 34. Summary
      Three file organizations
      – Heaps
      – Sorted
          Sequential Access Method
          Indexed Sequential Access Method
      – Direct Access Method
      Three data structures for secondary keys
      – Linked lists, Inverted lists, B-trees