SlideShare a Scribd company logo
1 of 34
Physical Design Considerations
File Organization Techniques
Record Access Methods
Data Structures
Physical Design
Provide good performance
– Fast response time
– Minimum disk accesses
Disk Assembly
Cylinder
Track
Page or
Sector
Access time = seek time + rotational delay + transfer time
Definitions
Seek time
– Average time to move the read-write head
Rotational delay
– Average time for the sector to move under the
read-write head
Transfer
– Time to read a sector and transfer the data to
memory
More Definitions
Logical Record
– The data about an entity (a row in a table)
Physical Record
– A sector, page or block on the storage medium
Typically several logical records can be
stored in one physical record.
File Organization Techniques
Three techniques
– Heap (unordered)
– Sorted
Sequential (SAM)
Indexed Sequential (ISAM)
– Hashed or Direct
Heap File
ID Company Industry Symbl. Price Earns. Dividnd.
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1122 Exxon Oil XON 46.00 2.50 0.75
1231 Intel Comp. INTL 30.00 2.00 0.00
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00
1245 Digital Comp. DEC 120.00 1.80 0.10
Tony Lama was the first record added,
Digital was the last.
Heap File Characteristics
Insertion
– Fast: New records added at the end of the file
Retrieval
– Slow: A sequential search is required
Update - Delete
– Slow:
Sequential search to find the page
Make the update or mark for deletion
Re-write the page
Sequential (Ordered) File
ID Company Industry Symbl. Price Earns. Dividnd.
1122 Exxon Oil XON 46.00 2.50 0.75
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00
1480 Conoco Oil CON 150.00 2.00 0.50
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Sequential Access
1122...other data
1152 ...
1175...
1231...
Sequential File Characteristics
Older media (cards, tapes)
Records physically ordered by primary key
Use when direct access to individual records
is not required
Accessing records
– Sequential search until record is found
Binary search can speed up access
– Must know file size and how to determine mid-
point,
Inserting Records in SAM files
Insertion
– Slow:
Sequential search to find where the record goes
If sufficient space in that page, then rewrite
If insufficient space, move some records to next
page
If no space there, keep bumping down until space is
found
– May use an “overflow” file to decrease time
Deletions and Updates to SAM
Deletion
– Slow:
Find the record
Either mark for deletion or free up the space
Rewrite
Updates
– Slow:
Find the record
Make the change
Rewrite
Binary Search to Find GM
(1323)
ID Company Industry Symbl. Price Earns. Dividnd.
1122 Exxon Oil XON 46.00 2.50 0.75
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1. 1245 Digital Comp. DEC 120.00 1.80 0.10
3. 1323 GM Auto GM 158.00 2.10 0.30
2. 1378 Texaco Oil TX 230.00 2.80 1.00
1480 Conoco Oil CON 150.00 2.00 0.50
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Takes 3 accesses as opposed to 6 for linear search.
Indexed Sequential
Disk (usually)
Records physically ordered by primary key
Index gives physical location of each record
Records accessed sequentially or directly
via the index
The index is stored in a file and read into
memory when the file is opened.
Indexes must be maintained
Indexed Sequential Access
Given a value for the key
– search the index for the record address
– issue a read instruction for that address
– Fast: Possibly just one disk access
Indexed Sequential Access: Fast
Key Cyl. Trck Sect.
279-66-7549 3 10 2
452-75-6301 3 10 3
789-12-3456 3 10 4
777-13-1212 < 789-12-3456
Search Cyl. 3, Trck 10, Sect. 4
sequentially.
222-66-7634
255-75-5531
279-66-7549
333-88-9876
382-32-0658
452-75-6301
701-43-5634
777-13-1212
789-12-3456
Find record with key 777-13-1212
Inserting into ISAM files
Not very efficient
– Indexes must be updated
– Must locate where the record should go
– If there is space, insert the new record and
rewrite
– If no space, use an overflow area
– Periodically merge overflow records into file
Deletion and Updates for ISAM
Fairly efficient
– Find the record
– Make the change or mark for deletion
– Rewrite
– Periodically remove records marked for
deletion
Use ISAM files when:
Both sequential and direct access is needed.
Say we have a retail application like Foley’s.
Customer balances are updated daily.
Usually sequential access is more efficient for
batch updates.
But we may need direct access to answer
customer questions about balances.
Direct or Hashed Access
A portion of disk space is reserved
A “hashing” algorithm computes record
address
Hashing
Algorithm
455-72-3566
Address
Overflow
376-87-3425
Address
Hashed Access Characteristics
No indexes to search or maintain
Very fast direct access
Inefficient sequential access
Use when direct access is needed, but
sequential access is not.
Secondary Indexes
Provide access via non-key attributes
Three data structures discussed here:
– Linked lists: embedded pointers
– Inverted lists: cross-index tables
– B-Trees
Simple Linked List: Oil Industry
Head: 1122
ID Company Industry Symbl. Price Earns. Dividnd. Next Rec.
1122 Exxon Oil XON 46.00 2.50 0.75 1378
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00 1480
1480 Conoco Oil CON 150.00 2.00 0.50 -null-
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Ring: Oil Industry
Conoco points to the head of the list.
Head: 1122
ID Company Industry Symbl. Price Earns. Dividnd. Next Rec.
1122 Exxon Oil XON 46.00 2.50 0.75 1378
1152 Lockheed Aero LCH 112.00 1.25 0.50
1175 Ford Auto F 88.00 1.70 0.20
1231 Intel Comp. INTL 30.00 2.00 0.00
1245 Digital Comp. DEC 120.00 1.80 0.10
1323 GM Auto GM 158.00 2.10 0.30
1378 Texaco Oil TX 230.00 2.80 1.00 1480
1480 Conoco Oil CON 150.00 2.00 0.50 1122
1767 Tony Lama Apparel TONY 45.00 1.50 0.25
Types of List
Non-dense: Few records (Oil stocks)
Dense: Many or all records (Symbol)
Two-way: Next and Prior pointers
Multi-list: Many lists on the same record
type
Processing with Lists
Assume an Oil industry list and the query:
SELECT * FROM STOCKS
WHERE INDUSTRY = “Oil” AND EARNINGS > 2.50
Fetch head of oil industry list.
Until Next Record Pointer = -null-
Fetch oil industry record.
If Earnings > 2.50, include on output list.
End Until.
Problems with lists
Difficult to maintain
Broken pointer chains
Inefficient to search dense lists for one or a
few records
Inverted List: Industry
Values Table Occurrences Table
Aero 1152
Auto 1175, 1323
Apparel 1767
Computer 1231, 1245
Oil 1122, 1378, 1480
Earnings and Dividends
Earnings Dividends
Values Occurrences Values Occurrences
1.25 1152 0.00 1231
1.50 1767 0.10 1245
1.70 1175 0.20 1175
2.00 1231, 1480 0.25 1767
2.10 1323 0.30 1323
2.50 1122 0.50 1152
2.80 1378 0.75 1122
1.00 1378
Select * From Stocks Where Earnings > 2.00
and Dividends < 0.60
Earnings Dividends
Values Occurrences Values Occurrences
1.25 1152 0.00 1231
1.50 1767 0.10 1245
1.70 1175 0.20 1175
2.00 1231, 1480 0.25 1767
2.10 1323 0.30 1323
2.50 1122 0.50 1152
2.80 1378 0.75 1122
1.00 1378
Only one record must be fetched !
B-Trees
Hierarchical (tree) structure
Balanced
– all paths from top to bottom the same length
Consist of:
– Index Set: Indexed access for the key field
– Sequence Set: Sequential access for the key field
A B-Tree of Order 3 and Height 2 for Stock Symbol
Number of levels in the index set
Number of values per level minus 1
Level O Index
Index Symbol Key Index Symbol Key Index Symbol Key
11 F 1175 12 TONY 1767 13
Index 11 Index 12 Index 13
Symbol Key Symbol Key Symbol Key
CON 1480 GM 1323 TX 1378
DEC 1245 INTL 1231 XON 1122
LCH 1152
Sequence Set
CON 1245 DEC 1420 F 1175
GM 1323 INTL 1231 LCH 1152
TONY 1767 TX 1378 XON 1122
Summary
Three file organizations
– Heaps
– Sorted
Sequential Access Method
Indexed Sequential Access Method
– Direct Access Method
Three data structures for secondary keys
– Linked lists, Inverted lists, B-trees

More Related Content

Viewers also liked

File organization and processing
File organization and processingFile organization and processing
File organization and processingburhan123456
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSAbhishek Dutta
 
File Management Presentation
File Management PresentationFile Management Presentation
File Management PresentationSgtMasterGunz
 
File management ppt
File management pptFile management ppt
File management pptmarotti
 
Operating Systems - File Management
Operating Systems -  File ManagementOperating Systems -  File Management
Operating Systems - File ManagementDamian T. Gordon
 
file handling, dynamic memory allocation
file handling, dynamic memory allocationfile handling, dynamic memory allocation
file handling, dynamic memory allocationindra Kishor
 
Different Sorting tecniques in Data Structure
Different Sorting tecniques in Data StructureDifferent Sorting tecniques in Data Structure
Different Sorting tecniques in Data StructureTushar Gonawala
 
Tele3113 tut2
Tele3113 tut2Tele3113 tut2
Tele3113 tut2Vin Voro
 
Ie Storage, Multimedia And File Organization
Ie   Storage, Multimedia And File OrganizationIe   Storage, Multimedia And File Organization
Ie Storage, Multimedia And File OrganizationMISY
 
Disk structure & File Handling
Disk structure & File HandlingDisk structure & File Handling
Disk structure & File HandlingMomina Idrees
 
Gps file structure explained
Gps file structure explainedGps file structure explained
Gps file structure explainedJaap Oosterhoff
 
File Structure Concepts
File Structure ConceptsFile Structure Concepts
File Structure ConceptsDileep Kodira
 
Discussion : File structure of Meteor Apps
Discussion : File structure of Meteor AppsDiscussion : File structure of Meteor Apps
Discussion : File structure of Meteor AppsSangwon Lee
 
04.01 file organization
04.01 file organization04.01 file organization
04.01 file organizationBishal Ghimire
 
register file structure of PIC controller
register file structure of PIC controllerregister file structure of PIC controller
register file structure of PIC controllerNirbhay Singh
 
Pf cs102 programming-8 [file handling] (1)
Pf cs102 programming-8 [file handling] (1)Pf cs102 programming-8 [file handling] (1)
Pf cs102 programming-8 [file handling] (1)Abdullah khawar
 

Viewers also liked (20)

File Organization & processing Mid term summer 2014 - modelanswer
File Organization & processing Mid term summer 2014 - modelanswerFile Organization & processing Mid term summer 2014 - modelanswer
File Organization & processing Mid term summer 2014 - modelanswer
 
File organization and processing
File organization and processingFile organization and processing
File organization and processing
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
 
File Management Presentation
File Management PresentationFile Management Presentation
File Management Presentation
 
File management ppt
File management pptFile management ppt
File management ppt
 
File Management
File ManagementFile Management
File Management
 
Operating Systems - File Management
Operating Systems -  File ManagementOperating Systems -  File Management
Operating Systems - File Management
 
file handling, dynamic memory allocation
file handling, dynamic memory allocationfile handling, dynamic memory allocation
file handling, dynamic memory allocation
 
Different Sorting tecniques in Data Structure
Different Sorting tecniques in Data StructureDifferent Sorting tecniques in Data Structure
Different Sorting tecniques in Data Structure
 
Tele3113 tut2
Tele3113 tut2Tele3113 tut2
Tele3113 tut2
 
CS215 - Lec 2 file organization
CS215 - Lec 2   file organizationCS215 - Lec 2   file organization
CS215 - Lec 2 file organization
 
Ie Storage, Multimedia And File Organization
Ie   Storage, Multimedia And File OrganizationIe   Storage, Multimedia And File Organization
Ie Storage, Multimedia And File Organization
 
Disk structure & File Handling
Disk structure & File HandlingDisk structure & File Handling
Disk structure & File Handling
 
Gps file structure explained
Gps file structure explainedGps file structure explained
Gps file structure explained
 
CS215 - Lec 9 indexing and reclaiming space in files
CS215 - Lec 9  indexing and reclaiming space in filesCS215 - Lec 9  indexing and reclaiming space in files
CS215 - Lec 9 indexing and reclaiming space in files
 
File Structure Concepts
File Structure ConceptsFile Structure Concepts
File Structure Concepts
 
Discussion : File structure of Meteor Apps
Discussion : File structure of Meteor AppsDiscussion : File structure of Meteor Apps
Discussion : File structure of Meteor Apps
 
04.01 file organization
04.01 file organization04.01 file organization
04.01 file organization
 
register file structure of PIC controller
register file structure of PIC controllerregister file structure of PIC controller
register file structure of PIC controller
 
Pf cs102 programming-8 [file handling] (1)
Pf cs102 programming-8 [file handling] (1)Pf cs102 programming-8 [file handling] (1)
Pf cs102 programming-8 [file handling] (1)
 

Similar to File organization techniques

Improve speed and performance of informix 11.xx part 2
Improve speed and performance of informix 11.xx   part 2Improve speed and performance of informix 11.xx   part 2
Improve speed and performance of informix 11.xx part 2am_prasanna
 
My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)Gustavo Rene Antunez
 
Ds8000 Practical Performance Analysis P04 20060718
Ds8000 Practical Performance Analysis P04 20060718Ds8000 Practical Performance Analysis P04 20060718
Ds8000 Practical Performance Analysis P04 20060718brettallison
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Introduction To Unix
Introduction To UnixIntroduction To Unix
Introduction To UnixCTIN
 
Cassandra lesson learned - extended
Cassandra   lesson learned  - extendedCassandra   lesson learned  - extended
Cassandra lesson learned - extendedAndrzej Ludwikowski
 
Sysdig Tokyo Meetup 2018 02-27
Sysdig Tokyo Meetup 2018 02-27Sysdig Tokyo Meetup 2018 02-27
Sysdig Tokyo Meetup 2018 02-27Michael Ducy
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
16aug06.ppt
16aug06.ppt16aug06.ppt
16aug06.pptzagreb2
 
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』Insight Technology, Inc.
 
OOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AASOOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AASKyle Hailey
 
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A...
Avoiding Chaos:  Methodology for Managing Performance in a Shared Storage A...Avoiding Chaos:  Methodology for Managing Performance in a Shared Storage A...
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A...brettallison
 
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Gareth Chapman
 
Sheik Mohamed Shadik - BSc - Project Details
Sheik Mohamed Shadik - BSc - Project DetailsSheik Mohamed Shadik - BSc - Project Details
Sheik Mohamed Shadik - BSc - Project Detailsshadikbsc
 
UKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction LocksUKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction LocksKyle Hailey
 
I/O System and Csae Study
I/O System and Csae StudyI/O System and Csae Study
I/O System and Csae Studypalpandi it
 

Similar to File organization techniques (20)

Improve speed and performance of informix 11.xx part 2
Improve speed and performance of informix 11.xx   part 2Improve speed and performance of informix 11.xx   part 2
Improve speed and performance of informix 11.xx part 2
 
My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)
 
Ds8000 Practical Performance Analysis P04 20060718
Ds8000 Practical Performance Analysis P04 20060718Ds8000 Practical Performance Analysis P04 20060718
Ds8000 Practical Performance Analysis P04 20060718
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Introduction To Unix
Introduction To UnixIntroduction To Unix
Introduction To Unix
 
Cassandra lesson learned - extended
Cassandra   lesson learned  - extendedCassandra   lesson learned  - extended
Cassandra lesson learned - extended
 
Sysdig Tokyo Meetup 2018 02-27
Sysdig Tokyo Meetup 2018 02-27Sysdig Tokyo Meetup 2018 02-27
Sysdig Tokyo Meetup 2018 02-27
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Cassandra - lesson learned
Cassandra  - lesson learnedCassandra  - lesson learned
Cassandra - lesson learned
 
Using AWR for IO Subsystem Analysis
Using AWR for IO Subsystem AnalysisUsing AWR for IO Subsystem Analysis
Using AWR for IO Subsystem Analysis
 
ISR UNIT2.ppt
ISR UNIT2.pptISR UNIT2.ppt
ISR UNIT2.ppt
 
16aug06.ppt
16aug06.ppt16aug06.ppt
16aug06.ppt
 
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
[db tech showcase Tokyo 2018] #dbts2018 #B17 『オラクル パフォーマンス チューニング - 神話、伝説と解決策』
 
OOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AASOOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AAS
 
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A...
Avoiding Chaos:  Methodology for Managing Performance in a Shared Storage A...Avoiding Chaos:  Methodology for Managing Performance in a Shared Storage A...
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A...
 
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
 
Sheik Mohamed Shadik - BSc - Project Details
Sheik Mohamed Shadik - BSc - Project DetailsSheik Mohamed Shadik - BSc - Project Details
Sheik Mohamed Shadik - BSc - Project Details
 
UKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction LocksUKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction Locks
 
I/O System and Csae Study
I/O System and Csae StudyI/O System and Csae Study
I/O System and Csae Study
 

Recently uploaded

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Recently uploaded (20)

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

File organization techniques

  • 1. Physical Design Considerations File Organization Techniques Record Access Methods Data Structures
  • 2. Physical Design Provide good performance – Fast response time – Minimum disk accesses
  • 3. Disk Assembly Cylinder Track Page or Sector Access time = seek time + rotational delay + transfer time
  • 4. Definitions Seek time – Average time to move the read-write head Rotational delay – Average time for the sector to move under the read-write head Transfer – Time to read a sector and transfer the data to memory
  • 5. More Definitions Logical Record – The data about an entity (a row in a table) Physical Record – A sector, page or block on the storage medium Typically several logical records can be stored in one physical record.
  • 6. File Organization Techniques Three techniques – Heap (unordered) – Sorted Sequential (SAM) Indexed Sequential (ISAM) – Hashed or Direct
  • 7. Heap File ID Company Industry Symbl. Price Earns. Dividnd. 1767 Tony Lama Apparel TONY 45.00 1.50 0.25 1152 Lockheed Aero LCH 112.00 1.25 0.50 1175 Ford Auto F 88.00 1.70 0.20 1122 Exxon Oil XON 46.00 2.50 0.75 1231 Intel Comp. INTL 30.00 2.00 0.00 1323 GM Auto GM 158.00 2.10 0.30 1378 Texaco Oil TX 230.00 2.80 1.00 1245 Digital Comp. DEC 120.00 1.80 0.10 Tony Lama was the first record added, Digital was the last.
  • 8. Heap File Characteristics Insertion – Fast: New records added at the end of the file Retrieval – Slow: A sequential search is required Update - Delete – Slow: Sequential search to find the page Make the update or mark for deletion Re-write the page
  • 9. Sequential (Ordered) File ID Company Industry Symbl. Price Earns. Dividnd. 1122 Exxon Oil XON 46.00 2.50 0.75 1152 Lockheed Aero LCH 112.00 1.25 0.50 1175 Ford Auto F 88.00 1.70 0.20 1231 Intel Comp. INTL 30.00 2.00 0.00 1245 Digital Comp. DEC 120.00 1.80 0.10 1323 GM Auto GM 158.00 2.10 0.30 1378 Texaco Oil TX 230.00 2.80 1.00 1480 Conoco Oil CON 150.00 2.00 0.50 1767 Tony Lama Apparel TONY 45.00 1.50 0.25
  • 11. Sequential File Characteristics Older media (cards, tapes) Records physically ordered by primary key Use when direct access to individual records is not required Accessing records – Sequential search until record is found Binary search can speed up access – Must know file size and how to determine mid- point,
  • 12. Inserting Records in SAM files Insertion – Slow: Sequential search to find where the record goes If sufficient space in that page, then rewrite If insufficient space, move some records to next page If no space there, keep bumping down until space is found – May use an “overflow” file to decrease time
  • 13. Deletions and Updates to SAM Deletion – Slow: Find the record Either mark for deletion or free up the space Rewrite Updates – Slow: Find the record Make the change Rewrite
  • 14. Binary Search to Find GM (1323) ID Company Industry Symbl. Price Earns. Dividnd. 1122 Exxon Oil XON 46.00 2.50 0.75 1152 Lockheed Aero LCH 112.00 1.25 0.50 1175 Ford Auto F 88.00 1.70 0.20 1231 Intel Comp. INTL 30.00 2.00 0.00 1. 1245 Digital Comp. DEC 120.00 1.80 0.10 3. 1323 GM Auto GM 158.00 2.10 0.30 2. 1378 Texaco Oil TX 230.00 2.80 1.00 1480 Conoco Oil CON 150.00 2.00 0.50 1767 Tony Lama Apparel TONY 45.00 1.50 0.25 Takes 3 accesses as opposed to 6 for linear search.
  • 15. Indexed Sequential Disk (usually) Records physically ordered by primary key Index gives physical location of each record Records accessed sequentially or directly via the index The index is stored in a file and read into memory when the file is opened. Indexes must be maintained
  • 16. Indexed Sequential Access Given a value for the key – search the index for the record address – issue a read instruction for that address – Fast: Possibly just one disk access
  • 17. Indexed Sequential Access: Fast Key Cyl. Trck Sect. 279-66-7549 3 10 2 452-75-6301 3 10 3 789-12-3456 3 10 4 777-13-1212 < 789-12-3456 Search Cyl. 3, Trck 10, Sect. 4 sequentially. 222-66-7634 255-75-5531 279-66-7549 333-88-9876 382-32-0658 452-75-6301 701-43-5634 777-13-1212 789-12-3456 Find record with key 777-13-1212
  • 18. Inserting into ISAM files Not very efficient – Indexes must be updated – Must locate where the record should go – If there is space, insert the new record and rewrite – If no space, use an overflow area – Periodically merge overflow records into file
  • 19. Deletion and Updates for ISAM Fairly efficient – Find the record – Make the change or mark for deletion – Rewrite – Periodically remove records marked for deletion
  • 20. Use ISAM files when: Both sequential and direct access is needed. Say we have a retail application like Foley’s. Customer balances are updated daily. Usually sequential access is more efficient for batch updates. But we may need direct access to answer customer questions about balances.
  • 21. Direct or Hashed Access A portion of disk space is reserved A “hashing” algorithm computes record address Hashing Algorithm 455-72-3566 Address Overflow 376-87-3425 Address
  • 22. Hashed Access Characteristics No indexes to search or maintain Very fast direct access Inefficient sequential access Use when direct access is needed, but sequential access is not.
  • 23. Secondary Indexes Provide access via non-key attributes Three data structures discussed here: – Linked lists: embedded pointers – Inverted lists: cross-index tables – B-Trees
  • 24. Simple Linked List: Oil Industry Head: 1122 ID Company Industry Symbl. Price Earns. Dividnd. Next Rec. 1122 Exxon Oil XON 46.00 2.50 0.75 1378 1152 Lockheed Aero LCH 112.00 1.25 0.50 1175 Ford Auto F 88.00 1.70 0.20 1231 Intel Comp. INTL 30.00 2.00 0.00 1245 Digital Comp. DEC 120.00 1.80 0.10 1323 GM Auto GM 158.00 2.10 0.30 1378 Texaco Oil TX 230.00 2.80 1.00 1480 1480 Conoco Oil CON 150.00 2.00 0.50 -null- 1767 Tony Lama Apparel TONY 45.00 1.50 0.25
  • 25. Ring: Oil Industry Conoco points to the head of the list. Head: 1122 ID Company Industry Symbl. Price Earns. Dividnd. Next Rec. 1122 Exxon Oil XON 46.00 2.50 0.75 1378 1152 Lockheed Aero LCH 112.00 1.25 0.50 1175 Ford Auto F 88.00 1.70 0.20 1231 Intel Comp. INTL 30.00 2.00 0.00 1245 Digital Comp. DEC 120.00 1.80 0.10 1323 GM Auto GM 158.00 2.10 0.30 1378 Texaco Oil TX 230.00 2.80 1.00 1480 1480 Conoco Oil CON 150.00 2.00 0.50 1122 1767 Tony Lama Apparel TONY 45.00 1.50 0.25
  • 26. Types of List Non-dense: Few records (Oil stocks) Dense: Many or all records (Symbol) Two-way: Next and Prior pointers Multi-list: Many lists on the same record type
  • 27. Processing with Lists Assume an Oil industry list and the query: SELECT * FROM STOCKS WHERE INDUSTRY = “Oil” AND EARNINGS > 2.50 Fetch head of oil industry list. Until Next Record Pointer = -null- Fetch oil industry record. If Earnings > 2.50, include on output list. End Until.
  • 28. Problems with lists Difficult to maintain Broken pointer chains Inefficient to search dense lists for one or a few records
  • 29. Inverted List: Industry Values Table Occurrences Table Aero 1152 Auto 1175, 1323 Apparel 1767 Computer 1231, 1245 Oil 1122, 1378, 1480
  • 30. Earnings and Dividends Earnings Dividends Values Occurrences Values Occurrences 1.25 1152 0.00 1231 1.50 1767 0.10 1245 1.70 1175 0.20 1175 2.00 1231, 1480 0.25 1767 2.10 1323 0.30 1323 2.50 1122 0.50 1152 2.80 1378 0.75 1122 1.00 1378
  • 31. Select * From Stocks Where Earnings > 2.00 and Dividends < 0.60 Earnings Dividends Values Occurrences Values Occurrences 1.25 1152 0.00 1231 1.50 1767 0.10 1245 1.70 1175 0.20 1175 2.00 1231, 1480 0.25 1767 2.10 1323 0.30 1323 2.50 1122 0.50 1152 2.80 1378 0.75 1122 1.00 1378 Only one record must be fetched !
  • 32. B-Trees Hierarchical (tree) structure Balanced – all paths from top to bottom the same length Consist of: – Index Set: Indexed access for the key field – Sequence Set: Sequential access for the key field
  • 33. A B-Tree of Order 3 and Height 2 for Stock Symbol Number of levels in the index set Number of values per level minus 1 Level O Index Index Symbol Key Index Symbol Key Index Symbol Key 11 F 1175 12 TONY 1767 13 Index 11 Index 12 Index 13 Symbol Key Symbol Key Symbol Key CON 1480 GM 1323 TX 1378 DEC 1245 INTL 1231 XON 1122 LCH 1152 Sequence Set CON 1245 DEC 1420 F 1175 GM 1323 INTL 1231 LCH 1152 TONY 1767 TX 1378 XON 1122
  • 34. Summary Three file organizations – Heaps – Sorted Sequential Access Method Indexed Sequential Access Method – Direct Access Method Three data structures for secondary keys – Linked lists, Inverted lists, B-trees