Your SlideShare is downloading. ×
File organization techniques
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

File organization techniques

4,636
views

Published on

Published in: Technology, Business

0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,636
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
3
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Physical Design ConsiderationsFile Organization TechniquesRecord Access MethodsData Structures
  • 2. Physical DesignProvide good performance– Fast response time– Minimum disk accesses
  • 3. Disk AssemblyCylinderTrackPage orSectorAccess time = seek time + rotational delay + transfer time
  • 4. DefinitionsSeek time– Average time to move the read-write headRotational delay– Average time for the sector to move under theread-write headTransfer– Time to read a sector and transfer the data tomemory
  • 5. More DefinitionsLogical Record– The data about an entity (a row in a table)Physical Record– A sector, page or block on the storage mediumTypically several logical records can bestored in one physical record.
  • 6. File Organization TechniquesThree techniques– Heap (unordered)– SortedSequential (SAM)Indexed Sequential (ISAM)– Hashed or Direct
  • 7. Heap FileID Company Industry Symbl. Price Earns. Dividnd.1767 Tony Lama Apparel TONY 45.00 1.50 0.251152 Lockheed Aero LCH 112.00 1.25 0.501175 Ford Auto F 88.00 1.70 0.201122 Exxon Oil XON 46.00 2.50 0.751231 Intel Comp. INTL 30.00 2.00 0.001323 GM Auto GM 158.00 2.10 0.301378 Texaco Oil TX 230.00 2.80 1.001245 Digital Comp. DEC 120.00 1.80 0.10Tony Lama was the first record added,Digital was the last.
  • 8. Heap File CharacteristicsInsertion– Fast: New records added at the end of the fileRetrieval– Slow: A sequential search is requiredUpdate - Delete– Slow:Sequential search to find the pageMake the update or mark for deletionRe-write the page
  • 9. Sequential (Ordered) FileID Company Industry Symbl. Price Earns. Dividnd.1122 Exxon Oil XON 46.00 2.50 0.751152 Lockheed Aero LCH 112.00 1.25 0.501175 Ford Auto F 88.00 1.70 0.201231 Intel Comp. INTL 30.00 2.00 0.001245 Digital Comp. DEC 120.00 1.80 0.101323 GM Auto GM 158.00 2.10 0.301378 Texaco Oil TX 230.00 2.80 1.001480 Conoco Oil CON 150.00 2.00 0.501767 Tony Lama Apparel TONY 45.00 1.50 0.25
  • 10. Sequential Access1122...other data1152 ...1175...1231...
  • 11. Sequential File CharacteristicsOlder media (cards, tapes)Records physically ordered by primary keyUse when direct access to individual recordsis not requiredAccessing records– Sequential search until record is foundBinary search can speed up access– Must know file size and how to determine mid-point,
  • 12. Inserting Records in SAM filesInsertion– Slow:Sequential search to find where the record goesIf sufficient space in that page, then rewriteIf insufficient space, move some records to nextpageIf no space there, keep bumping down until space isfound– May use an “overflow” file to decrease time
  • 13. Deletions and Updates to SAMDeletion– Slow:Find the recordEither mark for deletion or free up the spaceRewriteUpdates– Slow:Find the recordMake the changeRewrite
  • 14. Binary Search to Find GM(1323)ID Company Industry Symbl. Price Earns. Dividnd.1122 Exxon Oil XON 46.00 2.50 0.751152 Lockheed Aero LCH 112.00 1.25 0.501175 Ford Auto F 88.00 1.70 0.201231 Intel Comp. INTL 30.00 2.00 0.001. 1245 Digital Comp. DEC 120.00 1.80 0.103. 1323 GM Auto GM 158.00 2.10 0.302. 1378 Texaco Oil TX 230.00 2.80 1.001480 Conoco Oil CON 150.00 2.00 0.501767 Tony Lama Apparel TONY 45.00 1.50 0.25Takes 3 accesses as opposed to 6 for linear search.
  • 15. Indexed SequentialDisk (usually)Records physically ordered by primary keyIndex gives physical location of each recordRecords accessed sequentially or directlyvia the indexThe index is stored in a file and read intomemory when the file is opened.Indexes must be maintained
  • 16. Indexed Sequential AccessGiven a value for the key– search the index for the record address– issue a read instruction for that address– Fast: Possibly just one disk access
  • 17. Indexed Sequential Access: FastKey Cyl. Trck Sect.279-66-7549 3 10 2452-75-6301 3 10 3789-12-3456 3 10 4777-13-1212 < 789-12-3456Search Cyl. 3, Trck 10, Sect. 4sequentially.222-66-7634255-75-5531279-66-7549333-88-9876382-32-0658452-75-6301701-43-5634777-13-1212789-12-3456Find record with key 777-13-1212
  • 18. Inserting into ISAM filesNot very efficient– Indexes must be updated– Must locate where the record should go– If there is space, insert the new record andrewrite– If no space, use an overflow area– Periodically merge overflow records into file
  • 19. Deletion and Updates for ISAMFairly efficient– Find the record– Make the change or mark for deletion– Rewrite– Periodically remove records marked fordeletion
  • 20. Use ISAM files when:Both sequential and direct access is needed.Say we have a retail application like Foley’s.Customer balances are updated daily.Usually sequential access is more efficient forbatch updates.But we may need direct access to answercustomer questions about balances.
  • 21. Direct or Hashed AccessA portion of disk space is reservedA “hashing” algorithm computes recordaddressHashingAlgorithm455-72-3566AddressOverflow376-87-3425Address
  • 22. Hashed Access CharacteristicsNo indexes to search or maintainVery fast direct accessInefficient sequential accessUse when direct access is needed, butsequential access is not.
  • 23. Secondary IndexesProvide access via non-key attributesThree data structures discussed here:– Linked lists: embedded pointers– Inverted lists: cross-index tables– B-Trees
  • 24. Simple Linked List: Oil IndustryHead: 1122ID Company Industry Symbl. Price Earns. Dividnd. Next Rec.1122 Exxon Oil XON 46.00 2.50 0.75 13781152 Lockheed Aero LCH 112.00 1.25 0.501175 Ford Auto F 88.00 1.70 0.201231 Intel Comp. INTL 30.00 2.00 0.001245 Digital Comp. DEC 120.00 1.80 0.101323 GM Auto GM 158.00 2.10 0.301378 Texaco Oil TX 230.00 2.80 1.00 14801480 Conoco Oil CON 150.00 2.00 0.50 -null-1767 Tony Lama Apparel TONY 45.00 1.50 0.25
  • 25. Ring: Oil IndustryConoco points to the head of the list.Head: 1122ID Company Industry Symbl. Price Earns. Dividnd. Next Rec.1122 Exxon Oil XON 46.00 2.50 0.75 13781152 Lockheed Aero LCH 112.00 1.25 0.501175 Ford Auto F 88.00 1.70 0.201231 Intel Comp. INTL 30.00 2.00 0.001245 Digital Comp. DEC 120.00 1.80 0.101323 GM Auto GM 158.00 2.10 0.301378 Texaco Oil TX 230.00 2.80 1.00 14801480 Conoco Oil CON 150.00 2.00 0.50 11221767 Tony Lama Apparel TONY 45.00 1.50 0.25
  • 26. Types of ListNon-dense: Few records (Oil stocks)Dense: Many or all records (Symbol)Two-way: Next and Prior pointersMulti-list: Many lists on the same recordtype
  • 27. Processing with ListsAssume an Oil industry list and the query:SELECT * FROM STOCKSWHERE INDUSTRY = “Oil” AND EARNINGS > 2.50Fetch head of oil industry list.Until Next Record Pointer = -null-Fetch oil industry record.If Earnings > 2.50, include on output list.End Until.
  • 28. Problems with listsDifficult to maintainBroken pointer chainsInefficient to search dense lists for one or afew records
  • 29. Inverted List: IndustryValues Table Occurrences TableAero 1152Auto 1175, 1323Apparel 1767Computer 1231, 1245Oil 1122, 1378, 1480
  • 30. Earnings and DividendsEarnings DividendsValues Occurrences Values Occurrences1.25 1152 0.00 12311.50 1767 0.10 12451.70 1175 0.20 11752.00 1231, 1480 0.25 17672.10 1323 0.30 13232.50 1122 0.50 11522.80 1378 0.75 11221.00 1378
  • 31. Select * From Stocks Where Earnings > 2.00and Dividends < 0.60Earnings DividendsValues Occurrences Values Occurrences1.25 1152 0.00 12311.50 1767 0.10 12451.70 1175 0.20 11752.00 1231, 1480 0.25 17672.10 1323 0.30 13232.50 1122 0.50 11522.80 1378 0.75 11221.00 1378Only one record must be fetched !
  • 32. B-TreesHierarchical (tree) structureBalanced– all paths from top to bottom the same lengthConsist of:– Index Set: Indexed access for the key field– Sequence Set: Sequential access for the key field
  • 33. A B-Tree of Order 3 and Height 2 for Stock SymbolNumber of levels in the index setNumber of values per level minus 1Level O IndexIndex Symbol Key Index Symbol Key Index Symbol Key11 F 1175 12 TONY 1767 13Index 11 Index 12 Index 13Symbol Key Symbol Key Symbol KeyCON 1480 GM 1323 TX 1378DEC 1245 INTL 1231 XON 1122LCH 1152Sequence SetCON 1245 DEC 1420 F 1175GM 1323 INTL 1231 LCH 1152TONY 1767 TX 1378 XON 1122
  • 34. SummaryThree file organizations– Heaps– SortedSequential Access MethodIndexed Sequential Access Method– Direct Access MethodThree data structures for secondary keys– Linked lists, Inverted lists, B-trees

×