CS: Introduction to Record Manipulation & Indexing

An introduction to data record manipulation and indexing.

Originally created 2003 by Katrin Becker
All rights reserved.

Published in: Technology, Business

CS: Introduction to Record Manipulation & Indexing

  1. 1. Record Manipulation & Indexing • records/fields • index placement; index management • manipulating fixed-length record files • re-using space in fixed-length files • varying length records [VLR]: additions, deletions, modifications • free lists for VLR: placement strategies (first, best, worst) • varying length record maintenance © Katrin Becker All Rights Reserved Records and Indexing 14-Sep-03 1
  2. 2. Records in General A record is: • An identifiable, describable data set • Often contains a sub-structure • Typically part of a larger structure This definition also works for: files; fields; …
  3. 3. Records and Fields • FILE SYSTEM containing files • FILE containing records • RECORD containing fields • FIELD containing elements
  4. 4. Record Manipulation • Operations on Records: – Searches – Additions – Deletions – Modifications
  5. 5. Record Manipulation - Search Sequential Search • While NOT done: – Position file pointer – Read record – Examine record to see if it’s the one • Yes: DONE • No: CONTINUE
  6. 6. Other Searches • What changes? – Binary search: • We position the file pointer in a different fashion (the rest is the same) – Search with an index: • We apply the search to the index and retrieve the record only when located in the index
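A minimal sketch of the two record-file searches described above, assuming a hypothetical fixed-length layout (a 4-byte key followed by a 12-byte name) and an in-memory stand-in for the file. The layout, the function names, and the sample rows are illustrative assumptions, not part of the original slides. The read and examine steps are identical in both functions; only the rule for positioning the file pointer differs.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte unsigned key + 12-byte name.
RECORD = struct.Struct("<I12s")


def make_file(rows):
    """Build an in-memory 'file' of fixed-length records."""
    f = io.BytesIO()
    for key, name in rows:
        f.write(RECORD.pack(key, name.encode()))
    return f


def sequential_search(f, key):
    """Scan records in file order; return (rrn, name) or None."""
    rrn = 0
    while True:
        f.seek(rrn * RECORD.size)           # position file pointer
        raw = f.read(RECORD.size)           # read record
        if len(raw) < RECORD.size:          # ran off the end: not found
            return None
        k, name = RECORD.unpack(raw)        # examine record
        if k == key:
            return rrn, name.rstrip(b"\0").decode()
        rrn += 1


def binary_search(f, key, count):
    """Same read/examine steps; only the positioning rule changes.

    Assumes the records are stored sorted by key.
    """
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD.size)
        k, name = RECORD.unpack(f.read(RECORD.size))
        if k == key:
            return mid, name.rstrip(b"\0").decode()
        if k < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


rows = [(10, "ada"), (20, "bob"), (30, "cy"), (40, "dee")]
f = make_file(rows)
```

Binary search only works here because every record has the same size, so the middle record's position can be computed directly.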
  7. 7. Record Manipulation – Addition • New record gets added to the end. • Insertion into the middle of the file is impractical. • If there is an index, then we also perform an addition to the index (addition to the end of this list is infeasible – WHY?).
  8. 8. Addition with an Index - 1 [Diagram: INDEX and RECORDS] 1. New record gets added to the end.
  9. 9. Addition with an Index - 2 [Diagram: INDEX and RECORDS] 2. Locate the place where the index entry needs to go.
  10. 10. Addition with an Index - 3 [Diagram: INDEX and RECORDS] 3. Insert the new index entry (it’s a record too).
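The three addition steps above could be sketched as follows. This assumes the index is held in memory as a list of (key, byte offset) pairs kept sorted by key; that representation and the names `add_record` and `find` are assumptions made for illustration.

```python
import bisect
import io


def add_record(data, index, key, payload):
    """Append payload to the data file and insert (key, offset) into the index."""
    data.seek(0, io.SEEK_END)
    offset = data.tell()
    data.write(payload)                    # 1. new record goes at the end
    bisect.insort(index, (key, offset))    # 2 + 3. locate the place, insert the entry
    return offset


def find(index, key):
    """Binary-search the sorted index; return the byte offset or None."""
    i = bisect.bisect_left(index, (key, 0))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None


data = io.BytesIO()
index = []
for key, payload in [(30, b"cccc"), (10, b"aaaa"), (20, b"bbbb")]:
    add_record(data, index, key, payload)
```

The data file stays in arrival order while the index stays in key order, which is why the index entry cannot simply be appended.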
  11. 11. Records vs. Index: Assertions & Questions • Moving file records is more expensive than moving index records. • Should the index be IN the record file or in its own file? (How do we maintain it?) • If IN the file: should it be at the beginning, end, middle, or distributed? • What if we are able to hold the index in memory? • What if we can’t?
  12. 12. Record Manipulation - Deletion • Locate record (Search) • Mark space as deleted • Remove index entry? (Why or why not?)
  13. 13. Deletion with an index - 1 [Diagram: INDEX and RECORDS] 1. Locate index entry
  14. 14. Deletion with an index - 2 [Diagram: INDEX and RECORDS] 1. Locate index entry 2. Locate record
  15. 15. Deletion with an index - 3 [Diagram: INDEX and RECORDS] 3. Delete (mark) record
  16. 16. Deletion with an index - 4 [Diagram: INDEX and RECORDS] 4. Delete (mark?) index entry
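A rough sketch of the four deletion steps above. It assumes 8-byte fixed-length records, a hypothetical one-byte tombstone written over the first byte of a deleted record, and an in-memory sorted index of (key, byte offset) pairs. This version removes the index entry outright; marking it instead is the alternative the slides leave open.

```python
import bisect
import io

RECORD_SIZE = 8      # assumed fixed record length
DELETED = b"*"       # hypothetical tombstone marker


def delete_record(data, index, key):
    """Mark the record as deleted and drop its index entry."""
    i = bisect.bisect_left(index, (key, 0))    # 1. locate index entry
    if i == len(index) or index[i][0] != key:
        return False
    offset = index[i][1]                       # 2. locate record
    data.seek(offset)
    data.write(DELETED)                        # 3. mark the record in place
    del index[i]                               # 4. delete the index entry
    return True


data = io.BytesIO(b"rec-0010rec-0020rec-0030")
index = [(10, 0), (20, 8), (30, 16)]
first = delete_record(data, index, 20)
second = delete_record(data, index, 20)        # already gone
```

Note that the file does not shrink: the slot is still there, only marked.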
  17. 17. Record Manipulation - Modification • Locate record • Read record • Modify record • Re-write record (assuming fixed-size records – what if the record is now a different size? [see later])
  18. 18. File Behaviour – 1 (start) Record count = 9
  19. 19. File Behaviour – 2 (add record) Record count = 10
  20. 20. File Behaviour – 3 (add record) Record count = 11
  21. 21. File Behaviour – 4 (delete) Record count = 10
  22. 22. File Behaviour – 5 (delete) Record count = 9
  23. 23. File Behaviour – 6 (add) Record count = 10
  24. 24. File Behaviour – 7 (add) Record count = 11
  25. 25. File Behaviour – 8 (add) Record count = 12
  26. 26. File Behaviour – 9 (delete) Record count = 11
  27. 27. File Behaviour – 10 (delete) Record count = 10 And so on…
  28. 28. What’s happening to the file? • File grows – does not shrink (we get fragmentation) • We end up covering more ground to do the same job • Q: If we are doing random access, why does it matter? • The file system has less space to use (the fragmentation is internal from the perspective of the file system). • Worst case = EVERY record access ends up costing us a seek.
  29. 29. Re-Using Space in the File [FLR] • When there is a deletion, locate the last record in the file and move it to the free slot – Costs: • Additional file access to locate (where will we remember where the last record is?) and retrieve the last record. • Records will lose locality faster than if we simply mark the slot. (Why do we care?)
  30. 30. Re-Using Space – Way 2 • Make a list of places where records have been deleted. • When doing an addition, check for an empty ‘slot’ before placing the new record at the end. Q: What about the index? • When doing a deletion, add the location of the deleted record to the ‘free-list’.
  31. 31. What does the Free-List look like? [Diagram: INDEX and RECORDS] All we need is the location. Order is unimportant.
  32. 32. How to decide which ‘slot’ to re-use? • In FLR every slot will fit a new record. • We can just take the first one – the Free-List can then be maintained as a stack (which is easy). • Do we keep Free-List information in the file?
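The free-list-as-stack idea for fixed-length records might look like the sketch below. `SlotFile` is a hypothetical in-memory stand-in for the record file, and the free-list is kept in memory only; whether it should also live in the file is the open question from the slide.

```python
class SlotFile:
    """In-memory stand-in for a fixed-length record file with a free-list."""

    def __init__(self):
        self.slots = []   # slot i holds a record, or None if deleted
        self.free = []    # stack of reusable relative record numbers

    def add(self, record):
        """Reuse a freed slot if one exists, otherwise append at the end."""
        if self.free:
            rrn = self.free.pop()         # any slot fits, so take the top one
            self.slots[rrn] = record
        else:
            rrn = len(self.slots)
            self.slots.append(record)
        return rrn

    def delete(self, rrn):
        """Mark the slot as deleted and remember its location."""
        self.slots[rrn] = None
        self.free.append(rrn)


f = SlotFile()
for name in ("A", "B", "C"):
    f.add(name)
f.delete(1)
f.delete(0)
reused = [f.add("D"), f.add("E"), f.add("F")]
```

Because every slot is the same size, the order of the free-list does not matter, and push/pop on a stack is the cheapest way to maintain it.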
  33. 33. Indexing – What is it? • Table-of-contents for a file (directory) • Uses keys • Byte Offset (BO) vs Relative Record Number (RRN)
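For fixed-length records the two addressing schemes are interchangeable, since a byte offset can be computed from a relative record number. A small sketch, assuming an optional file header of `header_size` bytes (the header and both function names are assumptions for illustration):

```python
def rrn_to_offset(rrn, record_size, header_size=0):
    """Byte offset of record number rrn in a fixed-length record file."""
    return header_size + rrn * record_size


def offset_to_rrn(offset, record_size, header_size=0):
    """Inverse mapping; rejects offsets that do not fall on a record boundary."""
    rrn, remainder = divmod(offset - header_size, record_size)
    if remainder:
        raise ValueError("offset is not on a record boundary")
    return rrn
```

With varying length records this arithmetic no longer works, so the index has to store the byte offset itself.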
  34. 34. Primary Key Properties: • Unique • Canonical • Data-less • Unchanging
  35. 35. Indexing – How does it Look? [Diagram: INDEX] • Must have: – Key – Way to locate record • It is itself a structure containing ‘records’ (each index entry is a record) • It may be separate from the main data or in the same file. • It may be copied into memory for manipulation and only updated infrequently; or the file copy may be maintained as well.
  36. 36. Indexing – File Ops? • Tied to records: – If record added – new/updated index entry – If record deleted – ‘delete’ index entry – If record modified – maybe no change to index; maybe update BO [byte offset]
  37. 37. Fixed-length vs Varying Length • VLR provides greater flexibility. • VLR increases maintenance overhead. • VLR decreases wasted space.* • VLR makes index virtually essential. • VLR complicates Free-List maintenance. * May simply waste space in a different place or a different way.
  38. 38. VLR Index [Diagram: INDEX and RECORDS] • Requires: – Key – Byte offset – Record size? [optional]
  39. 39. VLR Search Operation [Diagram: INDEX and RECORDS] • Same as for FLR: 1. Locate key in index 2. Locate record in file • Binary search still possible on index, but NOT on records alone.
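A sketch of the two-step VLR search, assuming each index entry is a (key, byte offset, size) triple kept sorted by key. Storing the size in the index is the optional choice from the previous slide; the sample data and the name `vlr_lookup` are made up for illustration.

```python
import bisect
import io


def vlr_lookup(data, index, key):
    """Binary-search the index, then read exactly one record from the file."""
    keys = [k for k, _, _ in index]
    i = bisect.bisect_left(keys, key)          # 1. locate key in index
    if i == len(keys) or keys[i] != key:
        return None
    _, offset, size = index[i]
    data.seek(offset)                          # 2. locate record in file
    return data.read(size)


data = io.BytesIO(b"antbumblebeecat")
index = [(1, 0, 3), (2, 3, 9), (3, 12, 3)]     # (key, byte offset, size)
```

The binary search runs over the fixed-size index entries; the records themselves start at irregular offsets, so they cannot be bisected directly.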
  40. 40. VLR Deletion Operation - 1 [Diagram: INDEX and RECORDS] Locate key
  41. 41. VLR Deletion Operation - 2 [Diagram: INDEX and RECORDS] Locate record
  42. 42. VLR Deletion Operation - 3 [Diagram: INDEX and RECORDS] Delete record
  43. 43. VLR Deletion Operation - 4 [Diagram: INDEX, Free-List and RECORDS] • Remember location of ‘slot’ • Remember size of slot.
  44. 44. VLR Deletion Operation - 5 [Diagram: INDEX, Free-List and RECORDS] 5. Mark index entry
  45. 45. VLR Addition Operation – 1a [Diagram: INDEX, Free-List, RECORDS and New Record] 1. Search Free-List
  46. 46. VLR Addition Operation – 1b [Diagram: INDEX, Free-List, RECORDS and New Record] Too big for first place
  47. 47. VLR Addition Operation – 1c [Diagram: INDEX, Free-List, RECORDS and New Record] Too big for second place
  48. 48. VLR Addition Operation – 1d [Diagram: INDEX, Free-List, RECORDS and New Record] Too big for third place
  49. 49. VLR Addition Operation – 1e [Diagram: INDEX, Free-List, RECORDS and New Record] Place at end of file
  50. 50. VLR Addition Operation – 2a [Diagram: INDEX, Free-List, RECORDS and New Record] Search Free-List
  51. 51. VLR Addition Operation – 2b [Diagram: INDEX, Free-List, RECORDS and New Record] Fits in first place… BUT…
  52. 52. VLR Addition Operation – 2c [Diagram: INDEX, Free-List, RECORDS and New Record] We will end up with left-over unused (and probably unusable) space. We call this “First-Fit” (because we are using the first slot that we find that fits).
  53. 53. VLR Addition Operation – 2d [Diagram: INDEX, Free-List, RECORDS and New Record] If instead we keep looking… we find the second entry is a better fit…
  54. 54. VLR Addition Operation – 2e [Diagram: INDEX, Free-List, RECORDS and New Record] The third slot does not fit, so…
  55. 55. VLR Addition Operation – 2f [Diagram: INDEX, Free-List, RECORDS and New Record] We decide to use the second slot. It is the Best-Fit.
  56. 56. VLR Addition Operation – 2g [Diagram: INDEX, Free-List, RECORDS and New Record] 1. Insert record. 2. Delete Free-List entry. 3. Update index. Notice the index entry is sorted differently. What’s the advantage to leaving ‘spaces’ in the index?
  57. 57. VLR Modification Operation - 1 • 2 kinds: – 1. Mod results in record remaining same size – 2. Mod results in record growing or shrinking.
  58. 58. VLR Modification Operation - 2 • Mod results in record remaining same size – Same as for FLR
  59. 59. VLR Modification Operation - 3 • Mod results in record growing or shrinking. – Treat Mod as a deletion followed by an addition.
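The two modification cases could be sketched as below. The index is assumed to be a dict mapping key to (byte offset, size), and the free-list a list of (byte offset, size) pairs; both representations are assumptions. As a simplification, the addition half always appends at the end of the file rather than searching the free-list first.

```python
import io


def modify(data, index, free_list, key, new_payload):
    """Rewrite in place if the size is unchanged, else delete then add."""
    offset, size = index[key]
    if len(new_payload) == size:
        data.seek(offset)
        data.write(new_payload)               # case 1: same as for FLR
        return offset
    # Case 2: the record grew or shrank. Deletion first...
    free_list.append((offset, size))          # remember location and size of slot
    del index[key]
    # ...then addition (simplified here: always at the end of the file).
    data.seek(0, io.SEEK_END)
    new_offset = data.tell()
    data.write(new_payload)
    index[key] = (new_offset, len(new_payload))
    return new_offset


data = io.BytesIO(b"aaaabbbbbb")
index = {1: (0, 4), 2: (4, 6)}
free_list = []
same_size_offset = modify(data, index, free_list, 1, b"AAAA")
grown_offset = modify(data, index, free_list, 1, b"AAAAAAAA")
```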
  60. 60. Free-Lists • May want to keep Free-List sorted. • If the List is short it may not matter. • Placement Strategies: – First Fit – Best Fit – Worst Fit • It could be its own list or we could make the regular index serve double-duty.
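The three placement strategies can be compared with a short sketch. The free-list is assumed to be an unsorted list of (byte offset, size) pairs, and `choose_slot` is a hypothetical helper; the sample sizes are chosen to mirror the walk-through above, where the first slot fits, the second fits better, and the third is too small.

```python
def choose_slot(free_list, size, strategy="first"):
    """Return the position in free_list of the slot to reuse, or None."""
    fits = [(i, s) for i, (_, s) in enumerate(free_list) if s >= size]
    if not fits:
        return None                                  # caller places record at end of file
    if strategy == "first":
        return fits[0][0]                            # first slot that is big enough
    if strategy == "best":
        return min(fits, key=lambda t: t[1])[0]      # smallest slot that is big enough
    if strategy == "worst":
        return max(fits, key=lambda t: t[1])[0]      # largest slot
    raise ValueError(f"unknown strategy: {strategy}")


free_list = [(100, 40), (300, 25), (520, 10)]        # (byte offset, size)
```

For a 20-byte record, first fit and worst fit both pick the 40-byte slot here, while best fit picks the 25-byte slot and leaves only 5 bytes unused. Keeping the free-list sorted by size would let best fit and worst fit stop early instead of scanning every entry.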
  61. 61. Summary • Managing space inside the file is our business. • We must choose: – FLR / VLR? – Index? (what kind?) – Secondary indices? – Re-claim free space? How?
