CS: Introduction to Record Manipulation & Indexing

An introduction to data record manipulation and indexing.

Originally created 2003 by Katrin Becker
All rights reserved.

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

    Presentation Transcript

    • Record Manipulation & Indexing
      • records/fields
      • index placement; index management
      • manipulating fixed-length record files
      • re-using space in fixed-length files
      • varying-length records [VLR]: adds; dels; mods
      • free lists for VLR - placement strategies (first, best, worst)
      • varying-length record maintenance
      © Katrin Becker All Rights Reserved Records and Indexing 14-Sep-03
    • Records in General
      A record is:
      • An identifiable, describable data set
      • Often contains a sub-structure
      • Typically part of a larger structure
      This definition also works for: files; fields; …
    • Records and Fields
      • FILE SYSTEM containing files
      • FILE containing records
      • RECORD containing fields
      • FIELD containing elements
    • Record Manipulation
      • Operations on Records:
        – Searches
        – Additions
        – Deletions
        – Modifications
    • Record Manipulation - Search
      Sequential Search
      • While NOT done:
        – Position file pointer
        – Read record
        – Examine record to see if it’s the one
          • Yes: DONE
          • No: CONTINUE
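
The loop above can be sketched in Python roughly as follows. This is a minimal illustration, not the presenter's code: the record layout (`RECORD`, a 4-byte integer key plus a 12-byte name), the function name `sequential_search`, and the in-memory `demo_file` are all assumptions made for the example.

```python
import io
import struct

# Assumed layout (not from the slides): 4-byte int key + 12-byte name.
RECORD = struct.Struct("<i12s")

def sequential_search(f, key):
    """Return (rrn, name) for the first record matching key, or None."""
    rrn = 0
    while True:
        f.seek(rrn * RECORD.size)            # position file pointer
        raw = f.read(RECORD.size)            # read record
        if len(raw) < RECORD.size:           # ran off the end: not found
            return None
        rec_key, name = RECORD.unpack(raw)   # examine record
        if rec_key == key:                   # yes: done
            return rrn, name.rstrip(b"\0").decode()
        rrn += 1                             # no: continue

# An in-memory stand-in for a record file holding three records.
demo_file = io.BytesIO(b"".join(
    RECORD.pack(k, n) for k, n in [(17, b"ada"), (4, b"grace"), (42, b"edsger")]
))
print(sequential_search(demo_file, 42))  # (2, 'edsger')
```

In the worst case every record is read once, which is why the later slides look for ways to position the file pointer more cleverly.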
    • Other Searches
      • What changes?
        – Binary search:
          • We position the file pointer in a different fashion (the rest is the same)
        – Search with an index:
          • We apply the search to the index and retrieve the record only when located in the index
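
A possible sketch of both variations, under the same assumed record layout as a 4-byte key plus 12-byte name. `binary_search` presumes the data file itself is sorted by key; `index_search` presumes a sorted in-memory list of `(key, rrn)` pairs. Both helpers and the sample data are hypothetical.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i12s")  # assumed layout: int key + 12-byte name

def read_record(f, rrn):
    f.seek(rrn * RECORD.size)
    key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\0").decode()

def binary_search(f, key, count):
    """Search a file whose records are sorted by key; return the RRN or None."""
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # only the positioning rule changes
        rec_key, _ = read_record(f, mid)
        if rec_key == key:
            return mid
        if rec_key < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

def index_search(f, index, key):
    """index is a sorted list of (key, rrn); the data file may be unsorted."""
    i = bisect.bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return None                       # not in the index: file is never read
    return read_record(f, index[i][1])

sorted_file = io.BytesIO(b"".join(RECORD.pack(k, b"rec") for k in (3, 8, 15, 23)))
unsorted_file = io.BytesIO(b"".join(
    RECORD.pack(k, n) for k, n in [(23, b"w"), (3, b"x"), (15, b"y")]
))
index = [(3, 1), (15, 2), (23, 0)]
```

With the index, a failed lookup costs no access to the record file at all.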
    • Record Manipulation – Addition
      New record gets added to the end.
      • Insertion into middle of file is impractical.
      • If there is an index, then we also perform an addition to the index (addition to the end of this list is infeasible – WHY?).
    • Addition with an Index - 1 to 3 (figure sequence: INDEX and RECORDS)
      1. New record gets added to the end.
      2. Locate place where index entry needs to go.
      3. Insert new index entry (it’s a record too).
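
The three steps might look like this in Python, assuming fixed-length records and an in-memory index kept as a sorted list of `(key, rrn)` pairs. The layout `RECORD` and the name `add_record` are illustrative assumptions.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i12s")  # assumed layout: int key + 12-byte name

def add_record(f, index, key, name):
    f.seek(0, io.SEEK_END)               # 1. new record goes at the end
    rrn = f.tell() // RECORD.size
    f.write(RECORD.pack(key, name))
    bisect.insort(index, (key, rrn))     # 2 + 3. find the place, insert the entry
    return rrn

data_file, index = io.BytesIO(), []
for key in (30, 10, 20):
    add_record(data_file, index, key, b"rec")
print(index)  # [(10, 1), (20, 2), (30, 0)]
```

The data file stays in arrival order while the index stays sorted, which is one plausible reading of why appending to the end of the index is not an option: a binary search over it would stop working.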
    • Records vs. Index: Assertions & Questions
      • Moving file records is more expensive than moving index records.
      • Should index be IN record file or its own file? (How do we maintain it?)
      • If IN file: should it be at the beginning, end, middle, distributed?
      • What if we are able to hold the index in memory?
      • What if we can’t?
    • Record Manipulation - Deletion
      • Locate record (Search)
      • Mark space as deleted
      • Remove index entry? (why or why not)
    • Deletion with an index - 1 to 4 (figure sequence: INDEX and RECORDS)
      1. Locate index entry
      2. Locate record
      3. Delete (mark) record
      4. Delete (mark?) index entry
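
A hedged sketch of these four steps. The tombstone value `DELETED = -1` written over the key field is an assumption (the slides only say the record is marked), and this version removes the index entry outright rather than marking it.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i12s")  # assumed layout: int key + 12-byte name
DELETED = -1                     # assumed tombstone written over the key field

def delete_record(f, index, key):
    """Mark the record as deleted and drop its index entry; return its RRN."""
    i = bisect.bisect_left(index, (key,))        # 1. locate index entry
    if i == len(index) or index[i][0] != key:
        return None
    rrn = index[i][1]
    f.seek(rrn * RECORD.size)                    # 2. locate record
    f.write(struct.pack("<i", DELETED))          # 3. mark record; file keeps its size
    del index[i]                                 # 4. remove index entry
    return rrn

data_file = io.BytesIO(b"".join(RECORD.pack(k, b"rec") for k in (30, 10, 20)))
index = [(10, 1), (20, 2), (30, 0)]
```

Note that the file does not shrink: the slot at the returned RRN is simply no longer reachable through the index.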
    • Record Manipulation - Modification
      • Locate record
      • Read record
      • Modify record
      • Re-write record (assuming fixed-size records – what if the record is now a different size? [see later])
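
For fixed-size records the four steps reduce to a read followed by a write at the same position. A small sketch, again with an assumed layout and a hypothetical function name:

```python
import io
import struct

RECORD = struct.Struct("<i12s")  # assumed layout: int key + 12-byte name

def modify_record(f, rrn, new_name):
    f.seek(rrn * RECORD.size)                       # locate record
    key, _old = RECORD.unpack(f.read(RECORD.size))  # read record
    f.seek(rrn * RECORD.size)                       # reading moved the pointer: go back
    f.write(RECORD.pack(key, new_name))             # re-write modified record in place

data_file = io.BytesIO(RECORD.pack(1, b"one") + RECORD.pack(2, b"two"))
modify_record(data_file, 1, b"deux")
```

The second `seek` is easy to forget: without it the rewritten record would land in the following slot.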
    • File Behaviour – 1 to 10 (figure sequence showing the file after each operation)
      1. start: record count = 9
      2. add record: record count = 10
      3. add record: record count = 11
      4. delete: record count = 10
      5. delete: record count = 9
      6. add: record count = 10
      7. add: record count = 11
      8. add: record count = 12
      9. delete: record count = 11
      10. delete: record count = 10
      And so on…
    • What’s happening to the file?
      • File grows – does not shrink (we get fragmentation)
      • We end up covering more ground to do the same job
      • Q: If we are doing random access, why does it matter?
      • The file system has less space to use (the fragmentation is internal from the perspective of the file system).
      • Worst case = EVERY record access ends up costing us a seek.
    • Re-Using Space in the File [FLR]
      • When there is a deletion, locate the last record in the file and move it to the free slot
        – Costs:
          • Additional file access to locate (where will we remember where the last record is?) and retrieve last record.
          • Records will lose locality faster than if we simply mark the slot. (Why do we care?)
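
One way this first approach could be sketched, assuming the position of the last record is derived from the file length. `delete_by_moving_last` and the record layout are illustrative assumptions.

```python
import io
import struct

RECORD = struct.Struct("<i12s")  # assumed layout: int key + 12-byte name

def delete_by_moving_last(f, rrn):
    """Fill slot rrn with the last record, then shrink the file by one record."""
    f.seek(0, io.SEEK_END)
    last = f.tell() // RECORD.size - 1       # extra work: find the last record
    if rrn != last:
        f.seek(last * RECORD.size)
        raw = f.read(RECORD.size)            # extra file access: read it
        f.seek(rrn * RECORD.size)
        f.write(raw)                         # move it into the free slot
    f.truncate(last * RECORD.size)
    return last                              # RRN the moved record used to have

data_file = io.BytesIO(b"".join(RECORD.pack(k, b"rec") for k in (1, 2, 3)))
moved_from = delete_by_moving_last(data_file, 0)
```

If an index exists, the entry for the moved record must also be updated to point at its new slot, which is a further cost of this scheme.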
    • Re-Using Space – Way 2
      • Make a list of places where records have been deleted.
      • When doing addition, check for empty ‘slot’ before placing new record at end. Q: What about the index?
      • When doing deletion, add location of deleted record to ‘free-list’
    • What does the Free-List look like? (figure: INDEX, Free-List and RECORDS)
      All we need is the location. Order is unimportant.
    • How to decide which ‘slot’ to re-use?
      • In FLR every slot will fit a new record.
      • We can just take the first one – Free-List can then be maintained as a stack (which is easy).
      • Do we keep Free-List information in the file?
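
A minimal sketch of a fixed-length record file whose free list is a stack of RRNs. Here the free list lives in memory only; whether it should be stored in the file is the open question the slide raises. The class name `FixedLengthFile` and the record layout are assumptions.

```python
import io
import struct

RECORD = struct.Struct("<i12s")  # assumed layout: int key + 12-byte name

class FixedLengthFile:
    def __init__(self):
        self.f = io.BytesIO()
        self.free = []               # stack of free RRNs; order is unimportant

    def add(self, key, name):
        if self.free:
            rrn = self.free.pop()    # any slot fits, so take the top of the stack
        else:
            self.f.seek(0, io.SEEK_END)
            rrn = self.f.tell() // RECORD.size
        self.f.seek(rrn * RECORD.size)
        self.f.write(RECORD.pack(key, name))
        return rrn

    def delete(self, rrn):
        self.free.append(rrn)        # remember the slot for re-use

store = FixedLengthFile()
first_rrns = [store.add(k, b"rec") for k in (1, 2, 3)]
store.delete(1)
store.delete(0)
reused = [store.add(k, b"rec") for k in (4, 5, 6)]
print(first_rrns, reused)  # [0, 1, 2] [0, 1, 3]
```

The file only grows (to RRN 3) once both freed slots have been re-used.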
    • Indexing – What is it?
      • Table-of-contents for a file (directory)
      • Uses keys
      • Byte Offset (BO) vs Relative Record Number (RRN)
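
For fixed-length records the two addressing schemes are interchangeable by simple arithmetic. The sketch below assumes an optional file header of `header_size` bytes; that parameter and both function names are illustrative.

```python
def rrn_to_byte_offset(rrn, record_size, header_size=0):
    """Byte offset of the record with the given relative record number."""
    return header_size + rrn * record_size

def byte_offset_to_rrn(offset, record_size, header_size=0):
    """Inverse mapping; only defined for offsets on a record boundary."""
    rrn, remainder = divmod(offset - header_size, record_size)
    if remainder:
        raise ValueError("offset does not fall on a record boundary")
    return rrn

print(rrn_to_byte_offset(3, record_size=16))                 # 48
print(rrn_to_byte_offset(3, record_size=16, header_size=8))  # 56
```

With varying-length records no such formula exists, so an index there has to store byte offsets.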
    • Primary Key Properties:
      • Unique
      • Canonical
      • Data-less
      • Unchanging
    • Indexing – How does it Look? (figure: INDEX)
      • Must have:
        – Key
        – Way to locate record
      • It is itself a structure containing ‘records’ (each index entry is a record)
      • It may be separate from the main data or in the same file.
      • It may be copied into memory for manipulation and only updated infrequently; or the file copy may be maintained as well.
    • Indexing – File Ops?
      • Tied to records:
        – If record added – new/update index entry
        – If record deleted – ‘delete’ index entry
        – If record modified – maybe no change to index; maybe update BO [byte offset]
    • Fixed-length vs Varying Length
      • VLR provides greater flexibility.
      • VLR increases maintenance overhead.
      • VLR decreases wasted space.*
      • VLR makes index virtually essential.
      • VLR complicates Free-List maintenance.
      *may simply waste space in a different place or a different way.
    • VLR Index (figure: INDEX and RECORDS)
      • Requires:
        – Key
        – Byte offset
        – Record size? [optional]
    • VLR Search Operation (figure: INDEX and RECORDS)
      • Same as for FLR:
        1. Locate key in index
        2. Locate record in file
      • Binary search still possible on index, but NOT on records alone.
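
A small sketch of a VLR index holding key, byte offset and record size. For brevity a Python `dict` stands in for the index, so the lookup is a hash lookup rather than the binary search the slide mentions; `append_vlr`, `fetch_vlr` and the sample records are assumptions.

```python
import io

def append_vlr(f, index, key, payload):
    """Append a varying-length record and record (byte offset, size) in the index."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(payload)
    index[key] = (offset, len(payload))
    return offset

def fetch_vlr(f, index, key):
    entry = index.get(key)          # 1. locate key in index
    if entry is None:
        return None
    offset, size = entry
    f.seek(offset)                  # 2. locate record in file
    return f.read(size)

vlr_file, vlr_index = io.BytesIO(), {}
append_vlr(vlr_file, vlr_index, "k1", b"short")
append_vlr(vlr_file, vlr_index, "k2", b"a much longer record")
print(vlr_index)  # {'k1': (0, 5), 'k2': (5, 20)}
```

Because the records carry no length prefix in this sketch, the size stored in the index is the only way to know where a record ends, so here it is not optional.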
    • VLR Deletion Operation - 1 to 5 (figure sequence: INDEX, Free-List and RECORDS)
      1. Locate key
      2. Locate record
      3. Delete record
      4. Free-List:
        • Remember location of ‘slot’
        • Remember size of slot
      5. Mark index entry
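
The five steps could be sketched as below, with the index as a `dict` of key to `(offset, size)` and the free list as a list of `(offset, size)` pairs. Overwriting the slot with `*` bytes is an assumed way of marking it, and the index entry is removed rather than flagged.

```python
import io

def delete_vlr(f, index, free_list, key):
    entry = index.get(key)              # 1. locate key
    if entry is None:
        return False
    offset, size = entry
    f.seek(offset)                      # 2. locate record
    f.write(b"*" * size)                # 3. delete record (assumed marker bytes)
    free_list.append((offset, size))    # 4. remember location and size of slot
    del index[key]                      # 5. "mark" index entry by removing it
    return True

vlr_file = io.BytesIO(b"shortlongerrecord")
vlr_index = {"a": (0, 5), "b": (5, 6), "c": (11, 6)}
free_list = []
delete_vlr(vlr_file, vlr_index, free_list, "b")
print(vlr_file.getvalue(), free_list)  # b'short******record' [(5, 6)]
```

Unlike the fixed-length case, the free list must carry the slot size, since not every new record will fit every slot.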
    • VLR Addition Operation – 1a to 1e (figure sequence: INDEX, Free-List, New Record and RECORDS)
      1a. Search Free-List
      1b. Too big for first place
      1c. Too big for second place
      1d. Too big for third place
      1e. Place at end of file
    • VLR Addition Operation – 2a to 2g (figure sequence: INDEX, Free-List, New Record and RECORDS)
      2a. Search Free-List
      2b. Fits in first place… BUT…
      2c. We will end up with left-over unused (and probably unusable) space. We call this “First-Fit” (because we are using the first slot that we find that fits).
      2d. If instead we keep looking… we find the second entry is a better fit…
      2e. The third slot does not fit, so…
      2f. We decide to use the second slot. It is the Best-Fit.
      2g. Then:
        1. Insert record.
        2. Delete Free-List entry.
        3. Update index.
        Notice the index entry is sorted differently. What’s the advantage to leaving ‘spaces’ in the index?
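
Both walkthroughs (no slot fits, and first-fit versus best-fit) can be sketched with one addition routine that takes the placement rule as a parameter. The free list is assumed to be an unsorted list of `(offset, size)` pairs and the index a `dict`; all names are hypothetical, and left-over bytes in a re-used slot are simply wasted in this version.

```python
import io

def first_fit(free_list, size):
    """Position of the first slot that is large enough, or None."""
    for i, (_offset, slot_size) in enumerate(free_list):
        if slot_size >= size:
            return i
    return None

def best_fit(free_list, size):
    """Position of the smallest slot that is still large enough, or None."""
    fits = [(slot_size, i) for i, (_offset, slot_size) in enumerate(free_list)
            if slot_size >= size]
    return min(fits)[1] if fits else None

def add_vlr(f, index, free_list, key, payload, choose=first_fit):
    i = choose(free_list, len(payload))      # search Free-List
    if i is None:
        f.seek(0, io.SEEK_END)               # nothing fits: place at end of file
        offset = f.tell()
    else:
        offset, _slot_size = free_list.pop(i)  # delete Free-List entry
    f.seek(offset)
    f.write(payload)                         # insert record
    index[key] = (offset, len(payload))      # update index
    return offset

def demo(choose, payload):
    f = io.BytesIO(b"x" * 100)
    free_list = [(0, 30), (40, 12), (70, 5)]
    return add_vlr(f, {}, free_list, "new", payload, choose)

print(demo(first_fit, b"0123456789"))  # 0
print(demo(best_fit, b"0123456789"))   # 40
print(demo(best_fit, b"y" * 50))       # 100
```

First-fit leaves 20 unused bytes in the 30-byte slot, while best-fit leaves only 2 in the 12-byte slot, at the price of scanning the whole list.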
    • VLR Modification Operation - 1 to 3
      • 2 kinds:
        – 1. Mod results in record remaining same size
          • Same as for FLR
        – 2. Mod results in record growing or shrinking
          • Treat Mod as a deletion followed by an addition.
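
A sketch of both cases, assuming a `dict` index of key to `(offset, size)` and a free list of `(offset, size)` pairs. For simplicity the addition half always places the new version at the end of the file; a fuller version would search the free list first. `modify_vlr` is a hypothetical name.

```python
import io

def modify_vlr(f, index, free_list, key, payload):
    offset, size = index[key]
    if len(payload) != size:
        free_list.append((offset, size))   # deletion: release the old slot
        f.seek(0, io.SEEK_END)             # addition: simplest placement, at end
        offset = f.tell()
    f.seek(offset)
    f.write(payload)                       # same size: plain in-place re-write
    index[key] = (offset, len(payload))

vlr_file = io.BytesIO(b"aaaabbbb")
vlr_index = {"x": (0, 4), "y": (4, 4)}
free_list = []
modify_vlr(vlr_file, vlr_index, free_list, "x", b"AAAA")     # same size
modify_vlr(vlr_file, vlr_index, free_list, "y", b"BBBBBB")   # record grows
print(vlr_file.getvalue())  # b'AAAAbbbbBBBBBB'
```

The old bytes of `y` remain in the file as a free slot, and the index now points at the relocated record.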
    • Free-Lists
      • May want to keep Free-List sorted.
      • If the List is short it may not matter.
      • Placement Strategies:
        – First Fit
        – Best Fit
        – Worst Fit
      • It could be its own list or we could make the regular index serve double-duty.
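
One possible payoff of keeping the free list sorted, sketched under the assumption that entries are `(size, offset)` pairs ordered by size: best fit becomes a binary search and worst fit is just the last entry. The function names are illustrative only.

```python
import bisect

def insert_slot(free_list, offset, size):
    """Keep the free list sorted by slot size (ties broken by offset)."""
    bisect.insort(free_list, (size, offset))

def take_best_fit(free_list, size):
    """Smallest slot with slot_size >= size, removed from the list, or None."""
    i = bisect.bisect_left(free_list, (size,))
    return free_list.pop(i) if i < len(free_list) else None

def take_worst_fit(free_list, size):
    """Largest slot, removed from the list, provided it is big enough."""
    if free_list and free_list[-1][0] >= size:
        return free_list.pop()
    return None

free_list = []
for offset, size in [(0, 30), (40, 12), (70, 5)]:
    insert_slot(free_list, offset, size)
print(free_list)                      # [(5, 70), (12, 40), (30, 0)]
print(take_best_fit(free_list, 10))   # (12, 40)
print(take_worst_fit(free_list, 10))  # (30, 0)
```

Worst fit deliberately picks the biggest slot so that the left-over piece has the best chance of being usable by a later record; for a short list, a plain scan as in first fit is likely just as good.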
    • Summary
      • Managing space inside the file is our business.
      • We must choose:
        – FLR / VLR?
        – Index? (what kind?)
        – Secondary indices?
        – Re-claim free space? How?