Record Manipulation &
Indexing
•records/fields
•index placement; index management
•manipulating fixed-length record files
•re-using space in fixed-length files
•varying length records [VLR]: additions; deletions; modifications
•free lists for VLR - placement strategies (first, best, worst)
•varying length record maintenance

© Katrin Becker
All Rights Reserved

Records and Indexing

14-Sep-03 1
Records in General
A record is:
• An identifiable, describable data set
• Often contains a sub-structure
• Typically part of a larger structure
This definition also works for: files; fields;
…
Records and Fields
FILE SYSTEM containing files
FILE containing records
RECORD containing fields
FIELD containing elements

Record Manipulation
• Operations on Records:
– Searches
– Additions
– Deletions
– Modifications
Record Manipulation - Search
Sequential Search
• While NOT done:
– Position file pointer
– Read record
– Examine record to see if it’s the one
• Yes DONE
• No CONTINUE
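
The loop above can be sketched in Python. This is a minimal illustration, not the course's own code: the record layout (a 4-byte integer key plus a 12-byte name) and the function name `sequential_search` are assumptions, and an in-memory `io.BytesIO` stands in for a file on disk.

```python
import io
import struct

# Assumed layout (not from the slides): 4-byte int key + 12-byte name.
RECORD = struct.Struct("<i12s")

def sequential_search(f, wanted_key):
    """Return (rrn, name) for the first matching record, or None."""
    rrn = 0
    while True:
        f.seek(rrn * RECORD.size)         # position file pointer
        raw = f.read(RECORD.size)         # read record
        if len(raw) < RECORD.size:        # ran off the end: not found
            return None
        key, name = RECORD.unpack(raw)    # examine record
        if key == wanted_key:             # yes: DONE
            return rrn, name.rstrip(b"\0").decode()
        rrn += 1                          # no: CONTINUE

rows = [(7, b"ann"), (3, b"bob"), (9, b"cy")]
data = io.BytesIO(b"".join(RECORD.pack(k, n) for k, n in rows))
print(sequential_search(data, 3))         # (1, 'bob')
```

On average half the file is read before a hit, and the whole file on a miss.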
Other Searches
• What changes?
– Binary search:
• We position the file pointer in a different
fashion (the rest is the same)

– Search with an index
• We apply the search to the index and retrieve
the record only when located in the index
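
A sketch of searching through an index, assuming the index is a list of (key, RRN) pairs kept sorted by key and held in memory. The record layout and the helper names `build_index` and `indexed_search` are hypothetical. The binary search runs on the index; the data file is touched only once, and only on a hit.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i12s")   # assumed layout: int key + 12-byte name

def build_index(f):
    """Scan the file once; return (key, rrn) pairs sorted by key."""
    f.seek(0)
    entries, rrn = [], 0
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:
            break
        entries.append((RECORD.unpack(raw)[0], rrn))
        rrn += 1
    entries.sort()
    return entries

def indexed_search(f, index, wanted_key):
    """Binary-search the index, then fetch the record with one seek."""
    # (wanted_key, -1) sorts before any real entry with the same key,
    # because real RRNs are never negative.
    i = bisect.bisect_left(index, (wanted_key, -1))
    if i == len(index) or index[i][0] != wanted_key:
        return None                       # miss: no access to the data file
    f.seek(index[i][1] * RECORD.size)
    _, name = RECORD.unpack(f.read(RECORD.size))
    return name.rstrip(b"\0").decode()

rows = [(7, b"ann"), (3, b"bob"), (9, b"cy")]
f = io.BytesIO(b"".join(RECORD.pack(k, n) for k, n in rows))
index = build_index(f)
print(indexed_search(f, index, 9))        # cy
```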

Record Manipulation – Addition
New record gets added to the end.

• Insertion into middle of file is impractical.
• If there is an index, then we also perform
an addition to the index (addition to the
end of this list is infeasible – WHY? ).
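
A minimal sketch of addition with an index, under the assumption that the index is a sorted in-memory list of (key, RRN) pairs (so appending to it would break binary search, which is one plausible answer to the WHY above). The layout and the name `add_record` are illustrative only.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i12s")   # assumed layout: int key + 12-byte name

def add_record(f, index, key, name):
    """Append the record to the file; insert its index entry in key order."""
    f.seek(0, io.SEEK_END)
    rrn = f.tell() // RECORD.size         # slot number of the new record
    f.write(RECORD.pack(key, name))       # the record itself goes at the end
    bisect.insort(index, (key, rrn))      # the index entry goes where it sorts
    return rrn

f, index = io.BytesIO(), []
for key, name in [(7, b"ann"), (3, b"bob"), (9, b"cy")]:
    add_record(f, index, key, name)
print(index)                              # [(3, 1), (7, 0), (9, 2)]
```

The data file is never reordered; only the much smaller index entries move.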
Addition with an Index - 1
[Diagram: INDEX and RECORDS]
1. New record gets added to the end.
Addition with an Index - 2
[Diagram: INDEX and RECORDS]
2. Locate place where index entry needs to go.
Addition with an Index - 3
[Diagram: INDEX and RECORDS]
3. Insert new index entry (it’s a record too).
Records vs. Index: Assertions & Questions
• Moving file records is more expensive
than moving index records.
• Should the index be IN the record file or in its
own file? (How do we maintain it?)
• If IN file: should it be at the beginning,
end, middle, distributed?
• What if we are able to hold the index in
memory?
• What if we can’t?
Record Manipulation - Deletion
• Locate record (Search)
• Mark space as deleted
• Remove index entry? (why or why not)
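
One way to mark a slot as deleted is a one-byte status flag at the front of every record. The sketch below assumes that flag, an int-key-plus-name layout, and a dict index mapping key to RRN; none of these come from the slides. It also chooses to drop the index entry, which is only one possible answer to the question above.

```python
import io
import struct

# Assumed layout: 1-byte status flag + 4-byte int key + 12-byte name.
RECORD = struct.Struct("<ci12s")
LIVE, DELETED = b" ", b"*"

def delete_record(f, index, key):
    """Mark the record's slot as deleted; return False if the key is unknown."""
    rrn = index.pop(key, None)            # locate record via the index
    if rrn is None:
        return False
    f.seek(rrn * RECORD.size)
    f.write(DELETED)                      # overwrite only the flag byte
    return True

rows = [(7, b"ann"), (3, b"bob")]
f = io.BytesIO(b"".join(RECORD.pack(LIVE, k, n) for k, n in rows))
index = {k: rrn for rrn, (k, _) in enumerate(rows)}
delete_record(f, index, 3)
```

The file keeps its length; the slot is simply no longer reachable through the index.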

Deletion with an index - 1
[Diagram: INDEX and RECORDS]
1. Locate index entry
Deletion with an index - 2
[Diagram: INDEX and RECORDS]
1. Locate index entry
2. Locate record
Deletion with an index - 3
[Diagram: INDEX and RECORDS]
3. Delete (mark) record
Deletion with an index - 4
[Diagram: INDEX and RECORDS]
4. Delete (mark?) index entry
Record Manipulation - Modification
• Locate record
• Read record
• Modify record
• Re-write record (assuming fixed-size
records – what if the record is now a
different size? [see later])
File Behaviour – 1 start

Record count = 9

File Behaviour – 2 add record

Record count = 10

File Behaviour – 3 add record

Record count = 11

File Behaviour – 4 delete

Record count = 10

File Behaviour – 5 delete

Record count = 9

File Behaviour – 6 add

Record count = 10

File Behaviour – 7 add

Record count = 11

File Behaviour – 8 add

Record count = 12

File Behaviour – 9 delete

Record count = 11

File Behaviour – 10 delete

Record count = 10

And so on…….
What’s happening to the file?
• File grows – does not shrink (we get
fragmentation)
• We end up covering more ground to do the
same job
Q: If we are doing random access, why does it matter?

• The file system has less space to use (the
fragmentation is internal from the
perspective of the file system).
• Worst case = EVERY record access ends
up costing us a seek.
Re-Using Space in the File [FLR]
• When there is a deletion, locate the
last record in the file and move it to the
free slot
– Costs:
• Additional file access to locate (where will we
remember where the last record is?) and
retrieve last record.
• Records will lose locality faster than if we
simply mark the slot. (Why do we care?)

Re-Using Space – Way 2
• Make a list of places where records
have been deleted.
• When doing addition, check for empty
‘slot’ before placing new record at end.
Q: What about the index?

• When doing deletion, add location of
deleted record to ‘free-list’

What does the Free-List look like?
[Diagram: INDEX and RECORDS]
All we need is the location.
Order is unimportant.
How to decide which ‘slot’ to re-use?
• In FLR every slot will fit a new record.
• We can just take the first one – the Free-List
can then be maintained as a stack (which is easy).
• Do we keep Free-List information in
the file?
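
A sketch of a stack-based Free-List for fixed-length records. The class name `FixedFile`, the record layout, and keeping the Free-List in memory (rather than in the file) are all assumptions made for illustration.

```python
import io
import struct

RECORD = struct.Struct("<i12s")   # assumed layout: int key + 12-byte name

class FixedFile:
    """Fixed-length record file that re-uses deleted slots (LIFO)."""

    def __init__(self):
        self.f = io.BytesIO()     # stands in for a file on disk
        self.free = []            # stack of RRNs of deleted slots
        self.slots = 0            # number of slots the file contains

    def add(self, key, name):
        if self.free:
            rrn = self.free.pop()         # re-use the most recently freed slot
        else:
            rrn = self.slots              # no free slot: grow the file
            self.slots += 1
        self.f.seek(rrn * RECORD.size)
        self.f.write(RECORD.pack(key, name))
        return rrn

    def delete(self, rrn):
        self.free.append(rrn)             # every slot fits, so order is unimportant

ff = FixedFile()
a, b, c = ff.add(1, b"a"), ff.add(2, b"b"), ff.add(3, b"c")
ff.delete(a)
ff.delete(b)
print(ff.add(4, b"d"))                    # 1 (the slot freed last)
```

Because every slot has the same size, push and pop are all the list ever needs; no searching or sorting is involved.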

Indexing – What is it?
• Table-of-contents for a file (directory)
• Uses keys
• Byte Offset (BO) vs Relative Record
Number (RRN)
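
For fixed-length records the two addressing schemes are interchangeable by simple arithmetic. A small sketch, assuming a hypothetical 64-byte record and an optional file header (both values are made up for the example):

```python
RECORD_SIZE = 64   # assumed record length in bytes
HEADER_SIZE = 0    # assumed size of any file header before the first record

def rrn_to_offset(rrn, record_size=RECORD_SIZE, header_size=HEADER_SIZE):
    """Byte offset of the record with relative record number rrn."""
    return header_size + rrn * record_size

def offset_to_rrn(offset, record_size=RECORD_SIZE, header_size=HEADER_SIZE):
    """Inverse mapping; rejects offsets that are not on a record boundary."""
    rel = offset - header_size
    if rel < 0 or rel % record_size != 0:
        raise ValueError("offset is not the start of a record")
    return rel // record_size

print(rrn_to_offset(3))    # 192
```

With varying length records no such formula exists, so the index has to store the byte offset itself.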

Primary Key Properties:
• Unique
• Canonical
• Data-less
• Unchanging
Indexing – How does it Look?
[Diagram: INDEX]
• Must have:
– Key
– Way to locate record

• It is itself a structure containing
‘records’ (each index entry is a
record)
• It may be separate from the main
data or in the same file.
• It may be copied into memory for
manipulation and only updated
infrequently; or the file copy may be
maintained as well.
Indexing – File Ops?
• Tied to records:
– If record added – new (or updated) index entry
– If record deleted – ‘delete’ index entry
– If record modified – maybe no change to
index; maybe update BO [byte offset]

Fixed-length vs Varying Length
• VLR provides greater flexibility.
• VLR increases maintenance overhead.
• VLR decreases wasted space. *
• VLR makes index virtually essential.
• VLR complicates Free-List maintenance.

*may simply waste space in a different place or a different way.
VLR Index
[Diagram: INDEX and RECORDS]
• Requires:
– Key
– Byte offset
– Record size? [optional]
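
A sketch of a VLR index, assuming each record is stored with a 2-byte length prefix and the index is a dict from key to (byte offset, size). The prefix, the dict, and the names `append_vlr` and `read_vlr` are illustrative choices; with a length prefix in the file, the size in the index is redundant, which is one reason it can be optional.

```python
import io
import struct

LENGTH = struct.Struct("<H")   # assumed 2-byte length prefix per record

def append_vlr(f, index, key, payload):
    """Append a length-prefixed record and remember where it went."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(LENGTH.pack(len(payload)) + payload)
    index[key] = (offset, len(payload))   # key -> (byte offset, record size)
    return offset

def read_vlr(f, index, key):
    """Fetch a record's payload using only the index entry."""
    offset, size = index[key]
    f.seek(offset + LENGTH.size)          # skip the length prefix
    return f.read(size)

f, index = io.BytesIO(), {}
append_vlr(f, index, "ann", b"Ann Lee")
append_vlr(f, index, "bob", b"Bob")
print(index)                              # {'ann': (0, 7), 'bob': (9, 3)}
```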

VLR Search Operation
[Diagram: INDEX and RECORDS]
• Same as for FLR:
1. Locate key in index
2. Locate record in file
• Binary search still possible on index, but NOT on
records alone.
VLR Deletion Operation - 1
[Diagram: INDEX and RECORDS]
1. Locate key
VLR Deletion Operation - 2
[Diagram: INDEX and RECORDS]
2. Locate record
VLR Deletion Operation - 3
[Diagram: INDEX and RECORDS]
3. Delete record
VLR Deletion Operation - 4
[Diagram: INDEX, Free-List and RECORDS]
4. Add slot to Free-List:
• Remember location of ‘slot’
• Remember size of slot.
VLR Deletion Operation - 5
[Diagram: INDEX, Free-List and RECORDS]
5. Mark index entry
VLR Addition Operation – 1a
[Diagram: INDEX, Free-List, RECORDS and the New Record]
1. Search Free-List
VLR Addition Operation – 1b
[Diagram: INDEX, Free-List, RECORDS and the New Record]
Too big for first place
VLR Addition Operation – 1c
[Diagram: INDEX, Free-List, RECORDS and the New Record]
Too big for second place
VLR Addition Operation – 1d
[Diagram: INDEX, Free-List, RECORDS and the New Record]
Too big for third place
VLR Addition Operation – 1e
[Diagram: INDEX, Free-List, RECORDS and the New Record]
Place at end of file
VLR Addition Operation – 2a
[Diagram: INDEX, Free-List, RECORDS and the New Record]
Search Free-List
VLR Addition Operation – 2b
[Diagram: INDEX, Free-List, RECORDS and the New Record]
Fits in first place….
BUT…..
VLR Addition Operation – 2c
[Diagram: INDEX, Free-List, RECORDS and the New Record]
We will end up with left-over unused (and probably unusable) space.
We call this “First-Fit” (because we are using the first slot that we find that fits).
VLR Addition Operation – 2d
[Diagram: INDEX, Free-List, RECORDS and the New Record]
If instead we keep looking…
We find the second entry is a better fit…..
VLR Addition Operation – 2e
[Diagram: INDEX, Free-List, RECORDS and the New Record]
The third slot does not fit, so….
VLR Addition Operation – 2f
[Diagram: INDEX, Free-List, RECORDS and the New Record]
We decide to use the second slot.
It is the Best-Fit
VLR Addition Operation – 2g
[Diagram: INDEX, Free-List, RECORDS and the New Record]
1. Insert record.
2. Delete Free-List entry.
3. Update Index
Notice the index entry is sorted differently.
What’s the advantage to leaving ‘spaces’ in the index?
VLR Modification Operation - 1
• 2 kinds:
– 1. Mod results in record remaining same
size
– 2. Mod results in record growing or
shrinking.

VLR Modification Operation - 2
• Mod results in record remaining same
size
– Same as for FLR

VLR Modification Operation - 3
• Mod results in record growing or
shrinking.
– Treat Mod as a deletion followed by an
addition.

Free-Lists
• May want to keep Free-List sorted.
• If the List is short it may not matter.
• Placement Strategies:
– First Fit
– Best Fit
– Worst Fit

• It could be its own list or we could make the
regular index serve double-duty.
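
The three placement strategies can be compared on a small Free-List. This is a sketch only: the Free-List is assumed to be an unsorted list of (byte offset, size) pairs, the function names are hypothetical, and the sizes are made-up numbers. Each function returns the position of the chosen entry, or None when nothing fits and the record must go at the end of the file.

```python
def _fits(free_list, need):
    """Positions of all Free-List entries large enough for the record."""
    return [i for i, (_, size) in enumerate(free_list) if size >= need]

def first_fit(free_list, need):
    """First slot that is big enough; stops looking as soon as one fits."""
    for i, (_, size) in enumerate(free_list):
        if size >= need:
            return i
    return None

def best_fit(free_list, need):
    """Smallest slot that is big enough (least left-over space)."""
    fits = _fits(free_list, need)
    return min(fits, key=lambda i: free_list[i][1]) if fits else None

def worst_fit(free_list, need):
    """Largest slot (the left-over piece is most likely to be re-usable)."""
    fits = _fits(free_list, need)
    return max(fits, key=lambda i: free_list[i][1]) if fits else None

free_list = [(100, 40), (300, 25), (520, 10)]   # (offset, size), made-up values
print(first_fit(free_list, 20))   # 0
print(best_fit(free_list, 20))    # 1
print(worst_fit(free_list, 20))   # 0
print(first_fit(free_list, 50))   # None
```

First Fit can stop early, while Best Fit and Worst Fit must examine every entry unless the Free-List is kept sorted by size, which is one reason to consider sorting it.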

Summary
• Managing space inside the file is our
business.
• We must choose:
– FLR / VLR?
– Index? (what kind?)
– Secondary indices?
– Re-claim free space? How?

 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 

CS: Introduction to Record Manipulation & Indexing

  • 1. Record Manipulation & Indexing •records/fields •index placement; index management •manipulating fixed-length record files •re-using space in fixed-length files •varying length records:[VLR] adds; dels; mods; •free lists for VLR - placement strategies (first, best, worst) •varying length record maintenance © Katrin Becker All Rights Reserved Records and Indexing 14-Sep-03 1
  • 2. Records in General A record is: • An identifiable, describable data set • Often contains a sub-structure • Typically part of a larger structure This definition also works for: files; fields; …
  • 3. Records and Fields FILE SYSTEM containing files; FILE containing records; RECORD containing fields; FIELD containing elements
  • 4. Record Manipulation • Operations on Records: – Searches – Additions – Deletions – Modifications
  • 5. Record Manipulation - Search Sequential Search • While NOT done: – Position file pointer – Read record – Examine record to see if it’s the one • Yes: DONE • No: CONTINUE
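The sequential search loop can be sketched in Python as follows. This is only an illustration: the record layout (a 4-byte integer key plus a 12-byte name), the function name, and the sample data are assumptions made for the example, not part of the slides.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte integer key + 12-byte name.
RECORD = struct.Struct("<i12s")

def sequential_search(f, key):
    """Return (rrn, name) for the first record whose key matches, else None."""
    rrn = 0
    while True:
        f.seek(rrn * RECORD.size)            # position file pointer
        raw = f.read(RECORD.size)            # read record
        if len(raw) < RECORD.size:           # ran off the end: not found
            return None
        rec_key, name = RECORD.unpack(raw)   # examine record
        if rec_key == key:                   # Yes: DONE
            return rrn, name.rstrip(b"\0").decode()
        rrn += 1                             # No: CONTINUE

# An in-memory stand-in for the record file (sample data is invented).
data = io.BytesIO(b"".join(
    RECORD.pack(k, n) for k, n in [(30, b"carol"), (10, b"alice"), (20, b"bob")]
))
print(sequential_search(data, 20))   # (2, 'bob')
```

In the worst case every record is read once, which is why the later slides turn to an index.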
  • 6. Other Searches • What changes? – Binary search: • We position the file pointer in a different fashion (the rest is the same) – Search with an index: • We apply the search to the index and retrieve the record only once its key has been located in the index
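A minimal sketch of searching with an index, assuming the index fits in memory as two parallel lists sorted by key (the list names and sample values are hypothetical). The binary search runs over the index only; the record file would be touched once, after the key has been found.

```python
import bisect

# Hypothetical in-memory index: keys sorted ascending, with the
# relative record number (RRN) of each key's record alongside.
keys = [10, 20, 30]
rrns = [1, 2, 0]

def index_lookup(keys, rrns, key):
    """Binary-search the index; return the record's RRN or None."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rrns[i]          # only now would we read the record itself
    return None

print(index_lookup(keys, rrns, 30))   # 0
```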
  • 7. Record Manipulation – Addition New record gets added to the end. • Insertion into middle of file is impractical. • If there is an index, then we also perform an addition to the index (addition to the end of this list is infeasible – WHY?).
  • 8. Addition with an Index - 1 [Diagram: INDEX, RECORDS] 1. New record gets added to the end.
  • 9. Addition with an Index - 2 [Diagram: INDEX, RECORDS] 2. Locate place where index entry needs to go.
  • 10. Addition with an Index - 3 [Diagram: INDEX, RECORDS] 3. Insert new index entry (it’s a record too).
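The three addition steps might look like this in Python. The sketch assumes a Python list stands in for the record file (list position = RRN) and that the index is a sorted list of (key, RRN) pairs; both names are illustrative.

```python
import bisect

records = []   # stands in for the record file; list position = RRN
index = []     # sorted list of (key, rrn) pairs

def add_record(key, payload):
    records.append((key, payload))                  # 1. add record to the end
    rrn = len(records) - 1
    pos = bisect.bisect_left(index, (key, rrn))     # 2. locate place for index entry
    index.insert(pos, (key, rrn))                   # 3. insert new index entry
    return rrn

for key, name in [(30, "carol"), (10, "alice"), (20, "bob")]:
    add_record(key, name)

print(index)   # [(10, 1), (20, 2), (30, 0)]
```

The records stay in arrival order while the index stays sorted, so only the small index entries ever move.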
  • 11. Records vs. Index: Assertions & Questions • Moving file records is more expensive than moving index records. • Should the index be IN the record file or in its own file? (How do we maintain it?) • If IN file: should it be at the beginning, end, middle, distributed? • What if we are able to hold the index in memory? • What if we can’t?
  • 12. Record Manipulation - Deletion • Locate record (Search) • Mark space as deleted • Remove index entry? (Why or why not?)
  • 13. Deletion with an index - 1 [Diagram: INDEX, RECORDS] 1. Locate index entry
  • 14. Deletion with an index - 2 [Diagram: INDEX, RECORDS] 1. Locate index entry 2. Locate record
  • 15. Deletion with an index - 3 [Diagram: INDEX, RECORDS] 3. Delete (mark) record
  • 16. Deletion with an index - 4 [Diagram: INDEX, RECORDS] 4. Delete (mark?) index entry
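A sketch of the four deletion steps, under the same assumptions as before (a list as the record file, a sorted list of (key, RRN) pairs as the index). The slides leave open whether the index entry is removed or only marked; this example removes it, which is one possible choice rather than the prescribed one.

```python
import bisect

records = [(30, "carol"), (10, "alice"), (20, "bob")]   # position = RRN
index = [(10, 1), (20, 2), (30, 0)]                     # sorted (key, rrn)
DELETED = None                                          # hypothetical tombstone marker

def delete_record(key):
    i = bisect.bisect_left(index, (key,))        # 1. locate index entry
    if i == len(index) or index[i][0] != key:
        return False                             # key not present
    rrn = index[i][1]                            # 2. locate record
    records[rrn] = DELETED                       # 3. delete (mark) record
    del index[i]                                 # 4. delete index entry
    return True

delete_record(20)
print(records)   # [(30, 'carol'), (10, 'alice'), None]
print(index)     # [(10, 1), (30, 0)]
```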
  • 17. Record Manipulation - Modification • Locate record • Read record • Modify record • Re-write record (assuming fixed-size records – what if the record is now a different size? [see later])
  • 18. File Behaviour – 1: start. Record count = 9
  • 19. File Behaviour – 2: add record. Record count = 10
  • 20. File Behaviour – 3: add record. Record count = 11
  • 21. File Behaviour – 4: delete. Record count = 10
  • 22. File Behaviour – 5: delete. Record count = 9
  • 23. File Behaviour – 6: add. Record count = 10
  • 24. File Behaviour – 7: add. Record count = 11
  • 25. File Behaviour – 8: add. Record count = 12
  • 26. File Behaviour – 9: delete. Record count = 11
  • 27. File Behaviour – 10: delete. Record count = 10. And so on…
  • 28. What’s happening to the file? • File grows – does not shrink (we get fragmentation) • We end up covering more ground to do the same job • Q: If we are doing random access, why does it matter? • The file system has less space to use (the fragmentation is internal from the perspective of the file system). • Worst case = EVERY record access ends up costing us a seek.
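The add/delete sequence from the File Behaviour slides can be replayed in a few lines. The sketch assumes that deleted slots are only marked and never re-used, so every addition extends the file; the variable names are illustrative.

```python
# Operations from the File Behaviour slides, after the starting state.
ops = ["add", "add", "delete", "delete", "add", "add", "add", "delete", "delete"]

slots = 9   # slots physically present in the file at the start
live = 9    # the "record count" shown on the slides

for op in ops:
    if op == "add":
        slots += 1    # new record appended at the end of the file
        live += 1
    else:
        live -= 1     # slot is only marked; the file does not shrink

print(slots, live)   # 14 10
```

The file ends with 14 slots holding 10 live records, so four slots are dead space that every scan still has to cross.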
  • 29. Re-Using Space in the File [FLR] • When there is a deletion, locate the last record in the file and move it to the free slot – Costs: • Additional file access to locate (where will we remember where the last record is?) and retrieve the last record. • Records will lose locality faster than if we simply mark the slot. (Why do we care?)
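A minimal sketch of this first re-use strategy, assuming a Python list stands in for the fixed-length record file (the function name is hypothetical). Any index entry for the moved record would also need its location updated, which is not shown here.

```python
def delete_and_fill(records, rrn):
    """Delete records[rrn] by moving the last record into its slot."""
    last = records.pop()          # retrieve the last record and shrink the file
    if rrn < len(records):        # the deleted slot was not the last one
        records[rrn] = last       # move the last record into the hole
    return records

print(delete_and_fill(["a", "b", "c", "d"], 1))   # ['a', 'd', 'c']
```

Note how "d" now sits between "a" and "c": the file stays compact, but the original ordering of the records is lost.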
  • 30. Re-Using Space – Way 2 • Make a list of places where records have been deleted. • When doing addition, check for empty ‘slot’ before placing new record at end. Q: What about the index? • When doing deletion, add location of deleted record to ‘free-list’.
  • 31. What does the Free-List look like? [Diagram: INDEX, RECORDS] All we need is the location. Order is unimportant.
  • 32. How to decide which ‘slot’ to re-use? • In FLR every slot will fit a new record. • We can just take the first one – Free-List can then be maintained as a stack (which is easy). • Do we keep Free-List information in the file?
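For fixed-length records, a Free-List kept as a stack could be sketched like this. The list-based "file" and the function names are assumptions for illustration; a real file would also need somewhere to persist the Free-List.

```python
records = ["a", "b", "c", "d"]   # stands in for the record file; position = RRN
free_list = []                   # stack of RRNs whose slots are free

def delete(rrn):
    records[rrn] = None          # mark the slot as deleted
    free_list.append(rrn)        # push its location on the Free-List

def add(payload):
    if free_list:                # re-use the most recently freed slot
        rrn = free_list.pop()
        records[rrn] = payload
    else:                        # no free slot: place at end of file
        records.append(payload)
        rrn = len(records) - 1
    return rrn

delete(1)
delete(2)
print(add("x"), add("y"), add("z"))   # 2 1 4
print(records)                        # ['a', 'y', 'x', 'd', 'z']
```

Because every slot fits every record, push and pop at one end are all that is needed; no searching of the Free-List takes place.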
  • 33. Indexing – What is it? • Table-of-contents for a file (directory) • Uses keys • Byte Offset (BO) vs Relative Record Number (RRN)
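For fixed-length records the two addressing schemes are interchangeable, since a byte offset can be computed from an RRN. The helper below is a sketch; the optional `header_size` parameter is an assumption for files that begin with a header.

```python
def byte_offset(rrn, record_size, header_size=0):
    """Convert a relative record number into a byte offset."""
    return header_size + rrn * record_size

print(byte_offset(3, 16))      # 48
print(byte_offset(0, 16, 8))   # 8
```

With varying-length records this arithmetic no longer works, so the index has to store byte offsets directly.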
  • 34. Primary Key Properties: • Unique • Canonical • Data-less • Unchanging
  • 35. Indexing – How does it Look? [Diagram: INDEX] • Must have: – Key – Way to locate record • It is itself a structure containing ‘records’ (each index entry is a record) • It may be separate from the main data or in the same file. • It may be copied into memory for manipulation and only updated infrequently; or the file copy may be maintained as well.
  • 36. Indexing – File Ops? • Tied to records: – If record added – new/update index entry – If record deleted – ‘delete’ index entry – If record modified – maybe no change to index; maybe update BO [byte offset]
  • 37. Fixed-length vs Varying Length • VLR provides greater flexibility. • VLR increases maintenance overhead. • VLR decreases wasted space.* • VLR makes index virtually essential. • VLR complicates Free-List maintenance. * May simply waste space in a different place or a different way.
  • 38. VLR Index [Diagram: INDEX, RECORDS] • Requires: – Key – Byte offset – Record size? [optional]
  • 39. VLR Search Operation [Diagram: INDEX, RECORDS] • Same as for FLR: 1. Locate key in index 2. Locate record in file • Binary search still possible on index, but NOT on records alone.
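A sketch of a VLR index and the two-step search, assuming an in-memory byte stream as the record file and index entries of the form (key, byte offset, size). The helper names and sample data are made up for the example.

```python
import bisect
import io

f = io.BytesIO()   # stands in for the record file
index = []         # sorted list of (key, byte_offset, size)

def append_vlr(key, payload):
    offset = f.seek(0, io.SEEK_END)                      # records go at the end
    f.write(payload)
    bisect.insort(index, (key, offset, len(payload)))    # index stays sorted by key

def fetch(key):
    i = bisect.bisect_left(index, (key,))    # 1. locate key in index
    if i == len(index) or index[i][0] != key:
        return None
    _, offset, size = index[i]
    f.seek(offset)                           # 2. locate record in file
    return f.read(size)

append_vlr(20, b"bob")
append_vlr(10, b"alexandra")
print(fetch(10))   # b'alexandra'
print(index)       # [(10, 3, 9), (20, 0, 3)]
```

Storing the size in the index is the optional part: without it, each record would have to carry its own length or a delimiter.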
  • 40. VLR Deletion Operation - 1 [Diagram: INDEX, RECORDS] Locate key
  • 41. VLR Deletion Operation - 2 [Diagram: INDEX, RECORDS] Locate record
  • 42. VLR Deletion Operation - 3 [Diagram: INDEX, RECORDS] Delete record
  • 43. VLR Deletion Operation - 4 [Diagram: INDEX, Free-List, RECORDS] • Remember location of ‘slot’ • Remember size of slot.
  • 44. VLR Deletion Operation - 5 [Diagram: INDEX, Free-List, RECORDS] 5. Mark index entry
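The VLR deletion sequence can be sketched as below. The index layout (a dictionary with a `live` flag per entry) is an assumption chosen to show "marking" rather than removing an index entry; marking the record bytes in the file itself is left out of this sketch.

```python
# Hypothetical index: key -> location, size and a liveness flag.
index = {
    10: {"offset": 3, "size": 9, "live": True},
    20: {"offset": 0, "size": 3, "live": True},
    30: {"offset": 12, "size": 5, "live": True},
}
free_list = []   # (offset, size) of every freed slot

def delete_vlr(key):
    entry = index.get(key)                                   # locate key
    if entry is None or not entry["live"]:
        return False
    free_list.append((entry["offset"], entry["size"]))       # remember location and size of slot
    entry["live"] = False                                    # mark index entry
    return True

delete_vlr(10)
print(free_list)   # [(3, 9)]
```

Unlike the fixed-length case, the Free-List must record the size of each slot, because not every slot fits every new record.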
  • 45. VLR Addition Operation – 1a [Diagram: INDEX, Free-List, New Record, RECORDS] 1. Search Free-List
  • 46. VLR Addition Operation – 1b [Diagram: INDEX, Free-List, New Record, RECORDS] Too big for first place
  • 47. VLR Addition Operation – 1c [Diagram: INDEX, Free-List, New Record, RECORDS] Too big for second place
  • 48. VLR Addition Operation – 1d [Diagram: INDEX, Free-List, New Record, RECORDS] Too big for third place
  • 49. VLR Addition Operation – 1e [Diagram: INDEX, Free-List, New Record, RECORDS] Place at end of file
  • 50. VLR Addition Operation – 2a [Diagram: INDEX, Free-List, New Record, RECORDS] Search Free-List
  • 51. VLR Addition Operation – 2b [Diagram: INDEX, Free-List, New Record, RECORDS] Fits in first place… BUT…
  • 52. VLR Addition Operation – 2c [Diagram: INDEX, Free-List, New Record, RECORDS] We will end up with left-over unused (and probably unusable) space. We call this “First-Fit” (because we are using the first slot that we find that fits).
  • 53. VLR Addition Operation – 2d [Diagram: INDEX, Free-List, New Record, RECORDS] If instead we keep looking… We find the second entry is a better fit…
  • 54. VLR Addition Operation – 2e [Diagram: INDEX, Free-List, New Record, RECORDS] The third slot does not fit, so…
  • 55. VLR Addition Operation – 2f [Diagram: INDEX, Free-List, New Record, RECORDS] We decide to use the second slot. It is the Best-Fit.
  • 56. VLR Addition Operation – 2g [Diagram: INDEX, Free-List, New Record, RECORDS] 1. Insert record. 2. Delete Free-List entry. 3. Update index. Notice the index entry is sorted differently. What’s the advantage to leaving ‘spaces’ in the index?
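The difference between First-Fit and Best-Fit can be shown with a small sketch. The slot offsets and sizes are invented to mirror the walkthrough (three free slots, the third too small); the function names are illustrative.

```python
# Hypothetical Free-List entries: (byte_offset, size).
free_list = [(100, 40), (300, 25), (500, 10)]

def first_fit(free_list, need):
    """Index of the first slot that is big enough, else None."""
    for i, (_, size) in enumerate(free_list):
        if size >= need:
            return i
    return None

def best_fit(free_list, need):
    """Index of the smallest slot that is still big enough, else None."""
    fits = [(size, i) for i, (_, size) in enumerate(free_list) if size >= need]
    return min(fits)[1] if fits else None

need = 20
print(first_fit(free_list, need))   # 0  (leaves 20 bytes over)
print(best_fit(free_list, need))    # 1  (leaves 5 bytes over)
print(first_fit(free_list, 50))     # None: place at end of file
```

First-Fit stops at the first candidate, while Best-Fit has to look at every entry unless the Free-List is kept sorted by size.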
  • 57. VLR Modification Operation - 1 • 2 kinds: – 1. Mod results in record remaining same size – 2. Mod results in record growing or shrinking.
  • 58. VLR Modification Operation - 2 • Mod results in record remaining same size – Same as for FLR
  • 59. VLR Modification Operation - 3 • Mod results in record growing or shrinking. – Treat Mod as a deletion followed by an addition.
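A sketch of both modification cases, tracking only locations and sizes rather than record bytes. The data layout (index as key -> (offset, size), Free-List as (offset, size) pairs) and the use of First-Fit for the re-addition are assumptions; any left-over space in a re-used slot is simply dropped here.

```python
def modify(index, free_list, end_of_file, key, new_size):
    """Resize the record for `key`; return the new end-of-file offset."""
    offset, size = index[key]
    if new_size == size:
        return end_of_file                       # same size: re-write in place, as for FLR
    free_list.append((offset, size))             # deletion: old slot goes on the Free-List
    for i, (free_off, free_size) in enumerate(free_list):
        if free_size >= new_size:                # addition: first slot that fits
            del free_list[i]
            index[key] = (free_off, new_size)
            return end_of_file
    index[key] = (end_of_file, new_size)         # nothing fits: place at end of file
    return end_of_file + new_size

index = {1: (0, 10), 2: (10, 20)}
free_list = [(30, 15)]
eof = modify(index, free_list, 45, 1, 12)   # record 1 grows from 10 to 12 bytes
print(index[1], free_list, eof)             # (30, 12) [(0, 10)] 45
```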
  • 60. Free-Lists • May want to keep Free-List sorted. • If the List is short it may not matter. • Placement Strategies: – First Fit – Best Fit – Worst Fit • It could be its own list or we could make the regular index serve double-duty.
  • 61. Summary • Managing space inside the file is our business. • We must choose: – FLR / VLR? – Index? (what kind?) – Secondary indices? – Re-claim free space? How?