SlideShare a Scribd company logo
1 of 31
Download to read offline
1 
Storage in Database Systems 
CMPSCI 445 
Fall 2010
Storage Topics 
 Architecture and Overview 
 Disks 
 Buffer management 
 Files of records 
2
3 
DBMS Architecture 
Query Parser 
Query Rewriter 
Query Optimizer 
Query Executor 
File & Access Methods 
Buffer Manager 
Lock Manager Log Manager 
Disk Space Manager
4 
Data on External Storage 
 Disks: Can retrieve random page at fixed cost 
 But reading several consecutive pages is much cheaper than 
reading them in random order 
 Tapes: Can only read pages in sequence 
 Cheaper than disks; used for archival storage 
 Page: Unit of information read from or written to disk 
 Size of page: DBMS parameter, 4KB, 8KB 
 Disk space manager: 
 Abstraction: a collection of pages. 
 Allocate/de-allocate a page. 
 Read/write a page. 
 Page I/O: 
 Pages read from disk and pages written to disk 
 Dominant cost of database operations
5 
Buffer Management 
 Architecture: 
 Data is read into memory for processing 
 Data is written to disk for persistent storage 
 Buffer manager stages pages between external 
storage and main memory buffer pool. 
 Access method layer makes calls to the buffer 
manager.
6 
Access Methods 
 Access methods: routines to manage various disk-based 
data structures. 
 Files of records 
 Various kinds of indexes 
 File of records: 
 Important abstraction of external storage in a DBMS! 
 Record id (rid) is sufficient to physically locate a record 
 Indexes: 
 Auxiliary data structures 
 Given values in index search key fields, find the record ids of 
records with those values
7 
File organizations & access methods 
Many alternatives exist, each ideal for some 
situations, and not so good in others: 
 Heap (unordered) files: Suitable when typical 
access is a file scan retrieving all records. 
 Sorted Files: Best if records must be retrieved in 
some order, or only a `range’ of records is needed. 
 Indexes: Data structures to organize records via 
trees or hashing. 
• Like sorted files, they speed up searches for a subset of 
records, based on values in certain (“search key”) fields 
• Updates are much faster than in sorted files.
Disks 
8
9 
Disks and DBMS Design 
 DBMS stores information on disks. 
 This has major implications for DBMS design! 
 READ: transfer data from disk to main memory (RAM) 
for data processing. 
 WRITE: transfer data from RAM to disk for persistent 
storage. 
 Both are high-cost operations, relative to in-memory 
operations, so must be planned carefully!
10 
Why Not Store Everything in Main Memory? 
 Main memory is volatile. We want data to be 
saved between runs. (Obviously!) 
 Costs too much. $100 will buy you either 4GB 
of RAM or 1TB of disk today. 
 32-bit addressing limitation. 
 232 bytes can be directly addressed in memory. 
 Number of objects cannot exceed this number.
11 
Basics of Disks 
 Unit of storage and retrieval: disk block or page. 
 A disk block/page is a contiguous sequence of bytes. 
 Size of a DBMS parameter, 4KB or 8KB. 
 Disks support direct access to a page. 
 Unlike RAM, time to retrieve a page varies! 
 It depends upon the location on disk. 
 Therefore, relative placement of pages on disk has 
major impact on DBMS performance!
12 
Components of a Disk 
Track 
Platters 
 Platters spin (say, 7200rpm). 
Spindle 
 Arm assembly is moved in 
or out to position a head on a 
desired track. 
Disk head 
Arm movement 
Arm 
assembly 
 Only one head reads/ 
writes at any one time. 
 Tracks under heads make a 
cylinder (imaginary!). 
 Each track is divided into 
sectors (whose size is fixed). 
 Block size is a multiple 
of sector size. 
Sector
13 
Accessing a Disk Page 
 Time to access (read/write) a disk block: 
 seek time (moving arms to position disk head on track) 
 rotational delay (waiting for block to rotate under head) 
 transfer time (actually moving data to/from disk surface) 
 Seek time and rotational delay dominate. 
 Seek time varies from about 2 to 30 msec 
 Rotational delay varies from 0 to 11 msec 
 Transfer rate between 25 to 100 MB/s 
 Key to lower I/O cost: reduce seek/rotation 
delays!
14 
Arranging Pages on Disk 
 `Next’ block concept: 
 blocks on same track, followed by 
 blocks on same cylinder, followed by 
 blocks on adjacent cylinder 
 Blocks in a file should be arranged 
sequentially on disk (by `next’), to minimize 
seek and rotational delay. 
 For a sequential scan, pre-fetching several 
pages at a time is a big win!
Buffer Management 
15
16 
Buffer Management in a DBMS 
Page Requests from Higher Levels 
BUFFER POOL 
DB 
disk page 
free frame 
MAIN MEMORY 
DISK 
choice of frame dictated 
by replacement policy 
 Data must be in RAM for DBMS to operate on it! 
 Table of <frame#, pageid> pairs is maintained.
17 
More on Buffer Management 
 Requestor of page must unpin it, and indicate 
whether page has been modified: 
 dirty bit is used for this. 
 Page in pool may be requested many times, 
 a pin count is used. A page is a candidate for 
replacement iff pin count = 0. 
 CC & recovery may entail additional I/O 
when a frame is chosen for replacement. 
(Write-Ahead Log protocol; more later.)
18 
When a Page is Requested ... 
 If requested page is not in pool: 
 Choose a frame for replacement 
 If frame is dirty, write it to disk 
 Read requested page into chosen frame 
 Pin the page and return its address. 
 If requests can be predicted (e.g., sequential scans) 
pages can be pre-fetched several pages at a time!
19 
Buffer Replacement Policy 
 Frame is chosen for replacement by a 
replacement policy: 
 Least-recently-used (LRU), Clock, MRU etc. 
 Policy can have big impact on # of I/O’s; 
depends on the access pattern. 
 Sequential flooding: Nasty situation caused by 
LRU + repeated sequential scans. 
 # buffer frames < # pages in file means each page 
request causes an I/O. MRU much better in this 
situation (but not in all situations, of course).
20 
DBMS vs. OS File System 
OS does disk space & buffer mgmt: why not let 
OS manage these tasks? 
 Differences in OS support: portability issues 
 Some limitations, e.g., files can’t span disks. 
 Buffer management in DBMS requires ability to: 
 pin a page in buffer pool, force a page to disk 
(important for implementing CC & recovery), 
 adjust replacement policy, and pre-fetch pages based 
on access patterns in typical DB operations.
Files 
21
22 
Files 
 Access method layer offers an abstraction of 
data on disk: a file of records residing on 
multiple pages 
 A number of fields are organized in a record 
 A collection of records are organized in a page 
 A collection of pages are organized in a file
23 
Files of Records 
 Page or block is OK when doing I/O, but 
higher levels of DBMS operate on records and 
files of records. 
 FILE: A collection of pages, each containing a 
collection of records. Must support: 
 insert/delete/modify record 
 read a particular record (specified using record id) 
 scan all records (possibly with some conditions on 
the records to be retrieved)
24 
Unordered (Heap) Files 
 Simplest file structure contains records in no 
particular order. 
 As file grows and shrinks, disk pages are 
allocated and de-allocated. 
 To support record level operations, we must: 
 keep track of the pages in a file 
 keep track of free space on pages 
 keep track of the records on a page 
 There are many alternatives for keeping track 
of this.
25 
Heap File Implemented as a List 
Header 
Page 
Data 
Page 
Data 
Page 
Data 
Page 
Data 
Page 
Data 
Page 
Full Pages 
Data 
Page Pages with 
Free Space 
 (heap file name, header page id) stored in a known place. 
 Two doubly linked lists, for full pages & pages with space. 
 Each page contains 2 `pointers’ plus data. 
 Upon insertion, scan the list of pages with space, or ask 
disk space manager to allocate a new page
26 
Heap File Using a Page Directory 
Data 
Page 1 
Data 
Page 2 
Data 
Page N 
Header 
Page 
DIRECTORY 
 A directory entry per page; it can include the 
number of free bytes on the page. 
 The directory is a collection of pages; linked 
list implementation is just one alternative. 
 Much smaller than linked list of all HF pages!
27 
Page Format 
 How to store a collection of records on a page? 
 Consider a page as a collection of slots, one for 
each record. 
 A record is identified by rid = <page id, slot #> 
 Record ids (rids) are used in indexes 
(Alternatives 2 and 3).
28 
Page Format: Fixed Length Records 
Slot 1 
Slot 2 
Slot N 
. . . 
PACKED 
N 
Free 
Space 
number 
of records 
. . . 
1 1 
. . . 0 1M 
M ... 3 2 1 
Slot 1 
Slot 2 
Slot N 
Slot M 
UNPACKED, BITMAP 
number 
of slots 
 Moving records for free space management changes 
rid! May not be acceptable.
20 16 24 N Pointer 
to start 
of free 
space 
29 
Page Format: Variable Length Records 
Page i 
Rid = (i,N) 
Rid = (i,2) 
Rid = (i,1) 
N . . . 2 1 # slots 
SLOT DIRECTORY 
Rid of record is <page id, slot #> 
 Can move records on page without changing rid; so, attractive 
for fixed-length records too. (level of indirection)
30 
Record Format: Fixed Length 
F1 F2 F3 F4 
L1 L2 L3 L4 
Base address (B) 
Address = B+L1+L2 
 Information of a record type e.g., the number of fields 
and field types is stored in the system catalog. 
 Fixed length record: (1) the number of fields is fixed, 
(2) each field has a fixed length. 
 Store fields consecutively in a record. 
 Finding i’th field does not require scan of record.
31 
Record Format: Variable Length 
 Variable length record: (1) number of fields is fixed, (2) 
some fields are variable length 
 Two alternatives: 
F1 F2 F3 F4 
$ $ $ $ 
Scan 
Fields Delimited 
by Special Symbols 
F1 F2 F3 F4 
S1 S2 S3 S4 E4 Array of Field 
Offsets 
 Second offers direct access to i’th field, efficient storage 
of NULLs ; small directory overhead.

More Related Content

What's hot

Record storage and primary file organization
Record storage and primary file organizationRecord storage and primary file organization
Record storage and primary file organizationJafar Nesargi
 
Structure of the page table
Structure of the page tableStructure of the page table
Structure of the page tableduvvuru madhuri
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System ImplementationWayne Jones Jnr
 
File System and File allocation tables
File System and File allocation tablesFile System and File allocation tables
File System and File allocation tablesshashikant pabari
 
File organization and indexing
File organization and indexingFile organization and indexing
File organization and indexingraveena sharma
 
Storage devices
Storage devicesStorage devices
Storage devicesTinku12345
 
Implementation of page table
Implementation of page tableImplementation of page table
Implementation of page tableguestff64339
 
Mass Storage Structure
Mass Storage StructureMass Storage Structure
Mass Storage StructureVimalanathan D
 
Csc4320 chapter 8 2
Csc4320 chapter 8 2Csc4320 chapter 8 2
Csc4320 chapter 8 2bshikhar13
 
All about Storage - Series 2 Defining Data
All about Storage - Series 2 Defining DataAll about Storage - Series 2 Defining Data
All about Storage - Series 2 Defining DataDAGEOP LTD
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMSkoolkampus
 
Data Consistency Enhancement in Writeback mode of Journaling using Backpointers
Data Consistency Enhancement in Writeback mode of Journaling using BackpointersData Consistency Enhancement in Writeback mode of Journaling using Backpointers
Data Consistency Enhancement in Writeback mode of Journaling using BackpointersRohan Waghere
 
File implementation
File implementationFile implementation
File implementationMohd Arif
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structurezhaolinjnu
 
Lecture #1 Introduction
Lecture #1 IntroductionLecture #1 Introduction
Lecture #1 IntroductionRico
 
Operating System- Multilevel Paging, Inverted Page Table
Operating System- Multilevel Paging, Inverted Page TableOperating System- Multilevel Paging, Inverted Page Table
Operating System- Multilevel Paging, Inverted Page TableZishan Mohsin
 

What's hot (20)

Record storage and primary file organization
Record storage and primary file organizationRecord storage and primary file organization
Record storage and primary file organization
 
Structure of the page table
Structure of the page tableStructure of the page table
Structure of the page table
 
File organisation
File organisationFile organisation
File organisation
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System Implementation
 
File System and File allocation tables
File System and File allocation tablesFile System and File allocation tables
File System and File allocation tables
 
File organization and indexing
File organization and indexingFile organization and indexing
File organization and indexing
 
Storage devices
Storage devicesStorage devices
Storage devices
 
Implementation of page table
Implementation of page tableImplementation of page table
Implementation of page table
 
Mass Storage Structure
Mass Storage StructureMass Storage Structure
Mass Storage Structure
 
DB2 TABLESPACES
DB2 TABLESPACESDB2 TABLESPACES
DB2 TABLESPACES
 
Csc4320 chapter 8 2
Csc4320 chapter 8 2Csc4320 chapter 8 2
Csc4320 chapter 8 2
 
All about Storage - Series 2 Defining Data
All about Storage - Series 2 Defining DataAll about Storage - Series 2 Defining Data
All about Storage - Series 2 Defining Data
 
Unix
UnixUnix
Unix
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS
 
Data Consistency Enhancement in Writeback mode of Journaling using Backpointers
Data Consistency Enhancement in Writeback mode of Journaling using BackpointersData Consistency Enhancement in Writeback mode of Journaling using Backpointers
Data Consistency Enhancement in Writeback mode of Journaling using Backpointers
 
File implementation
File implementationFile implementation
File implementation
 
File organisation
File organisationFile organisation
File organisation
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structure
 
Lecture #1 Introduction
Lecture #1 IntroductionLecture #1 Introduction
Lecture #1 Introduction
 
Operating System- Multilevel Paging, Inverted Page Table
Operating System- Multilevel Paging, Inverted Page TableOperating System- Multilevel Paging, Inverted Page Table
Operating System- Multilevel Paging, Inverted Page Table
 

Viewers also liked

06 file processing
06 file processing06 file processing
06 file processingIssay Meii
 
Tries - Tree Based Structures for Strings
Tries - Tree Based Structures for StringsTries - Tree Based Structures for Strings
Tries - Tree Based Structures for StringsAmrinder Arora
 
Fabrication of storage buffer vessel
Fabrication of storage buffer vesselFabrication of storage buffer vessel
Fabrication of storage buffer vesselNishant Rao Boddeda
 
5. spooling and buffering
5. spooling and buffering 5. spooling and buffering
5. spooling and buffering myrajendra
 
Fundamental File Processing Operations
Fundamental File Processing OperationsFundamental File Processing Operations
Fundamental File Processing OperationsRico
 
Download presentation
Download presentationDownload presentation
Download presentationwebhostingguy
 
Automated storage and retrieval system
Automated storage and retrieval systemAutomated storage and retrieval system
Automated storage and retrieval systemPrasanna3804
 

Viewers also liked (10)

06 file processing
06 file processing06 file processing
06 file processing
 
Tries - Tree Based Structures for Strings
Tries - Tree Based Structures for StringsTries - Tree Based Structures for Strings
Tries - Tree Based Structures for Strings
 
Digital Search Tree
Digital Search TreeDigital Search Tree
Digital Search Tree
 
Trie Data Structure
Trie Data StructureTrie Data Structure
Trie Data Structure
 
File structures
File structuresFile structures
File structures
 
Fabrication of storage buffer vessel
Fabrication of storage buffer vesselFabrication of storage buffer vessel
Fabrication of storage buffer vessel
 
5. spooling and buffering
5. spooling and buffering 5. spooling and buffering
5. spooling and buffering
 
Fundamental File Processing Operations
Fundamental File Processing OperationsFundamental File Processing Operations
Fundamental File Processing Operations
 
Download presentation
Download presentationDownload presentation
Download presentation
 
Automated storage and retrieval system
Automated storage and retrieval systemAutomated storage and retrieval system
Automated storage and retrieval system
 

Similar to Lecture storage-buffer

CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
Ch14 OS
Ch14 OSCh14 OS
Ch14 OSC.U
 
Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationJafar Nesargi
 
Virtual memory translation.pptx
Virtual memory translation.pptxVirtual memory translation.pptx
Virtual memory translation.pptxRAJESH S
 
15 bufferand records
15 bufferand records15 bufferand records
15 bufferand recordsashish61_scs
 
Chapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage SystemsChapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage SystemsWayne Jones Jnr
 
Aerospike: Key Value Data Access
Aerospike: Key Value Data AccessAerospike: Key Value Data Access
Aerospike: Key Value Data AccessAerospike, Inc.
 
Memory map
Memory mapMemory map
Memory mapaviban
 
presentations
presentationspresentations
presentationsMISY
 
Ie Storage, Multimedia And File Organization
Ie   Storage, Multimedia And File OrganizationIe   Storage, Multimedia And File Organization
Ie Storage, Multimedia And File OrganizationMISY
 
fdocuments.in_aerospike-key-value-data-access.ppt
fdocuments.in_aerospike-key-value-data-access.pptfdocuments.in_aerospike-key-value-data-access.ppt
fdocuments.in_aerospike-key-value-data-access.pptyashsharma863914
 
How Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work TogetherHow Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work TogetherCompellent Technologies
 
Managing Memory & Locks - Series 1 Memory Management
Managing  Memory & Locks - Series 1 Memory ManagementManaging  Memory & Locks - Series 1 Memory Management
Managing Memory & Locks - Series 1 Memory ManagementDAGEOP LTD
 

Similar to Lecture storage-buffer (20)

CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
Index file
Index fileIndex file
Index file
 
OSCh14
OSCh14OSCh14
OSCh14
 
OS_Ch14
OS_Ch14OS_Ch14
OS_Ch14
 
Ch14 OS
Ch14 OSCh14 OS
Ch14 OS
 
Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organization
 
Virtual memory translation.pptx
Virtual memory translation.pptxVirtual memory translation.pptx
Virtual memory translation.pptx
 
15 bufferand records
15 bufferand records15 bufferand records
15 bufferand records
 
Chapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage SystemsChapter 12 - Mass Storage Systems
Chapter 12 - Mass Storage Systems
 
Aerospike: Key Value Data Access
Aerospike: Key Value Data AccessAerospike: Key Value Data Access
Aerospike: Key Value Data Access
 
Memory map
Memory mapMemory map
Memory map
 
UNIT III.pptx
UNIT III.pptxUNIT III.pptx
UNIT III.pptx
 
Mass storage systemsos
Mass storage systemsosMass storage systemsos
Mass storage systemsos
 
presentations
presentationspresentations
presentations
 
Ie Storage, Multimedia And File Organization
Ie   Storage, Multimedia And File OrganizationIe   Storage, Multimedia And File Organization
Ie Storage, Multimedia And File Organization
 
Nachos 2
Nachos 2Nachos 2
Nachos 2
 
Nachos 2
Nachos 2Nachos 2
Nachos 2
 
fdocuments.in_aerospike-key-value-data-access.ppt
fdocuments.in_aerospike-key-value-data-access.pptfdocuments.in_aerospike-key-value-data-access.ppt
fdocuments.in_aerospike-key-value-data-access.ppt
 
How Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work TogetherHow Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work Together
 
Managing Memory & Locks - Series 1 Memory Management
Managing  Memory & Locks - Series 1 Memory ManagementManaging  Memory & Locks - Series 1 Memory Management
Managing Memory & Locks - Series 1 Memory Management
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Lecture storage-buffer

  • 1. 1 Storage in Database Systems CMPSCI 445 Fall 2010
  • 2. Storage Topics  Architecture and Overview  Disks  Buffer management  Files of records 2
  • 3. 3 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query Executor File & Access Methods Buffer Manager Lock Manager Log Manager Disk Space Manager
  • 4. 4 Data on External Storage  Disks: Can retrieve random page at fixed cost  But reading several consecutive pages is much cheaper than reading them in random order  Tapes: Can only read pages in sequence  Cheaper than disks; used for archival storage  Page: Unit of information read from or written to disk  Size of page: DBMS parameter, 4KB, 8KB  Disk space manager:  Abstraction: a collection of pages.  Allocate/de-allocate a page.  Read/write a page.  Page I/O:  Pages read from disk and pages written to disk  Dominant cost of database operations
  • 5. 5 Buffer Management  Architecture:  Data is read into memory for processing  Data is written to disk for persistent storage  Buffer manager stages pages between external storage and main memory buffer pool.  Access method layer makes calls to the buffer manager.
  • 6. 6 Access Methods  Access methods: routines to manage various disk-based data structures.  Files of records  Various kinds of indexes  File of records:  Important abstraction of external storage in a DBMS!  Record id (rid) is sufficient to physically locate a record  Indexes:  Auxiliary data structures  Given values in index search key fields, find the record ids of records with those values
  • 7. 7 File organizations & access methods Many alternatives exist, each ideal for some situations, and not so good in others:  Heap (unordered) files: Suitable when typical access is a file scan retrieving all records.  Sorted Files: Best if records must be retrieved in some order, or only a `range’ of records is needed.  Indexes: Data structures to organize records via trees or hashing. • Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields • Updates are much faster than in sorted files.
  • 9. 9 Disks and DBMS Design  DBMS stores information on disks.  This has major implications for DBMS design!  READ: transfer data from disk to main memory (RAM) for data processing.  WRITE: transfer data from RAM to disk for persistent storage.  Both are high-cost operations, relative to in-memory operations, so must be planned carefully!
  • 10. 10 Why Not Store Everything in Main Memory?  Main memory is volatile. We want data to be saved between runs. (Obviously!)  Costs too much. $100 will buy you either 4GB of RAM or 1TB of disk today.  32-bit addressing limitation.  232 bytes can be directly addressed in memory.  Number of objects cannot exceed this number.
  • 11. 11 Basics of Disks  Unit of storage and retrieval: disk block or page.  A disk block/page is a contiguous sequence of bytes.  Size of a DBMS parameter, 4KB or 8KB.  Disks support direct access to a page.  Unlike RAM, time to retrieve a page varies!  It depends upon the location on disk.  Therefore, relative placement of pages on disk has major impact on DBMS performance!
  • 12. 12 Components of a Disk Track Platters  Platters spin (say, 7200rpm). Spindle  Arm assembly is moved in or out to position a head on a desired track. Disk head Arm movement Arm assembly  Only one head reads/ writes at any one time.  Tracks under heads make a cylinder (imaginary!).  Each track is divided into sectors (whose size is fixed).  Block size is a multiple of sector size. Sector
  • 13. 13 Accessing a Disk Page  Time to access (read/write) a disk block:  seek time (moving arms to position disk head on track)  rotational delay (waiting for block to rotate under head)  transfer time (actually moving data to/from disk surface)  Seek time and rotational delay dominate.  Seek time varies from about 2 to 30 msec  Rotational delay varies from 0 to 11 msec  Transfer rate between 25 to 100 MB/s  Key to lower I/O cost: reduce seek/rotation delays!
  • 14. 14 Arranging Pages on Disk  `Next’ block concept:  blocks on same track, followed by  blocks on same cylinder, followed by  blocks on adjacent cylinder  Blocks in a file should be arranged sequentially on disk (by `next’), to minimize seek and rotational delay.  For a sequential scan, pre-fetching several pages at a time is a big win!
  • 16. 16 Buffer Management in a DBMS Page Requests from Higher Levels BUFFER POOL DB disk page free frame MAIN MEMORY DISK choice of frame dictated by replacement policy  Data must be in RAM for DBMS to operate on it!  Table of <frame#, pageid> pairs is maintained.
  • 17. 17 More on Buffer Management  Requestor of page must unpin it, and indicate whether page has been modified:  dirty bit is used for this.  Page in pool may be requested many times,  a pin count is used. A page is a candidate for replacement iff pin count = 0.  CC & recovery may entail additional I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later.)
  • 18. 18 When a Page is Requested ...  If requested page is not in pool:  Choose a frame for replacement  If frame is dirty, write it to disk  Read requested page into chosen frame  Pin the page and return its address.  If requests can be predicted (e.g., sequential scans) pages can be pre-fetched several pages at a time!
  • 19. 19 Buffer Replacement Policy  Frame is chosen for replacement by a replacement policy:  Least-recently-used (LRU), Clock, MRU etc.  Policy can have big impact on # of I/O’s; depends on the access pattern.  Sequential flooding: Nasty situation caused by LRU + repeated sequential scans.  # buffer frames < # pages in file means each page request causes an I/O. MRU much better in this situation (but not in all situations, of course).
  • 20. 20 DBMS vs. OS File System OS does disk space & buffer mgmt: why not let OS manage these tasks?  Differences in OS support: portability issues  Some limitations, e.g., files can’t span disks.  Buffer management in DBMS requires ability to:  pin a page in buffer pool, force a page to disk (important for implementing CC & recovery),  adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.
  • 22. 22 Files  Access method layer offers an abstraction of data on disk: a file of records residing on multiple pages  A number of fields are organized in a record  A collection of records are organized in a page  A collection of pages are organized in a file
  • 23. 23 Files of Records  Page or block is OK when doing I/O, but higher levels of DBMS operate on records and files of records.  FILE: A collection of pages, each containing a collection of records. Must support:  insert/delete/modify record  read a particular record (specified using record id)  scan all records (possibly with some conditions on the records to be retrieved)
  • 24. 24 Unordered (Heap) Files  Simplest file structure contains records in no particular order.  As file grows and shrinks, disk pages are allocated and de-allocated.  To support record level operations, we must:  keep track of the pages in a file  keep track of free space on pages  keep track of the records on a page  There are many alternatives for keeping track of this.
  • 25. 25 Heap File Implemented as a List Header Page Data Page Data Page Data Page Data Page Data Page Full Pages Data Page Pages with Free Space  (heap file name, header page id) stored in a known place.  Two doubly linked lists, for full pages & pages with space.  Each page contains 2 `pointers’ plus data.  Upon insertion, scan the list of pages with space, or ask disk space manager to allocate a new page
  • 26. 26 Heap File Using a Page Directory Data Page 1 Data Page 2 Data Page N Header Page DIRECTORY  A directory entry per page; it can include the number of free bytes on the page.  The directory is a collection of pages; linked list implementation is just one alternative.  Much smaller than linked list of all HF pages!
  • 27. 27 Page Format  How to store a collection of records on a page?  Consider a page as a collection of slots, one for each record.  A record is identified by rid = <page id, slot #>  Record ids (rids) are used in indexes (Alternatives 2 and 3).
  • 28. 28 Page Format: Fixed Length Records Slot 1 Slot 2 Slot N . . . PACKED N Free Space number of records . . . 1 1 . . . 0 1M M ... 3 2 1 Slot 1 Slot 2 Slot N Slot M UNPACKED, BITMAP number of slots  Moving records for free space management changes rid! May not be acceptable.
  • 29. 20 16 24 N Pointer to start of free space 29 Page Format: Variable Length Records Page i Rid = (i,N) Rid = (i,2) Rid = (i,1) N . . . 2 1 # slots SLOT DIRECTORY Rid of record is <page id, slot #>  Can move records on page without changing rid; so, attractive for fixed-length records too. (level of indirection)
  • 30. 30 Record Format: Fixed Length F1 F2 F3 F4 L1 L2 L3 L4 Base address (B) Address = B+L1+L2  Information of a record type e.g., the number of fields and field types is stored in the system catalog.  Fixed length record: (1) the number of fields is fixed, (2) each field has a fixed length.  Store fields consecutively in a record.  Finding i’th field does not require scan of record.
  • 31. 31 Record Format: Variable Length  Variable length record: (1) number of fields is fixed, (2) some fields are variable length  Two alternatives: F1 F2 F3 F4 $ $ $ $ Scan Fields Delimited by Special Symbols F1 F2 F3 F4 S1 S2 S3 S4 E4 Array of Field Offsets  Second offers direct access to i’th field, efficient storage of NULLs ; small directory overhead.