SlideShare a Scribd company logo
1 of 16
Download to read offline
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
External Sorting
Chapters 13: 13.1—13.5
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Why Sort?
 A classic problem in computer science (See Knuth, v.3)!
 Data requested in sorted order
 e.g., find students in increasing gpa order
 Sorting is first step in bulk loading B+ tree index.
 Sorting useful for eliminating duplicate copies in a
collection of records (Why?)
 Sort-merge join algorithm involves sorting.
 Problem: sort 1Gb of data with 1Mb of RAM.
 why not virtual memory?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
2-Way Merge Sort: Requires 3 Buffers
 Pass 1: Read a page, sort it (in memory), write it.
 only one buffer page is used
 Pass 2, 3, …, etc.:
 three buffer pages used.
Main memory buffers
INPUT 1
INPUT 2
OUTPUT
Disk
Disk
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
2-Way Merge Sort: Algorithm
proc two-way_extsort(file)
// Given a file on disk, sort it using 3 buffer pages
//Pass 0: produce 1-page runs
Read each page of file into memory, sort it, and write it out.
//Merge pairs of runs to produce longer runs until 1 run is left
While # of runs at end of previous pass > 1 do:
//Process passes i=1,2,…
While there are runs to be merged from previous pass do:
Pick next 2 runs from previous pass.
Reach read each run into an input buffer, 1 page at a time.
Merge the runs and write result to the output buffer by
forcing output buffer to disk one page at a time.
endproc
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5
Two-Way External Merge Sort
 Each pass we read + write
each page in file => 2N I/O’s
per pass.
 N pages in the file => the
number of passes
 So total cost is:
 Idea: Divide and conquer:
sort subfiles and merge
 Input file contains 7 pages;
dark pages shows what
would happen with 8 pages.
 
 
log2 1
N
 
 
2 1
2
N N
log 
Input file
1-page runs
2-page runs
4-page runs
8-page runs
PASS 0
PASS 1
PASS 2
PASS 3
9
3,4 6,2 9,4 8,7 5,6 3,1 2
3,4 5,6
2,6 4,9 7,8 1,3 2
2,3
4,6
4,7
8,9
1,3
5,6 2
2,3
4,4
6,7
8,9
1,2
3,5
6
1,2
2,3
3,4
4,5
6,6
7,8
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
General External Merge Sort
 To sort a file with N pages using B buffer pages:
 Pass 0: use B buffer pages. Produce sorted runs of B
pages each.
 Pass 2, …, etc.: merge B-1 runs.
 
N B
/
B Main memory buffers
INPUT 1
INPUT B-1
OUTPUT
Disk
Disk
INPUT 2
. . . . . .
. . .
* More than 3 buffer pages. How can we utilize them?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7
General External Merge Sort: Algorithm
proc extsort(file)
// Given a file on disk, sort it using B buffer pages
//Pass 0: produce B-pages runs
Read B pages of file into memory, sort them, and write them out.
//Merge B-1 runs to produce longer runs until 1 run is left
While # of runs at end of previous pass > 1 do:
//Process passes i=1,2,…
While there are runs to be merged from previous pass do:
Pick next B-1 runs from previous pass.
Reach read each run into an input buffer, 1 page at a time.
Merge the runs and write result to the output buffer by
forcing output buffer to disk one page at a time.
endproc
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
Cost of External Merge Sort
 Number of passes:
 Cost = 2N * (# of passes)
 E.g., with 5 buffer pages, to sort 108 page file:
 Pass 0: = 22 sorted runs of 5 pages each
(last run is only 3 pages)
 Pass 1: = 6 sorted runs of 20 pages each
(last run is only 8 pages)
 Pass 2: 2 sorted runs, 80 pages and 28 pages
 Pass 3: Sorted file of 108 pages
 
 
1 1
 
log /
B N B
 
108 5
/
 
22 4
/
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9
Number of Passes of External Sort
N B=3 B=5 B=9 B=17 B=129 B=257
100 7 4 3 2 1 1
1,000 10 5 4 3 2 2
10,000 13 7 5 4 2 2
100,000 17 9 6 5 3 3
1,000,000 20 10 7 5 3 3
10,000,000 23 12 8 6 4 3
100,000,000 26 14 9 7 4 4
1,000,000,000 30 15 10 8 5 4
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
I/O for External Merge Sort
 … longer runs often means fewer passes!
 Actually, do I/O a page at a time
 In fact, read a block of pages sequentially!
 Suggests we should make each buffer
(input/output) be a block of pages.
 But this will reduce fan-out during merge passes!
 In practice, most files still sorted in 2-3 passes.
 Typical DBMSs sort 1M records of size 100 bytes
in 15 minutes
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11
Number of Passes of Optimized Sort
N B=1,000 B=5,000 B=10,000
100 1 1 1
1,000 1 1 1
10,000 2 2 1
100,000 3 2 2
1,000,000 3 2 2
10,000,000 4 3 3
100,000,000 5 3 3
1,000,000,000 5 4 3
* Block size = 32, initial pass produces runs of size 2B.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
Double Buffering
 To reduce wait time for I/O request to
complete, can prefetch into `shadow block’.
 Potentially, more passes; in practice, most files still
sorted in 2-3 passes.
OUTPUT
OUTPUT'
Disk Disk
INPUT 1
INPUT k
INPUT 2
INPUT 1'
INPUT 2'
INPUT k'
block size
b
B main memory buffers, k-way merge
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13
Using B+ Trees for Sorting
 Scenario: Table to be sorted has B+ tree index on
sorting column(s).
 Idea: Can retrieve records in order by traversing
leaf pages.
 Is this a good idea?
 Cases to consider:
 B+ tree is clustered Good idea!
 B+ tree is not clustered Could be a very bad idea!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
Clustered B+ Tree Used for Sorting
 Cost: root to the left-
most leaf, then retrieve
all leaf pages
(Alternative 1)
 If Alternative 2 is used?
Additional cost of
retrieving data records:
each page fetched just
once.
* Always better than external sorting!
(Directs search)
Data Records
Index
Data Entries
("Sequence set")
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15
Unclustered B+ Tree Used for Sorting
 Alternative (2) for data entries; each data
entry contains rid of a data record. In general,
one I/O per data record!
(Directs search)
Data Records
Index
Data Entries
("Sequence set")
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
Summary
 External sorting is important; DBMS may dedicate
part of buffer pool for sorting!
 External merge sort minimizes disk I/O cost:
 Pass 0: Produces sorted runs of size B (# buffer pages).
Later passes: merge runs.
 # of runs merged at a time depends on B, and block size.
 Larger block size => smaller # runs merged.
 Clustered B+ tree is good for sorting; unclustered
tree is usually very bad.

More Related Content

Similar to ch13_extsort.ppt

MongoDB Memory Management Demystified
MongoDB Memory Management DemystifiedMongoDB Memory Management Demystified
MongoDB Memory Management DemystifiedMongoDB
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Database management system chapter13
Database management system chapter13Database management system chapter13
Database management system chapter13Pranab Dasgupta
 
Parallel Database description in database management
Parallel Database description in database managementParallel Database description in database management
Parallel Database description in database managementchandugoswami
 
ch22a_ParallelDBs how parallel Datab.ppt
ch22a_ParallelDBs how parallel Datab.pptch22a_ParallelDBs how parallel Datab.ppt
ch22a_ParallelDBs how parallel Datab.pptRahulBhole12
 
Write intensive workloads and lsm trees
Write intensive workloads and lsm treesWrite intensive workloads and lsm trees
Write intensive workloads and lsm treesTilak Patidar
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
Samsung DeepSort
Samsung DeepSortSamsung DeepSort
Samsung DeepSortRyo Jin
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in LinuxRaghu Udiyar
 
data stage-material
data stage-materialdata stage-material
data stage-materialRajesh Kv
 
Unix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsUnix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsDrishti Bhalla
 
Chapter 04
Chapter 04Chapter 04
Chapter 04 Google
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Int 2 computer structure 2010
Int 2 computer structure 2010Int 2 computer structure 2010
Int 2 computer structure 2010iarthur
 
Input and Output Devices and Systems
Input and Output Devices and SystemsInput and Output Devices and Systems
Input and Output Devices and SystemsNajma Alam
 

Similar to ch13_extsort.ppt (20)

MongoDB Memory Management Demystified
MongoDB Memory Management DemystifiedMongoDB Memory Management Demystified
MongoDB Memory Management Demystified
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Database management system chapter13
Database management system chapter13Database management system chapter13
Database management system chapter13
 
Parallel Database description in database management
Parallel Database description in database managementParallel Database description in database management
Parallel Database description in database management
 
ch22a_ParallelDBs how parallel Datab.ppt
ch22a_ParallelDBs how parallel Datab.pptch22a_ParallelDBs how parallel Datab.ppt
ch22a_ParallelDBs how parallel Datab.ppt
 
Write intensive workloads and lsm trees
Write intensive workloads and lsm treesWrite intensive workloads and lsm trees
Write intensive workloads and lsm trees
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
Samsung DeepSort
Samsung DeepSortSamsung DeepSort
Samsung DeepSort
 
Spark Meetup
Spark MeetupSpark Meetup
Spark Meetup
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
virtual memory
virtual memoryvirtual memory
virtual memory
 
data stage-material
data stage-materialdata stage-material
data stage-material
 
MapReduce
MapReduceMapReduce
MapReduce
 
Data corruption
Data corruptionData corruption
Data corruption
 
Unix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsUnix Memory Management - Operating Systems
Unix Memory Management - Operating Systems
 
Chapter 04
Chapter 04Chapter 04
Chapter 04
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Int 2 computer structure 2010
Int 2 computer structure 2010Int 2 computer structure 2010
Int 2 computer structure 2010
 
Input and Output Devices and Systems
Input and Output Devices and SystemsInput and Output Devices and Systems
Input and Output Devices and Systems
 

More from RobinRohit2

devops-complete-notes-2.pdf
devops-complete-notes-2.pdfdevops-complete-notes-2.pdf
devops-complete-notes-2.pdfRobinRohit2
 
Different Components of Computer
Different Components of ComputerDifferent Components of Computer
Different Components of ComputerRobinRohit2
 
Data Structures Notes
Data Structures NotesData Structures Notes
Data Structures NotesRobinRohit2
 
08_Subnetting_IP_Networks.pdf
08_Subnetting_IP_Networks.pdf08_Subnetting_IP_Networks.pdf
08_Subnetting_IP_Networks.pdfRobinRohit2
 
Floating Roof Operation.pptx
Floating Roof Operation.pptxFloating Roof Operation.pptx
Floating Roof Operation.pptxRobinRohit2
 
031VCRS19-les-01_oJ80LT2.pptx
031VCRS19-les-01_oJ80LT2.pptx031VCRS19-les-01_oJ80LT2.pptx
031VCRS19-les-01_oJ80LT2.pptxRobinRohit2
 
Intro Ch 01B.ppt
Intro Ch 01B.pptIntro Ch 01B.ppt
Intro Ch 01B.pptRobinRohit2
 
Computer Hardware.ppt
Computer Hardware.pptComputer Hardware.ppt
Computer Hardware.pptRobinRohit2
 
SRWE_Module_14.pptx
SRWE_Module_14.pptxSRWE_Module_14.pptx
SRWE_Module_14.pptxRobinRohit2
 
SRWE_Module_16.pptx
SRWE_Module_16.pptxSRWE_Module_16.pptx
SRWE_Module_16.pptxRobinRohit2
 
SRWE_Module_16.pptx
SRWE_Module_16.pptxSRWE_Module_16.pptx
SRWE_Module_16.pptxRobinRohit2
 

More from RobinRohit2 (14)

devops-complete-notes-2.pdf
devops-complete-notes-2.pdfdevops-complete-notes-2.pdf
devops-complete-notes-2.pdf
 
Different Components of Computer
Different Components of ComputerDifferent Components of Computer
Different Components of Computer
 
Data Structures Notes
Data Structures NotesData Structures Notes
Data Structures Notes
 
DATA STRUCTURE
DATA STRUCTUREDATA STRUCTURE
DATA STRUCTURE
 
Ch06.ppt
Ch06.pptCh06.ppt
Ch06.ppt
 
08_Subnetting_IP_Networks.pdf
08_Subnetting_IP_Networks.pdf08_Subnetting_IP_Networks.pdf
08_Subnetting_IP_Networks.pdf
 
AES (2).ppt
AES (2).pptAES (2).ppt
AES (2).ppt
 
Floating Roof Operation.pptx
Floating Roof Operation.pptxFloating Roof Operation.pptx
Floating Roof Operation.pptx
 
031VCRS19-les-01_oJ80LT2.pptx
031VCRS19-les-01_oJ80LT2.pptx031VCRS19-les-01_oJ80LT2.pptx
031VCRS19-les-01_oJ80LT2.pptx
 
Intro Ch 01B.ppt
Intro Ch 01B.pptIntro Ch 01B.ppt
Intro Ch 01B.ppt
 
Computer Hardware.ppt
Computer Hardware.pptComputer Hardware.ppt
Computer Hardware.ppt
 
SRWE_Module_14.pptx
SRWE_Module_14.pptxSRWE_Module_14.pptx
SRWE_Module_14.pptx
 
SRWE_Module_16.pptx
SRWE_Module_16.pptxSRWE_Module_16.pptx
SRWE_Module_16.pptx
 
SRWE_Module_16.pptx
SRWE_Module_16.pptxSRWE_Module_16.pptx
SRWE_Module_16.pptx
 

Recently uploaded

Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...arifengg7
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Communityprachaibot
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
priority interrupt computer organization
priority interrupt computer organizationpriority interrupt computer organization
priority interrupt computer organizationchnrketan
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
AntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxAntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxLina Kadam
 
Indian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfIndian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfalokitpathak01
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
ADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain studyADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain studydhruvamdhruvil123
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 

Recently uploaded (20)

Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Community
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitos
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
priority interrupt computer organization
priority interrupt computer organizationpriority interrupt computer organization
priority interrupt computer organization
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
AntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxAntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptx
 
Indian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdfIndian Tradition, Culture & Societies.pdf
Indian Tradition, Culture & Societies.pdf
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
ADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain studyADM100 Running Book for sap basis domain study
ADM100 Running Book for sap basis domain study
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 

ch13_extsort.ppt

  • 1. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapters 13: 13.1—13.5
  • 2. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2 Why Sort?  A classic problem in computer science (See Knuth, v.3)!  Data requested in sorted order  e.g., find students in increasing gpa order  Sorting is first step in bulk loading B+ tree index.  Sorting useful for eliminating duplicate copies in a collection of records (Why?)  Sort-merge join algorithm involves sorting.  Problem: sort 1Gb of data with 1Mb of RAM.  why not virtual memory?
  • 3. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 2-Way Merge Sort: Requires 3 Buffers  Pass 1: Read a page, sort it (in memory), write it.  only one buffer page is used  Pass 2, 3, …, etc.:  three buffer pages used. Main memory buffers INPUT 1 INPUT 2 OUTPUT Disk Disk
  • 4. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4 2-Way Merge Sort: Algorithm proc two-way_extsort(file) // Given a file on disk, sort it using 3 buffer pages //Pass 0: produce 1-page runs Read each page of file into memory, sort it, and write it out. //Merge pairs of runs to produce longer runs until 1 run is left While # of runs at end of previous pass > 1 do: //Process passes i=1,2,… While there are runs to be merged from previous pass do: Pick next 2 runs from previous pass. Reach read each run into an input buffer, 1 page at a time. Merge the runs and write result to the output buffer by forcing output buffer to disk one page at a time. endproc
  • 5. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 Two-Way External Merge Sort  Each pass we read + write each page in file => 2N I/O’s per pass.  N pages in the file => the number of passes  So total cost is:  Idea: Divide and conquer: sort subfiles and merge  Input file contains 7 pages; dark pages shows what would happen with 8 pages.     log2 1 N     2 1 2 N N log  Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 9 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 5,6 2,6 4,9 7,8 1,3 2 2,3 4,6 4,7 8,9 1,3 5,6 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8
  • 6. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6 General External Merge Sort  To sort a file with N pages using B buffer pages:  Pass 0: use B buffer pages. Produce sorted runs of B pages each.  Pass 2, …, etc.: merge B-1 runs.   N B / B Main memory buffers INPUT 1 INPUT B-1 OUTPUT Disk Disk INPUT 2 . . . . . . . . . * More than 3 buffer pages. How can we utilize them?
  • 7. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 General External Merge Sort: Algorithm proc extsort(file) // Given a file on disk, sort it using B buffer pages //Pass 0: produce B-pages runs Read B pages of file into memory, sort them, and write them out. //Merge B-1 runs to produce longer runs until 1 run is left While # of runs at end of previous pass > 1 do: //Process passes i=1,2,… While there are runs to be merged from previous pass do: Pick next B-1 runs from previous pass. Reach read each run into an input buffer, 1 page at a time. Merge the runs and write result to the output buffer by forcing output buffer to disk one page at a time. endproc
  • 8. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8 Cost of External Merge Sort  Number of passes:  Cost = 2N * (# of passes)  E.g., with 5 buffer pages, to sort 108 page file:  Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages)  Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages)  Pass 2: 2 sorted runs, 80 pages and 28 pages  Pass 3: Sorted file of 108 pages     1 1   log / B N B   108 5 /   22 4 /
  • 9. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Number of Passes of External Sort N B=3 B=5 B=9 B=17 B=129 B=257 100 7 4 3 2 1 1 1,000 10 5 4 3 2 2 10,000 13 7 5 4 2 2 100,000 17 9 6 5 3 3 1,000,000 20 10 7 5 3 3 10,000,000 23 12 8 6 4 3 100,000,000 26 14 9 7 4 4 1,000,000,000 30 15 10 8 5 4
  • 10. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10 I/O for External Merge Sort  … longer runs often means fewer passes!  Actually, do I/O a page at a time  In fact, read a block of pages sequentially!  Suggests we should make each buffer (input/output) be a block of pages.  But this will reduce fan-out during merge passes!  In practice, most files still sorted in 2-3 passes.  Typical DBMSs sort 1M records of size 100 bytes in 15 minutes
  • 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Number of Passes of Optimized Sort N B=1,000 B=5,000 B=10,000 100 1 1 1 1,000 1 1 1 10,000 2 2 1 100,000 3 2 2 1,000,000 3 2 2 10,000,000 4 3 3 100,000,000 5 3 3 1,000,000,000 5 4 3 * Block size = 32, initial pass produces runs of size 2B.
  • 12. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12 Double Buffering  To reduce wait time for I/O request to complete, can prefetch into `shadow block’.  Potentially, more passes; in practice, most files still sorted in 2-3 passes. OUTPUT OUTPUT' Disk Disk INPUT 1 INPUT k INPUT 2 INPUT 1' INPUT 2' INPUT k' block size b B main memory buffers, k-way merge
  • 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13 Using B+ Trees for Sorting  Scenario: Table to be sorted has B+ tree index on sorting column(s).  Idea: Can retrieve records in order by traversing leaf pages.  Is this a good idea?  Cases to consider:  B+ tree is clustered Good idea!  B+ tree is not clustered Could be a very bad idea!
  • 14. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14 Clustered B+ Tree Used for Sorting  Cost: root to the left- most leaf, then retrieve all leaf pages (Alternative 1)  If Alternative 2 is used? Additional cost of retrieving data records: each page fetched just once. * Always better than external sorting! (Directs search) Data Records Index Data Entries ("Sequence set")
  • 15. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15 Unclustered B+ Tree Used for Sorting  Alternative (2) for data entries; each data entry contains rid of a data record. In general, one I/O per data record! (Directs search) Data Records Index Data Entries ("Sequence set")
  • 16. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16 Summary  External sorting is important; DBMS may dedicate part of buffer pool for sorting!  External merge sort minimizes disk I/O cost:  Pass 0: Produces sorted runs of size B (# buffer pages). Later passes: merge runs.  # of runs merged at a time depends on B, and block size.  Larger block size => smaller # runs merged.  Clustered B+ tree is good for sorting; unclustered tree is usually very bad.