Projection Size Disk Reserves Sailors


This deck analyzes the I/O cost of a projection with duplicate elimination (SELECT DISTINCT) on the Reserves relation. It first works through sort-based plans with 20 buffer pages, then a two-phase hash-based plan that partitions the projected tuples in the first phase and eliminates duplicates in the second. The hash-based total is M + 2T I/Os, which is linear in the number of input pages M and projected pages T.


Projection
$\pi_{attr_1, attr_2, \ldots, attr_m}(R)$

• Tuple Length
◦ Reserves: 40 bytes
◦ Sailors: 50 bytes
• # of Tuples
◦ Reserves: 100,000
◦ Sailors: 40,000
• Page Size
◦ 4k
• Size on Disk
◦ Reserves ???
◦ Sailors ???
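A minimal sketch of how the two sizes asked for above could be worked out. It assumes that tuples do not span pages and that "4k" means roughly 4000 usable bytes per page (an assumption, chosen because it reproduces the 1000 pages used for Reserves in the later cost slides). `pages_on_disk` is a hypothetical helper name.

```python
import math

def pages_on_disk(num_tuples, tuple_bytes, page_bytes=4000):
    """Pages needed when whole tuples are packed into fixed-size pages."""
    tuples_per_page = page_bytes // tuple_bytes  # tuples never span two pages
    return math.ceil(num_tuples / tuples_per_page)

# Figures from the slide; page_bytes=4000 is an assumption.
reserves_pages = pages_on_disk(100_000, 40)  # 100 tuples per page
sailors_pages = pages_on_disk(40_000, 50)    # 80 tuples per page
print(reserves_pages, sailors_pages)
```

Under these assumptions Reserves takes 1000 pages and Sailors 500. With 4096 usable bytes per page the same function gives 981 and 494 pages.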

• SELECT DISTINCT R.sid, R.bid
FROM Reserves R
$\pi_{attr_1, attr_2, \ldots, attr_m}(R)$
$\pi_{sid, bid}(Reserves)$

• M := # of pages of R
• T := M*F
• Scan and Project (in Temp)
◦ O(M)+O(T)
• Sort (Temp)
◦ O(T log T)
▪ 2-Way Sort
• Scan (Temp) and Eliminate Duplicates
◦ O(T)
$F = \left( \sum_{i=1}^{m} Size\_of(attr_i) \right) / TupleLength$
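As a rough illustration of the definitions above, the sketch below computes F and T for the query on Reserves. The 10-byte total for sid and bid and the 1000-page size of Reserves are the figures used in the worked example that follows; the even 5/5 split between the two attributes is only an assumption, and `projection_factor` is a hypothetical name.

```python
def projection_factor(attr_sizes, tuple_length):
    """F = (sum of the kept attribute sizes) / (full tuple length)."""
    return sum(attr_sizes) / tuple_length

M = 1000                            # pages of Reserves
F = projection_factor([5, 5], 40)   # sid + bid = 10 bytes of a 40-byte tuple
T = int(M * F)                      # pages of the projected temporary relation
print(F, T)
```

Here F is 0.25, so the projected relation occupies a quarter of the original pages.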

• sid+bid = 10 bytes
• Buffer Pages = 20
• Scan and Project (in Temp)
◦ 1000+250 = 1250
• Sort (Temp)
◦ Two passes
▪ Pass 0 (⌈250/20⌉ = 13 runs)
▪ Pass 1 (13-Way Merge)
◦ 2(2*250) = 1000
• Scan (Temp) and Eliminate Duplicates
◦ 250
Total I/Os: 1250 + 1000 + 250 = 2500
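The arithmetic above can be reproduced with a small cost model. This is a sketch under the slide's assumptions: pass 0 of the sort produces runs of B pages, each later pass merges up to B − 1 runs at a time, every sort pass reads and writes all T pages, and the final result is not written to disk. The function name is hypothetical.

```python
import math

def naive_sort_projection_io(M, T, B):
    """I/Os for: scan and project, sort the temp relation, scan and dedup."""
    scan_and_project = M + T            # read R, write the projected temp
    runs = math.ceil(T / B)             # pass 0 of the sort: runs of B pages
    passes = 1
    while runs > 1:                     # (B - 1)-way merge passes
        runs = math.ceil(runs / (B - 1))
        passes += 1
    sort = 2 * T * passes               # each pass reads and writes T pages
    final_scan = T                      # read sorted temp, drop duplicates
    return scan_and_project, sort, final_scan

parts = naive_sort_projection_io(M=1000, T=250, B=20)
print(parts, sum(parts))
```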

• Buffer Pages = B
• Pass 0
◦ Project Out Unwanted Attributes
◦ Read B Pages, Write a run of B*F Pages
▪ Runs of 2B pages with aggressive implementation
• Subsequent Passes
◦ Eliminate Duplicates while Merging
$F = \left( \sum_{i=1}^{m} Size\_of(attr_i) \right) / TupleLength$
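The two modifications above can be sketched in memory as follows. This is only an illustration, not a disk-based implementation: lists stand in for pages and runs, a single merge pass is assumed to be enough, and the sample Reserves rows and function names are made up.

```python
import heapq
from itertools import islice

def sorted_runs(tuples, keep, run_len):
    """Pass 0: project each tuple onto the `keep` positions, emit sorted runs."""
    it = iter(tuples)
    while True:
        chunk = [tuple(t[i] for i in keep) for t in islice(it, run_len)]
        if not chunk:
            return
        chunk.sort()
        yield chunk

def merge_distinct(runs):
    """Merge pass: k-way merge of sorted runs, skipping repeated rows."""
    previous = object()                 # sentinel that equals no row
    for row in heapq.merge(*runs):
        if row != previous:             # duplicates are adjacent after merging
            yield row
            previous = row

# Hypothetical (sid, bid, day) rows; only sid and bid are kept.
reserves = [(22, 101, "10/10"), (22, 101, "10/11"), (58, 103, "11/12"),
            (22, 102, "10/10"), (58, 103, "11/13")]
result = list(merge_distinct(sorted_runs(reserves, keep=(0, 1), run_len=2)))
print(result)
```

Because the merge produces rows in sorted order, equal rows arrive next to each other, so comparing against the previous row is enough to drop duplicates.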

◦ sid+bid = 10 bytes
◦ Buffer Pages = 20
• Pass 0
◦ 1000+250 = 1250
◦ 50 Reads of 20 Pages each
◦ 50 Writes of 5 Pages each
• Pass 1
◦ 250*2 = 500
◦ 19-Way Merges with Duplicates Elimination
◦ Two runs of 19*5 pages, One run of 60 pages
• Pass 2
◦ 250*1 = 250
◦ 3-Way Merge with Duplicates Elimination
Total I/Os: 1250 + 500 + 250 = 2000

• Using Aggressive Implementation
• Pass 0
◦ 1000+250 = 1250
◦ 6 runs of 40 pages each
◦ 1 run of 10 pages
• Pass 1
◦ 250*1 = 250
◦ 7-Way Merge with Duplicates Elimination
Total I/Os: 1250 + 250 = 1500
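The pass-by-pass totals for the standard and the aggressive variant can be checked with one small function. The sketch assumes, as the slides appear to, that eliminating duplicates does not shrink the intermediate runs, that merges are (B − 1)-way, and that the last merge only reads because its output is the query result. `run_pages` (the length of each projected run written in pass 0) is a hypothetical parameter name.

```python
import math

def sort_projection_io(M, T, B, run_pages):
    """Per-pass I/O when pass 0 projects and merge passes drop duplicates."""
    costs = [M + T]                     # pass 0: read R, write projected runs
    runs = math.ceil(T / run_pages)
    while runs > 1:
        runs = math.ceil(runs / (B - 1))
        # Intermediate merges read and write T pages; the last one only reads.
        costs.append(T if runs == 1 else 2 * T)
    return costs

standard = sort_projection_io(1000, 250, 20, run_pages=5)     # runs of B*F pages
aggressive = sort_projection_io(1000, 250, 20, run_pages=40)  # runs of 2B pages
print(standard, sum(standard))
print(aggressive, sum(aggressive))
```

Longer initial runs mean fewer runs, which removes one whole merge pass in this example.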

• B (buffer pages) is typically large
◦ f := fudge factor
• 2-Phase Process
◦ Phase 1
▪ Partitioning
▪ (After Projecting OUT Unwanted Attributes)
◦ Phase 2
▪ Duplicates Elimination
$B > \sqrt{f \cdot T}$

• One input buffer page
• B − 1 output buffer pages
• For each tuple
◦ Project out the unwanted attributes
◦ Apply a hash function h to the combination of all remaining attributes
▪ h is chosen so that tuples are distributed uniformly to one of B − 1 partitions

At the end of the partitioning phase, we have B − 1 partitions
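A minimal in-memory sketch of the partitioning phase, assuming Python's built-in `hash` stands in for h and plain lists stand in for the B − 1 output buffers and the partitions they are flushed to. The sample rows and the function name are made up for illustration.

```python
def partition(tuples, keep, num_partitions):
    """Phase 1: project each tuple, then route it by h(row) mod (B - 1)."""
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        row = tuple(t[i] for i in keep)          # project out unwanted attributes
        partitions[hash(row) % num_partitions].append(row)
    return partitions

B = 20
# Hypothetical (sid, bid, reservation number) rows.
reserves = [(22, 101, 1), (22, 101, 2), (58, 103, 3), (31, 102, 4), (58, 103, 5)]
parts = partition(reserves, keep=(0, 1), num_partitions=B - 1)
print([p for p in parts if p])
```

Identical projected rows always hash to the same value, so every copy of a row lands in the same partition and duplicates never have to be searched for across partitions.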

• X and Y are identical records only if
◦ hash(X) == hash(Y)
• For each Partition (made in phase 1)
◦ Insert each Tuple into an in-memory Hash table
▪ Using a different Hash function
▪ Why different ???
◦ Compare Colliding Tuples for Duplicates
◦ Discard Duplicates when Found
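One way to see why the second hash function should differ: every tuple in a partition has the same value of h modulo B − 1, so reusing h with a table of that size would put the whole partition into a single bucket. The sketch below illustrates this with hypothetical stand-ins for both hash functions (`h` is Python's built-in `hash`, `h2` mixes in a constant) and made-up rows; it is not the hashing scheme of any particular system.

```python
NUM_PARTITIONS = 19                    # B - 1 for B = 20

def h(row):
    """Phase 1 hash function (stand-in)."""
    return hash(row)

def h2(row):
    """Phase 2 hash function; a hypothetical choice that differs from h."""
    return hash((row, 2))

def build_table(partition, table_size, hash_fn):
    """Insert rows into a bucketed hash table, discarding duplicates."""
    table = [[] for _ in range(table_size)]
    for row in partition:
        bucket = table[hash_fn(row) % table_size]
        if row not in bucket:          # compare only against colliding tuples
            bucket.append(row)         # keep the first copy, drop the rest
    return table

# Made-up (sid, bid) rows, each appearing twice; take one phase 1 partition.
rows = [(sid, bid) for sid in range(200) for bid in (101, 102)] * 2
target = h(rows[0]) % NUM_PARTITIONS
part = [r for r in rows if h(r) % NUM_PARTITIONS == target]

same = build_table(part, NUM_PARTITIONS, h)    # reuses the phase 1 function
diff = build_table(part, NUM_PARTITIONS, h2)   # uses a different function
distinct = [row for bucket in diff for row in bucket]
print(sum(1 for b in same if b), sum(1 for b in diff if b), len(distinct))
```

Both tables produce the same distinct rows, but with the reused function each insert is compared against every row seen so far, while a different function typically spreads the rows over many short buckets.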

• Phase 1: Partitioning
◦ Read M Pages
◦ Write T Pages
• Phase 2: Duplicates Elimination
◦ Read T Pages
• Total I/Os
◦ M+2T
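Plugging the running example into this formula gives a quick comparison with the sort-based plans. The sketch below also checks a commonly used condition for each partition to fit in memory during phase 2, B > sqrt(f·T); that condition and the fudge factor value f = 1.2 are assumptions made here for illustration, and both function names are hypothetical.

```python
import math

def hash_projection_io(M, T):
    """Read M pages, write T partitioned pages, read T pages back."""
    return M + 2 * T

def partitions_fit(T, B, f=1.2):
    """Assumed feasibility check: each partition's hash table fits in B pages."""
    return B > math.sqrt(f * T)

total = hash_projection_io(M=1000, T=250)
fits = partitions_fit(T=250, B=20)
print(total, fits)
```

For M = 1000 and T = 250 the total is 1500 I/Os, the same as the aggressive sort-based plan, and the final result is again not written to disk.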

• Index-Only Scan
