SORTING LARGE DATA
Harish Chetty, Brian Finelli and Jacob Kattampilly
MODIFIED MERGE SORT
We modified merge sort to gain more fine-grained control over the different levels at which merging is performed.
For example, we first separate the data into 9 parts so that each machine (the server and its 8 clients) sorts one part individually. We then combine all 9 sorted parts in the final step with a 9-way merge to get the final sorted data.
SERVER DIVISION OF DATA
[Diagram: one server connected to eight clients, Client 1 through Client 8.]
Server: Sends 1/9th of the data to each of the 8 clients and sorts the remaining 1/9th itself, thus behaving as a client as well.
Client: Each client sorts its 1/9th of the data and returns it to the server.
The clients and the server are connected via TCP.
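The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_parts` and the choice of contiguous, nearly equal slices are assumptions, since the slides only state that each of the 9 machines receives 1/9th of the data.

```python
def split_into_parts(data, parts=9):
    """Split data into `parts` contiguous slices whose sizes differ by at most 1.

    Hypothetical helper: slice 0 could stay on the server and
    slices 1..8 could be sent to Client 1..Client 8.
    """
    base, extra = divmod(len(data), parts)
    slices = []
    start = 0
    for p in range(parts):
        # The first `extra` slices take one additional element each.
        size = base + (1 if p < extra else 0)
        slices.append(data[start:start + size])
        start += size
    return slices


if __name__ == "__main__":
    pieces = split_into_parts(list(range(20)))
    print([len(p) for p in pieces])
```

Keeping the slices contiguous means no element is copied more than once before it is sent; any other partition of equal size would serve the same purpose.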
CLIENT DIVISION OF DATA
[Diagram: the client's data is split into Data1 and Data2; each half is handled by Thread 1 through Thread 16.]
The client then divides its data into 2 parts to eliminate memory wastage.
Parallel sorting of chunks on multiple cores: within each of the two parts, Thread 1 through Thread 16 each sort one chunk.
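A minimal sketch of sorting the chunks of one half with a pool of 16 threads is shown below. The name `sort_chunks_in_parallel` and the use of the built-in `sorted` are assumptions made for illustration. In CPython, threads running pure Python code do not execute in parallel, so this shows the structure only, not the speedup a native implementation would get.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_chunks_in_parallel(half, threads=16):
    """Cut `half` into at most `threads` chunks and sort each chunk in a worker thread.

    Returns the list of sorted chunks; merging them is a separate step.
    """
    # Ceiling division, with a minimum chunk size of 1 for empty input.
    size = -(-len(half) // threads) or 1
    chunks = [half[i:i + size] for i in range(0, len(half), size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(sorted, chunks))


if __name__ == "__main__":
    import random
    data = [random.randrange(1000) for _ in range(100)]
    sorted_chunks = sort_chunks_in_parallel(data)
    print(len(sorted_chunks), all(c == sorted(c) for c in sorted_chunks))
```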
THREAD SORTING
Thread i sorts block i.
A combination of Merge Sort (MS) and Network Sort (NS) is used inside each thread: MS if len(blk) >= 16, else NS.
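The rule above can be sketched as a recursive merge sort that hands blocks shorter than 16 elements to a sorting network. The slides do not say which network the authors used; the odd-even transposition network below is an assumption chosen because it works for any block length, and the names `network_sort`, `merge` and `hybrid_sort` are hypothetical.

```python
CUTOFF = 16  # from the slide: MS if len(blk) >= 16, else NS


def network_sort(blk):
    """Odd-even transposition network (assumed stand-in for 'Network Sort').

    Runs len(blk) rounds of compare-exchange on fixed neighbor pairs.
    """
    a = list(blk)
    n = len(a)
    for rnd in range(n):
        # Even rounds compare pairs (0,1),(2,3),...; odd rounds (1,2),(3,4),...
        for i in range(rnd % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def merge(left, right):
    """Standard two-way merge of two sorted lists."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_sort(blk):
    """Merge sort that switches to the network for blocks shorter than CUTOFF."""
    if len(blk) < CUTOFF:
        return network_sort(blk)
    mid = len(blk) // 2
    return merge(hybrid_sort(blk[:mid]), hybrid_sort(blk[mid:]))
```

The compare-exchange pattern of a sorting network does not depend on the data, which is what makes it attractive for very small blocks.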
MERGE BETWEEN THREADS
[Diagram: a binary merge tree. The leaves are Data1 through Data16; each internal node is a thread that merges its two inputs, up to a single merged result at the root.]
The 16 sorted data blocks, one from each thread, are merged together.
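One way to merge the 16 sorted blocks is level by level in pairs, with the merges of each level running in worker threads. The sketch below assumes this pairwise scheme; the names `merge_two` and `tree_merge` are hypothetical, and `heapq.merge` from the standard library stands in for whatever two-way merge the authors wrote.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def merge_two(pair):
    """Merge one pair of sorted lists into a new sorted list."""
    left, right = pair
    return list(heapq.merge(left, right))


def tree_merge(blocks, workers=8):
    """Merge sorted blocks pairwise, level by level, until one block remains."""
    level = [list(b) for b in blocks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(level) > 1:
            pairs = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
            merged = list(pool.map(merge_two, pairs))
            if len(level) % 2:
                # An unpaired block moves up to the next level unchanged.
                merged.append(level[-1])
            level = merged
    return level[0] if level else []
```

With 16 blocks this takes four levels: 8 merges, then 4, then 2, then 1.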
MERGE BETWEEN 2 DATA SETS WITHIN A CLIENT
We have two 4 GB data sets to merge. A straightforward merge would require 8 GB of temporary space, bringing the total to 16 GB, which is our RAM capacity.
Trick: Use only 4 GB of temporary space to store the first part of the result. Then use one of the data arrays to store the rest of the solution.
[Diagram: Data 1 and Data 2 are merged into Temp + Data 1.]
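The trick can be illustrated with small lists. The sketch below is an assumption about how such a scheme can be made safe, not the authors' code, and `merge_half_temp` is a hypothetical name. It assumes both inputs have the same length n. The first n outputs go to the temporary array; the remaining n outputs are written into the first data array from the front. This works because the number of consumed slots at the front of the first array always equals the number of elements still unread in the second array, so the write position never passes the read position.

```python
def merge_half_temp(a, b):
    """Merge sorted lists a and b (equal length n) using only n extra slots.

    Afterwards `temp` holds the n smallest elements and `a` has been
    overwritten with the n largest, so temp + a is the full sorted result.
    """
    n = len(a)
    assert len(b) == n, "sketch assumes equal-sized inputs"
    temp = [None] * n
    i = j = 0  # read positions in a and b

    # Phase 1: the first n outputs go into the temporary array.
    for k in range(n):
        if j >= n or (i < n and a[i] <= b[j]):
            temp[k] = a[i]
            i += 1
        else:
            temp[k] = b[j]
            j += 1

    # Phase 2: the remaining n outputs overwrite a from the front.
    # The write index k never exceeds the read index i.
    for k in range(n):
        if j >= n or (i < n and a[i] <= b[j]):
            a[k] = a[i]
            i += 1
        else:
            a[k] = b[j]
            j += 1
    return temp, a
```

Applied to two 4 GB arrays, this layout needs 4 GB of extra space instead of 8 GB.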
9-WAY MERGE AT SERVER
[Diagram: each of Client 1 through Client 8 holds a buffer array and sends it over TCP/IP into a matching buffer on the server. The server merges these buffers together with its own data.]
Data is transmitted from the clients to the server in chunks, rather than all at once, to reduce the delay caused by the network.
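A minimal sketch of chunked transfer over a stream socket follows. The record format is not given in the slides, so the 8-byte little-endian integers, the chunk sizes, and the names `send_sorted` and `recv_values` are all assumptions. The receiver yields values as soon as each chunk arrives, so a merge can start before the sender has finished.

```python
import socket
import struct

RECORD = struct.Struct("<q")  # assumption: 8-byte little-endian signed integers


def send_sorted(sock, values, chunk_records=4):
    """Client side: send sorted values a few records at a time."""
    for start in range(0, len(values), chunk_records):
        chunk = values[start:start + chunk_records]
        sock.sendall(b"".join(RECORD.pack(v) for v in chunk))
    sock.shutdown(socket.SHUT_WR)  # signal end of stream


def recv_values(sock, chunk_bytes=32):
    """Server side: yield values as chunks arrive, buffering partial records."""
    pending = b""
    while True:
        data = sock.recv(chunk_bytes)
        if not data:
            break
        pending += data
        usable = len(pending) - len(pending) % RECORD.size
        if usable:
            for (value,) in RECORD.iter_unpack(pending[:usable]):
                yield value
        pending = pending[usable:]


if __name__ == "__main__":
    client_end, server_end = socket.socketpair()
    send_sorted(client_end, [1, 2, 3, 5, 8, 13])
    print(list(recv_values(server_end)))
    client_end.close()
    server_end.close()
```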
9-WAY MERGE AT SERVER (EACH STEP)
•Check 9 elements: one from the server and one from each of the 8 clients.
•Find the minimum of the 9 values.
•Store the minimum only if it is the 10th item (or a multiple of 10) in the final sorted data.
In this way we completely eliminated all intermediate disk reads and writes.
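The three steps above can be sketched as a loop over the current head of each input. This is an illustration under assumptions: the inputs are modeled as plain Python iterables rather than network buffers, and the name `k_way_merge_sampled` and its `every` parameter are hypothetical.

```python
_DONE = object()  # sentinel marking an exhausted input


def k_way_merge_sampled(streams, every=10):
    """Merge sorted inputs and keep only every `every`-th item of the merged order."""
    iters = [iter(s) for s in streams]
    heads = [next(it, _DONE) for it in iters]  # one candidate per input
    kept = []
    position = 0
    while True:
        live = [(h, idx) for idx, h in enumerate(heads) if h is not _DONE]
        if not live:
            break
        value, idx = min(live)  # minimum of the (up to) 9 heads
        position += 1
        if position % every == 0:
            kept.append(value)
        heads[idx] = next(iters[idx], _DONE)  # refill from the same input
    return kept


if __name__ == "__main__":
    parts = [sorted(range(k, 90, 9)) for k in range(9)]
    print(k_way_merge_sampled(parts))
```

A linear scan over 9 heads is cheap enough that a heap is not required, and nothing except the sampled output is ever stored.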
FINAL RESULTS
Best: test0 = 20:16, test1 = 20:48
Average: test0 = ~22-23, test1 = ~22-23
Worst: test0 = ~25-28, test1 = ~25-28
