A Parallel GPU Version of the Traveling Salesman Problem
By Molly A. O’Neil, Dan Tamir and Martin Burtscher
Presented By
Rukshan Siriwardhane (148208V)
Vimukthi Wickramasinghe (148245F)
Outline
● The Traveling Salesman Problem
● The TSP algorithm used
● Using a GPU to solve TSP
● Optimizations used
● Evaluation method
● Results
The Traveling Salesman Problem
Definition
- Given n cities, find the shortest Hamiltonian tour, i.e. the shortest closed route that visits every city exactly once
● Combinatorial optimization problem
○ E.g. planning efficient drilling-arm movement, vehicle routing, logistics
● A brute force search in the solution space is not feasible
● Usually expressed as a graph problem
○ A complete, undirected graph with Euclidean distances between points in the plane is used
○ Vertices represent cities
○ Edge weights reflect distances or costs
The TSP Algorithm used
● Finding the optimal solution is NP-hard
○ Heuristic algorithms are used to find an approximate solution
● Here an iterative hill-climbing search algorithm is used
○ Generate k random initial tours (k climbers)
○ Iteratively refine them until a local minimum is reached
● In each iteration, apply the best 2-opt move
○ Find the best pair of edges (a, b) and (c, d) such that replacing them with (a, d) and (b, c) minimizes the tour length
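The hill-climbing scheme described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`tour_length`, `climb`, `solve`), the seed handling, and the choice to recompute the full tour length for every candidate are all hypothetical. Edges are labelled by tour position, so reversing the segment between the two removed edges yields the reconnected tour.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour through the points, returning to the start."""
    n = len(tour)
    return sum(math.dist(pts[tour[k]], pts[tour[(k + 1) % n]]) for k in range(n))


def climb(tour, pts):
    """Apply the best 2-opt move repeatedly until no move shortens the tour."""
    tour = list(tour)
    n = len(tour)
    current = tour_length(tour, pts)
    while True:
        best_len, best_tour = current, None
        for i in range(n - 2):
            for j in range(i + 2, n):
                # Reversing tour[i+1..j] swaps edges (t[i], t[i+1]) and
                # (t[j], t[j+1]) for (t[i], t[j]) and (t[i+1], t[j+1]).
                cand = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                cand_len = tour_length(cand, pts)
                if cand_len < best_len - 1e-9:  # ignore rounding noise
                    best_len, best_tour = cand_len, cand
        if best_tour is None:  # local minimum reached
            return tour, current
        tour, current = best_tour, best_len


def solve(pts, climbers, seed=0):
    """Run independent climbers from random tours and keep the shortest result."""
    rng = random.Random(seed)
    best_tour, best_len = None, math.inf
    for _ in range(climbers):
        start = list(range(len(pts)))
        rng.shuffle(start)
        tour, length = climb(start, pts)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

Calling `solve(pts, climbers=k)` with a list of (x, y) tuples returns the best local minimum found by the k climbers. Recomputing the whole tour length per candidate keeps the sketch short but costs O(n) per move; it is meant for small inputs only.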
Using a GPU to solve TSP
● Parallelism: more than 10,000 threads
● Memory access regularity: sets of 32 threads need to have good access to memory
● Code regularity: sets of 32 threads need to follow the same control flow
● Data reuse: at least O(n²) operations on O(n) data
Using a GPU to solve TSP
▪ Assuming 100-city problems & 100,000 climbers
▪ Climbers are independent, can be run in parallel
▪ Pro - Plenty of data parallelism
▪ Con - Potential load imbalance
▪ Different number of steps required to reach local minimum
▪ Every step determines the best of 4851 2-opt moves
▪ Same control flow (but different data)
▪ Coalesced memory access patterns
▪ O(n²) operations on O(n) data
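The figure of 4851 moves for a 100-city tour can be checked with a short count. The loop bounds below are an assumption chosen because they reproduce that number: position i runs over the first n - 2 tour positions and j starts two positions later, giving (n - 1)(n - 2) / 2 pairs. The function name `count_two_opt_moves` is hypothetical.

```python
def count_two_opt_moves(n):
    """Count the position pairs (i, j) scanned by a doubly nested 2-opt loop."""
    count = 0
    for i in range(n - 2):
        for j in range(i + 2, n):
            count += 1
    return count


n_cities = 100
climbers = 100_000
moves_per_step = count_two_opt_moves(n_cities)
print(moves_per_step)             # 4851, i.e. (n - 1) * (n - 2) // 2
print(moves_per_step * climbers)  # 485100000 evaluations if every climber takes one step
```

The quadratic number of moves per step over a tour of only n cities is what gives the O(n²) operations on O(n) data mentioned above.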
Optimizations - code
● Main code section: finding the best 2-opt move
○ Doubly nested loop
■ Only computes difference in tour length, not absolute length
○ Highly optimized to minimize memory accesses
■ “Caches” rest of data in registers
■ Requires only 6 clock cycles per move on a Xeon CPU core
○ Local minimum compared to best solution so far
■ Best solution updated if needed, otherwise tour is discarded
○ Other small optimizations
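A minimal sketch of the "difference only" idea, assuming a symmetric distance matrix and hypothetical names (`best_move`, `apply_move`): each candidate move is scored from four edge lengths instead of the full tour length, and the values that do not change in the inner loop are held in local variables, loosely mirroring the register caching mentioned above. It is not the authors' implementation.

```python
import math
import random


def tour_length(tour, dist):
    """Absolute length of the closed tour; used here only to check the result."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def best_move(tour, dist):
    """Scan all 2-opt moves and return (delta, i, j) for the most improving one."""
    n = len(tour)
    best = (0.0, -1, -1)  # delta 0 with i = j = -1 means "no improving move"
    for i in range(n - 2):
        a, b = tour[i], tour[i + 1]  # fixed for the whole inner loop
        d_ab = dist[a][b]
        for j in range(i + 2, n):
            c, d = tour[j], tour[(j + 1) % n]
            # Change in length when edges (a, b), (c, d) become (a, c), (b, d).
            delta = dist[a][c] + dist[b][d] - d_ab - dist[c][d]
            if delta < best[0]:
                best = (delta, i, j)
    return best


def apply_move(tour, i, j):
    """Reverse the segment between the two removed edges."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(20)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
tour = list(range(20))
delta, i, j = best_move(tour, dist)
new_tour = apply_move(tour, i, j)
```

After the move, `tour_length(new_tour, dist) - tour_length(tour, dist)` agrees with `delta` up to rounding, so the absolute length never has to be recomputed inside the loop.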
Optimizations - GPU
● Random tours generated in parallel on GPU
○ Minimizes data transfer to GPU
● 2D distance matrix resident in shared memory
○ Ensures hits in software-controlled fast data cache
● Tours copied to local memory in chunks of 1024
○ Enables accessing them with coalesced loads & stores
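One way to picture generating tours where they are used: if each climber derives its start tour from nothing but its own index, no tour data has to be sent to it. The sketch below is a sequential Python stand-in for that idea. The seeding formula, the `base_seed` value, and the use of a Fisher-Yates shuffle are assumptions for illustration; the slides do not say which random number generator the GPU code uses.

```python
import random


def random_tour(climber_id, n_cities, base_seed=12345):
    """Build a random tour from the climber's index alone (Fisher-Yates shuffle)."""
    rng = random.Random(base_seed * 1_000_003 + climber_id)  # hypothetical seeding
    tour = list(range(n_cities))
    for k in range(n_cities - 1, 0, -1):
        r = rng.randint(0, k)  # randint is inclusive on both ends
        tour[k], tour[r] = tour[r], tour[k]
    return tour


# Each climber can rebuild its own start tour independently and reproducibly.
tours = [random_tour(cid, 100) for cid in range(4)]
```

Because the tour depends only on `climber_id` and the shared seed, the same climber always starts from the same permutation, which also makes runs repeatable.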
Evaluation Method
● Hardware
○ NVIDIA Tesla C2050 GPU
○ (1.15 GHz, 14 SMs with 32 PEs each, 3 GB global memory)
○ Nautilus supercomputer (2.0 GHz 8-core X7550 Xeons sharing 4 TB of main memory)
● Data
○ Five 100-city inputs from TSPLIB
● Implementations
○ CUDA (GPU), Pthreads (CPU), serial C (CPU)
○ All three use almost identical code for finding the best 2-opt move
Results - Runtime Comparison
● GPU is 7.8x faster than CPU with 8 cores
● One GPU chip is as fast as 16 or 32 CPU chips
Speedup over Serial
● Pthreads code scales well up to 32 threads (4 CPUs)
● CPU performance fluctuates (NUMA), GPU stable
Results - Solution Quality
● Optimal tour found in 4 of 5 cases with 100,000 climbers
○ 200,000 climbers find best solution in fifth case
● Runtime independent of input and linear in climbers
Summary
▪ TSP_GPU algorithm
▪ Highly optimized implementation for GPUs
▪ Evaluates almost 20 billion tour modifications per
second on a single GPU (as fast as 32 8-core Xeons)
▪ Produces high-quality results
▪ May be better suited to GPUs than Ant Colony Optimization and genetic algorithms (GAs)
Any Questions?
Thank You..
