Search Optimization using GPU
February 2010
J.Pavan Kumar
T.Polender Reddy
G.Sravan Kumar
T.Raghava Jyothi
Agenda
• Introduction
▫ History
▫ Why ‘String Searching’?
▫ Generations of sequential search
▫ Searching through the GPU approach
▫ Architecture of a GPU system
▫ CPU vs. GPU
▫ Features
▫ Algorithm implementation and output analysis
September 28, 2016 2
History
• 1980s – No GPU; PCs used a VGA controller.
• 1990s – More functions were added to the VGA controller.
• 1997 – 3D acceleration functions:
a) Hardware for triangle setup and rasterization.
b) Texture mapping.
c) Shading.
• 2000 – A single-chip graphics processor (the beginning of the term “GPU”).
• 2005 – Massively parallel programmable processors.
• 2007 – CUDA (Compute Unified Device Architecture).
GPU Computing Application Areas
Why ‘String Searching’?
[Figure: the text ATGCATGCATGACTAG being searched for the pattern AGCTAGA]
• String matching is a very important subject in the wider domain of text processing.
• String-matching algorithms are basic components of practical software found on most operating systems.
Generations
[Diagram: CPU search generations. Single core: sequential search. Multi core: PThreads, Message Passing Interface (MPI), Open MPI.]
Sequential Search Pain Points
• Performance
• Power consumption
• Heat generation
• Accuracy
• Time complexity
GPU Vision
[Diagram labels: Researchers, Customers, Business Applications (Programming, Gaming, custom . . .)]
• Computer vision tasks are computationally intensive and repetitive, and they often exceed the real-time capabilities of the CPU, leaving little time for higher-level tasks.
• However, many computer vision operations map efficiently onto the modern GPU, whose programmability allows a wide variety of computer vision algorithms to be implemented.
Technology Convergence
[Timeline figure (2000, 2005, 2007): single-core systems with sequential searching on the CPU; multicore systems with search using Pthreads and search using MPI & OpenMP; CUDA architecture on the graphics processing unit (GPU); portal, virtual, and more]
What is a GPU?
• A GPU (“Graphics Processing Unit”) is a processor optimized for 2D/3D graphics, video, visual computing, and display.
• It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.
• It provides real-time visual interaction with computed objects via graphics, images, and video.
• It serves as both a programmable graphics processor and a scalable parallel computing platform.
• Heterogeneous systems combine a GPU with a CPU.
Compute Unified Device Architecture
• CUDA is a scalable parallel programming model and a software environment for parallel computing.
• Minimal extensions to the familiar C/C++ environment.
• Heterogeneous serial-parallel programming model.
• CUDA also maps well to multicore CPUs!
CUDA C Example: Add Arrays
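The code listing for this slide did not survive extraction. As a stand-in, here is a minimal sketch in plain C (not CUDA) that mimics how a CUDA array-add kernel computes each thread's global index as `blockIdx.x * blockDim.x + threadIdx.x` and guards against running past the end of the array. The names `add_arrays_kernel_body` and `launch_add_arrays` are hypothetical; in real CUDA C the body would be a `__global__` function launched with `<<<blocks, threads>>>`, and the two loops would be replaced by the GPU's grid of threads.

```c
/* Work done by ONE CUDA thread: c[i] = a[i] + b[i].
 * block_idx, block_dim and thread_idx play the roles of
 * blockIdx.x, blockDim.x and threadIdx.x in a real kernel. */
void add_arrays_kernel_body(const float *a, const float *b, float *c, int n,
                            int block_idx, int block_dim, int thread_idx)
{
    int i = block_idx * block_dim + thread_idx; /* global thread index */
    if (i < n)                                  /* last block may be partial */
        c[i] = a[i] + b[i];
}

/* Sequential stand-in for a launch of the form
 *   add<<<num_blocks, threads_per_block>>>(a, b, c, n);
 * On a GPU every (block, thread) pair would run concurrently. */
void launch_add_arrays(const float *a, const float *b, float *c, int n,
                       int threads_per_block)
{
    int num_blocks = (n + threads_per_block - 1) / threads_per_block; /* round up */
    for (int blk = 0; blk < num_blocks; ++blk)
        for (int t = 0; t < threads_per_block; ++t)
            add_arrays_kernel_body(a, b, c, n, blk, threads_per_block, t);
}
```

For n = 5 and 2 threads per block, the launch rounds up to 3 blocks, and the bounds check stops the sixth thread from writing past the end of the arrays.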
Logical Architecture of a GPU System
Physical Architecture
[Diagram labels: Strengths: Fast & easy; Network considerations; Limitations: Limited data & user load capability, Memory bandwidth, Availability & reliability; GPU; Availability; Performance]
Resource requirements:
• 5 KB of shared memory (SMEM) per block
• 30 registers
• 128 threads per block
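Resource requirements like these decide how many blocks can be resident on one streaming multiprocessor (SM) at a time. A small C sketch of that arithmetic, assuming the per-SM limits of a compute-capability 1.0 GPU (8192 registers, 16 KB shared memory, 768 threads, 8 blocks) and assuming the 30 registers are per thread. The `SmLimits` struct and `resident_blocks` function are hypothetical, and real hardware also rounds register and shared-memory allocations to a granularity that this sketch ignores.

```c
/* Per-SM hardware limits (values supplied by the caller). */
typedef struct {
    int regs_per_sm;
    int smem_bytes_per_sm;
    int max_threads_per_sm;
    int max_blocks_per_sm;
} SmLimits;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Number of thread blocks that fit on one SM at the same time:
 * the tightest of the register, shared-memory, thread and block limits.
 * Allocation granularity is deliberately ignored (simplification). */
int resident_blocks(SmLimits sm, int threads_per_block,
                    int regs_per_thread, int smem_bytes_per_block)
{
    int by_regs = regs_per_thread > 0
                      ? sm.regs_per_sm / (regs_per_thread * threads_per_block)
                      : sm.max_blocks_per_sm; /* no registers counted */
    int by_smem = smem_bytes_per_block > 0
                      ? sm.smem_bytes_per_sm / smem_bytes_per_block
                      : sm.max_blocks_per_sm; /* no shared memory used */
    int by_threads = sm.max_threads_per_sm / threads_per_block;
    return min_int(min_int(by_regs, by_smem),
                   min_int(by_threads, sm.max_blocks_per_sm));
}
```

With the slide's numbers and these assumed limits, registers allow 8192 / (30 × 128) = 2 blocks (rounded down from about 2.13), shared memory allows 16384 / 5120 = 3, and threads allow 768 / 128 = 6. Registers are therefore the binding limit: 2 resident blocks, or 256 of the 768 possible threads.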
Sequential CPU vs. Graphics Processing Unit (GPU)
CPU
• Optimized for low-latency access to cached data sets.
• Control logic for out-of-order and speculative execution.
GPU
• Optimized for data-parallel, throughput computation.
• Architecture tolerant of memory latency.
• More transistors dedicated to computation.
Testing - Matrices
• We tested the multiplication of two matrices.
• Both matrices are filled with random floating-point values.
• We ran the test for matrices of various dimensions.
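The slides do not include the test program. A minimal sketch of the CPU side, assuming square row-major `float` matrices filled with `rand()`; `fill_random` and `matmul_cpu` are hypothetical names, not the authors' code. In a CUDA version, one thread would typically compute one element `C[row][col]`, that is, the body of the two outer loops, which is why this workload parallelizes so well.

```c
#include <stdlib.h>

/* Fill an n x n row-major matrix with random values in [0, 1]. */
void fill_random(float *m, int n)
{
    for (int i = 0; i < n * n; ++i)
        m[i] = (float)rand() / (float)RAND_MAX;
}

/* Naive O(n^3) product C = A * B for n x n row-major matrices. */
void matmul_cpu(const float *A, const float *B, float *C, int n)
{
    for (int row = 0; row < n; ++row)
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f; /* on a GPU: one thread per (row, col) */
            for (int k = 0; k < n; ++k)
                sum += A[row * n + k] * B[k * n + col];
            C[row * n + col] = sum;
        }
}
```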
Results:
Dimension     CUDA (GPU) time    CPU time
64x64         0.417465 ms        18.0876 ms
128x128       0.41691 ms         18.3007 ms
256x256       2.146367 ms        145.6302 ms
512x512       8.093004 ms        1494.7275 ms
768x768       25.97624 ms        4866.3246 ms
1024x1024     52.42811 ms        66097.1688 ms
2048x2048     407.648 ms         No results
4096x4096     3.1 s              No results
Features
• Random access to memory
▫ A thread can access any memory location
• Unlimited access to memory
▫ A thread can read/write as many locations as needed
• User-managed cache (per block)
▫ Threads can cooperatively load data into shared memory (SMEM)
▫ Any thread can then access any SMEM location
• Low learning curve
▫ Just a few extensions to C
▫ No knowledge of graphics is required
• No graphics API overhead
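To illustrate the user-managed cache, here is a sketch in plain C that emulates one CUDA block computing a 3-point sum (a 1-D stencil). The local array `tile` stands in for `__shared__` memory, the first loop is the cooperative load, and the boundary between the two loops plays the role of a `__syncthreads()` barrier. `BLOCK_DIM`, `RADIUS` and `stencil_block` are hypothetical; this illustrates the pattern and is not the authors' code.

```c
#define BLOCK_DIM 8 /* hypothetical threads per block */
#define RADIUS 1    /* neighbours read on each side */

/* Emulates one CUDA block computing out[g] = in[g-1] + in[g] + in[g+1]
 * (inputs outside [0, n) count as 0). */
void stencil_block(const int *in, int *out, int n, int block_idx)
{
    int tile[BLOCK_DIM + 2 * RADIUS]; /* stands in for __shared__ memory */
    int base = block_idx * BLOCK_DIM;

    /* Phase 1: cooperative load. Each "thread" copies its own element;
     * the first RADIUS threads also copy the left and right halo cells. */
    for (int tid = 0; tid < BLOCK_DIM; ++tid) {
        int g = base + tid;
        tile[tid + RADIUS] = (g < n) ? in[g] : 0;
        if (tid < RADIUS) {
            int left = g - RADIUS;
            int right = g + BLOCK_DIM;
            tile[tid] = (left >= 0 && left < n) ? in[left] : 0;
            tile[tid + RADIUS + BLOCK_DIM] = (right < n) ? in[right] : 0;
        }
    }
    /* In CUDA, a __syncthreads() barrier would go here. */

    /* Phase 2: each thread reads neighbours that other threads loaded. */
    for (int tid = 0; tid < BLOCK_DIM; ++tid) {
        int g = base + tid;
        if (g >= n)
            continue;
        int sum = 0;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[tid + RADIUS + k];
        out[g] = sum;
    }
}
```

The halo cells are the reason for the barrier: thread 7 of a block reads a value that only thread 0 loads, so no thread may start phase 2 until every thread has finished phase 1.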
Brute Force Algorithm
• No preprocessing phase.
• Constant extra space needed.
• Always shifts the window by exactly 1 position to the right.
• Comparisons can be done in any order.
• Searching phase in O(mn) time complexity, where m is the pattern length and n is the text length.
• 2n expected text character comparisons.
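A minimal C sketch of the brute-force algorithm described above: `match_at` checks one window and `brute_force_search` slides the window one position at a time. Because every window is checked independently, a GPU version could plausibly give each thread one value of `pos`; that mapping is an assumption, since the slides do not show the authors' kernel. Both function names are hypothetical.

```c
#include <stddef.h>

/* Returns 1 if pattern x (length m) occurs in text y at position pos. */
int match_at(const char *x, size_t m, const char *y, size_t pos)
{
    for (size_t i = 0; i < m; ++i)
        if (x[i] != y[pos + i])
            return 0;
    return 1;
}

/* Finds every occurrence of x (length m) in y (length n).
 * Stores up to max_out positions in out[] and returns the total count.
 * O(mn) time, constant extra space, no preprocessing. */
size_t brute_force_search(const char *x, size_t m, const char *y, size_t n,
                          size_t *out, size_t max_out)
{
    size_t count = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t pos = 0; pos + m <= n; ++pos) /* shift window by exactly 1 */
        if (match_at(x, m, y, pos)) {
            if (count < max_out)
                out[count] = pos;
            ++count;
        }
    return count;
}
```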
Example
[Figure panels: Text, Search Pattern]
Boyer-Moore Algorithm
• Performs the comparisons from right to left.
• Preprocessing phase in O(m + σ) time and space complexity, where σ is the alphabet size.
• Searching phase in O(mn) time complexity.
• 3n text character comparisons in the worst case when searching for a non-periodic pattern.
• O(n / m) best performance.
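A compact C sketch of Boyer-Moore using both the bad-character and good-suffix shift tables, following the standard textbook formulation. The function names and the byte alphabet (σ = 256, indexed through `unsigned char`) are assumptions, and this is not the authors' implementation. The two tables are the O(m + σ) preprocessing, and the right-to-left inner `while` loop is the comparison order listed above.

```c
#include <stdlib.h>

#define ALPHABET 256 /* sigma: one entry per possible byte value */

/* Bad-character table: shift derived from the last occurrence of each symbol. */
static void pre_bad_char(const unsigned char *x, int m, int bc[ALPHABET])
{
    for (int i = 0; i < ALPHABET; ++i)
        bc[i] = m;
    for (int i = 0; i < m - 1; ++i)
        bc[x[i]] = m - 1 - i;
}

/* suff[i] = length of the longest substring ending at i that is a suffix of x. */
static void suffixes(const unsigned char *x, int m, int *suff)
{
    int f = 0, g = m - 1;
    suff[m - 1] = m;
    for (int i = m - 2; i >= 0; --i) {
        if (i > g && suff[i + m - 1 - f] < i - g) {
            suff[i] = suff[i + m - 1 - f];
        } else {
            if (i < g)
                g = i;
            f = i;
            while (g >= 0 && x[g] == x[g + m - 1 - f])
                --g;
            suff[i] = f - g;
        }
    }
}

/* Good-suffix table built from suff[]. */
static void pre_good_suffix(const unsigned char *x, int m, int *gs, int *suff)
{
    suffixes(x, m, suff);
    for (int i = 0; i < m; ++i)
        gs[i] = m;
    for (int i = m - 1, j = 0; i >= 0; --i)
        if (suff[i] == i + 1)
            for (; j < m - 1 - i; ++j)
                if (gs[j] == m)
                    gs[j] = m - 1 - i;
    for (int i = 0; i <= m - 2; ++i)
        gs[m - 1 - suff[i]] = m - 1 - i;
}

/* Stores up to max_out match positions in out[]; returns the number of
 * matches, or -1 if allocation fails. Compares right to left. */
int boyer_moore_search(const char *pat, int m, const char *text, int n,
                       int *out, int max_out)
{
    const unsigned char *x = (const unsigned char *)pat;
    const unsigned char *y = (const unsigned char *)text;
    int bc[ALPHABET];
    int count = 0;
    if (m <= 0 || m > n)
        return 0;
    int *gs = malloc(sizeof(int) * (size_t)m);
    int *suff = malloc(sizeof(int) * (size_t)m);
    if (!gs || !suff) { free(gs); free(suff); return -1; }
    pre_bad_char(x, m, bc);
    pre_good_suffix(x, m, gs, suff);
    for (int j = 0; j <= n - m;) {
        int i = m - 1;
        while (i >= 0 && x[i] == y[i + j]) /* right-to-left comparison */
            --i;
        if (i < 0) {
            if (count < max_out)
                out[count] = j;
            ++count;
            j += gs[0];
        } else {
            int bad = bc[y[i + j]] - m + 1 + i;
            j += gs[i] > bad ? gs[i] : bad; /* take the larger safe shift */
        }
    }
    free(gs);
    free(suff);
    return count;
}
```

Unlike the brute-force loop, each shift here depends on the outcome of the previous comparison, so the search is sequential within one stretch of text. A GPU version would more plausibly split the text into chunks overlapping by m - 1 characters, one chunk per thread; that decomposition is an assumption, not something the slides state.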
Example
[Figure panels: Text, Search Pattern]
Results: searching for a four-character string (times in seconds)
Data size   Brute Force GPU   Brute Force CPU   Boyer-Moore GPU   Boyer-Moore CPU
64 MB       135.48            145.7             219.35            208.76
128 MB      279.86            327               388.69            382.7
256 MB      520.55            575.42            759.52            753.21
512 MB      1045.33           1133              970.56            1520.71
768 MB      1258.67           1689              1105.6            2258.4
1 GB        1389.96           1896.45           1506.34           3018.2
Graphs: Brute Force
[Figure panels: Graph 1, Graph 2]
Graphs: Boyer-Moore
[Figure panels: Graph 1, Graph 2]
Conclusion
• GPU computing has gathered tremendous interest as a solution able to handle complex computational problems and massive data sets in real time.
• At the core of GPU computing, CUDA, NVIDIA's C-based parallel computing platform and programming model, has become a key piece of next-generation application development.
Questions
Thank You
