1. Search Optimization using GPU
February 2010
J.Pavan Kumar
T.Polender Reddy
G.Sravan Kumar
T.Raghava Jyothi
2. Agenda
• Introduction
▫ History
▫ Why ‘String Searching’?
▫ Generations of sequential search
▫ Searching through the GPU approach
▫ Architecture of a GPU system
▫ CPU vs. GPU
▫ Features
▫ Algorithm implementation and output analysis
September 28, 2016 2
3. History
1980s – No GPU; PCs used a VGA controller.
1990s – More functions added into the VGA controller.
1997 – 3D acceleration functions:
a) Hardware for triangle setup and rasterization.
b) Texture mapping.
c) Shading.
2000 – A single-chip graphics processor (origin of the term GPU).
2005 – Massively parallel programmable processors.
2007 – CUDA (Compute Unified Device Architecture).
5. Why ‘String Searching’?
[Figure: a pattern (AGCTAGA) slid window by window across a DNA text (ATGCATGCATGACTAG)]
•String matching is a very important subject in the wider domain of text processing.
•String-matching algorithms are basic components of practical software running under most operating systems.
7. Sequential Search Pain Points
• Performance
• Power consumption
• Heat generation
• Accuracy
• Time complexity
8. GPU Vision
• Researchers
• Customers
• Business applications (programming, gaming, custom . . .)
•Computer vision tasks are computationally intensive
and repetitive, and they often exceed the real-time
capabilities of the CPU, leaving little time for higher-
level tasks.
•However, many computer vision operations map efficiently onto the modern GPU, whose programmability allows a wide variety of computer vision algorithms to be implemented.
9. Technology Convergence
2000 – Single-core systems: sequential searching on the CPU
2005 – Multicore systems: search using Pthreads, MPI, and OpenMP
2007 – CUDA architecture: the Graphics Processing Unit (GPU)
Beyond – Portal, virtual, and more
10. What is GPU?
• “Graphics Processing Unit”: a processor optimized for 2D/3D graphics, video, visual computing, and display.
• It is highly parallel, highly multithreaded multiprocessor
optimized for visual computing.
• It provides real-time visual interaction with computed objects via graphics, images, and video.
• It serves as both a programmable graphics processor
and a scalable parallel computing platform.
• Heterogeneous Systems: combine a GPU with a
CPU.
11. Compute Unified Device Architecture
•CUDA is a scalable parallel programming
model and a software environment for
parallel computing.
• Minimal extensions to familiar C/C++
environment.
• Heterogeneous serial-parallel
programming model.
• CUDA also maps well to MultiCore CPUs!
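To give a flavour of those “minimal extensions” to C, a hypothetical CUDA kernel might look like the sketch below (the kernel name, sizes, and launch parameters are illustrative assumptions, not taken from the slides):

```cuda
#include <cstdio>

// Illustrative kernel: each thread handles one element of the arrays.
// __global__, blockIdx, blockDim, and threadIdx are the CUDA extensions.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified (managed) memory keeps the host/device bookkeeping minimal.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    // Launch enough 128-thread blocks to cover all n elements.
    vecAdd<<<(n + 127) / 128, 128>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[10] = %f\n", c[10]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The serial parts (allocation, initialization) stay ordinary C, while the kernel expresses the data-parallel work: the heterogeneous serial-parallel model the slide describes.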
14. Physical Architecture
Strengths:
• Fast & easy
• Network considerations
Limitations:
• Limited data & user load capability
• Memory bandwidth
• Availability & reliability
GPU availability & performance — resource requirements:
• 5 KB of SMEM per block
• 30 registers
• 128 threads per block
15. Sequential CPU vs. Graphics Processing Unit (GPU)
CPU
• Optimized for low-latency access to cached data sets.
• Control logic for out-of-order and speculative execution.
GPU
• Optimized for data-parallel, throughput computation.
• Architecture tolerant of memory latency.
• More transistors dedicated to computation.
16. Testing - Matrices
• Test the multiplication of two matrices.
• Create two matrices with random floating-point values.
• We tested with matrices of various dimensions.
17. Results
Dimension  | CUDA (GPU) time | CPU time
64x64      | 0.417465 ms     | 18.0876 ms
128x128    | 0.41691 ms      | 18.3007 ms
256x256    | 2.146367 ms     | 145.6302 ms
512x512    | 8.093004 ms     | 1494.7275 ms
768x768    | 25.97624 ms     | 4866.3246 ms
1024x1024  | 52.42811 ms     | 66097.1688 ms
2048x2048  | 407.648 ms      | No results
4096x4096  | 3.1 seconds     | No results
18. Features
• Random access to memory
▫ A thread can access any memory location
• Unlimited access to memory
▫ A thread can read/write as many locations as needed
• User-managed cache (per block)
▫ Threads can cooperatively load data into SMEM
▫ Any thread can then access any SMEM location
• Low learning curve
▫ Just a few extensions to C
▫ No knowledge of graphics is required
▫ No graphics API overhead
19. Brute-Force Algorithm
• No preprocessing phase.
• Constant extra space needed.
• Always shifts the window by exactly 1 position to the right.
• Comparisons can be done in any order.
• Searching phase in O(mn) time complexity.
• 2n expected text character comparisons.
21. Boyer-Moore Algorithm
• Performs the comparisons from right to left.
• Preprocessing phase in O(m + σ) time and space complexity, where σ is the alphabet size.
• Searching phase in O(mn) worst-case time complexity.
• 3n text character comparisons in the worst case when searching for a non-periodic pattern.
• O(n/m) best performance.
26. Conclusion
• GPU computing has gathered tremendous interest
as a solution that is able to handle complex
computational problems and massive data sets in
real time.
• At the core of GPU computing, the C-based development environment CUDA has become a quintessential piece of the next generation of application development.